
2026 - Colabra

I've been building with LLMs since March 2023

The full arc of how Colabra adopted LLMs, from the first GPT ticket in March 2023 through hallucination fixes, vector search, the pivot, and prompt cache optimization today.

People ask me if Colabra is an AI company or a company that uses AI. The honest answer is: I don't think that distinction means much anymore. What I can tell you is when we started, and what we were doing at each step.

The first Linear ticket in our workspace that mentions GPT is from March 15, 2023. It's called "AI Copilot POC." My cofounder Philip wired an OpenAI GPT-powered assistant into a Jupyter notebook, running over real Colabra experimental data. We shipped a working version four weeks later.

In March 2023, ChatGPT had been public for about three and a half months. Nobody I knew in our industry was shipping LLM features yet. We were a pre-clinical R&D platform then, not an M&A diligence platform, but the instinct was the same as it is today: find the hardest cognitive bottleneck in a customer's workflow, and see if a language model can absorb it.

One of the people who shaped that instinct early was Ashley Pilipiszyn, former Technical Director at OpenAI. Ashley helped sharpen my sense that architecture mattered, secure deployment mattered, enterprise trust mattered, and the ceiling for an AI-native company was much higher than a feature wrapped around a model. That influence showed up in the way I thought about cloud choices, security posture, hiring, and product ambition long before the market had caught up to the idea that AI would become infrastructure. When I say Colabra was AI-native from day one, I don't mean we put "AI" on a pitch deck. I mean the people we learned from were building the stack the rest of the industry is still catching up to.

That ticket matters because everything we've built since has been a refinement of the same loop. Not a reinvention. A refinement.

The first hard problem was hallucination

By August 2023, we had the Copilot live in product. It worked. It also lied.

The fix was embarrassingly simple in retrospect. We added one line to the master prompt: "Never reply with information not in the context." Our ticket for this is called "Stricter AI instructions." It's three sentences long. It was one of the most important things we shipped that year.
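The idea behind that one-line fix is easy to sketch. Here's a minimal, hypothetical version of how a grounding instruction gets baked into the messages sent to the model; `build_grounded_messages` and the exact wording are illustrative, not Colabra's actual code:

```python
# Sketch of the "stricter AI instructions" pattern: the system prompt tells
# the model to answer only from the supplied context, and to refuse otherwise.
# build_grounded_messages is a hypothetical helper, not production code.

GROUNDING_RULE = (
    "Never reply with information not in the context. "
    "If the context does not contain the answer, say you don't know."
)

def build_grounded_messages(context: str, question: str) -> list[dict]:
    """Assemble chat messages with a strict grounding instruction."""
    system = (
        "You are a research assistant.\n"
        f"{GROUNDING_RULE}\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_grounded_messages(
    context="Experiment 42: E. coli culture failed at 37C.",
    question="What temperature did experiment 42 run at?",
)
```

The point isn't the wording; it's that the refusal behavior lives in the prompt, shipped in an afternoon, rather than in any retraining.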

This is the first lesson I'd give a founder thinking about building with LLMs: the hard problems aren't usually the ones you think. We thought the hard problem was getting the model to answer. The actual hard problem was getting the model to refuse to answer when it didn't know.

Vector search, quietly, in 2023

In November 2023, we migrated our search layer from Azure Cognitive Search's full-text mode to the vector hybrid mode. Our ticket described the upgrade as letting users ask things like "Find experiments that struggled with E. coli growth." Semantic retrieval against internal data.
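Hybrid mode works by running a full-text query and a vector query in parallel and fusing the two rankings; Azure's hybrid search uses reciprocal rank fusion (RRF) for that merge step. Here's a toy illustration of the fusion, with made-up document IDs:

```python
# Toy hybrid retrieval: merge a keyword (BM25) ranking and an embedding-
# similarity ranking with reciprocal rank fusion. Doc IDs are invented.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in;
    documents that both retrievers like accumulate the highest totals.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["exp-12", "exp-07", "exp-33"]  # full-text match order
vector_hits = ["exp-07", "exp-41", "exp-12"]   # semantic-similarity order
fused = rrf_fuse([keyword_hits, vector_hits])
# exp-07 and exp-12 rise to the top: both retrievers surfaced them
```

That's why the "E. coli growth" query works: the vector leg catches "struggled with growth" phrased a dozen different ways, while the keyword leg still anchors on exact terms like strain names.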

We became a RAG company in 2023. We didn't announce it. We didn't blog about it. We just shipped it, because customers were asking questions that keyword search couldn't answer.

This is when most of the industry was still debating whether RAG was a real pattern or a fad. It was real. It still is. Most of what people now call "agentic AI" is RAG with a better loop around it.

The pivot we almost didn't make

In August 2024, we sent an investor update with the subject line "we've made a pivot." We were no longer a pre-clinical R&D platform. We were becoming a buy-side M&A due diligence platform.

The Linear tickets from spring 2024 show a product team still shipping features for scientists. Experiment metadata, JSC reports, alliance management. Our customer calls in April and May 2024 were with alliance managers at Scribe Therapeutics and Vertex. We were asking if AI could make their JSC reports easier to write.

Then we realized something. The shape of the problem we'd solved for scientists, turning a pile of documents into an organized, queryable, evidence-linked set of findings, was the exact shape of the problem in M&A due diligence. Different vocabulary. Same job.

The first M&A-native AI ticket in our Linear is from March 2025. It proposed an "Assign to AI" button on tasks. It got rejected seven months later. It wasn't how buyers wanted to work. They didn't want to click Assign to AI on 300 documents. They wanted the AI to read everything and hand them an issues list.

Attempt two is what we sell today. I bring this up because the lesson is worth stating: we had to fail at the M&A version of our AI before we got to the right one. Nobody talks about that part. The version of the product you see is the one that survived two rewrites.

What we're working on now

In February 2026, Philip opened a ticket called "AI cache optimization." The description links to a blog post from the Manus team on context engineering for AI agents.

This is where we are today. We're optimizing prompt caches. We're load balancing across Azure OpenAI deployments because we hit the 1M tokens per minute rate limit in production. We're reading the same research the frontier agent companies are reading.
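Both ideas are simple to sketch. Provider-side prompt caches match on prefixes, so the stable system prompt goes first and the volatile document goes last; and a balancer spreads requests across deployments so no single one eats the whole tokens-per-minute budget. Everything below is illustrative; the deployment names and helper functions are not our real code, and production balancing would also track token usage and handle 429s:

```python
# Two sketched ideas from the cache-optimization work:
#   1. cache_friendly_messages: stable prefix first so prefix caches can hit.
#   2. DeploymentBalancer: round-robin over Azure OpenAI deployments so no
#      single deployment hits its tokens-per-minute limit.
# Names are hypothetical.

from itertools import cycle

STATIC_SYSTEM_PROMPT = "You are a diligence assistant. ..."  # never changes

def cache_friendly_messages(document: str, question: str) -> list[dict]:
    """Put the unchanging prompt first and per-request content last."""
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
        {"role": "user", "content": f"Document:\n{document}\n\nQ: {question}"},
    ]

class DeploymentBalancer:
    """Round-robin over deployment names; real code would also track TPM."""

    def __init__(self, deployments: list[str]):
        self._ring = cycle(deployments)

    def next_deployment(self) -> str:
        return next(self._ring)

balancer = DeploymentBalancer(["gpt4o-eastus", "gpt4o-westus"])
```

The cache ordering matters more than it looks: reordering a prompt so the static part leads can turn every request after the first into a partial cache hit, which is where most of the latency and cost savings come from.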

Our project in Linear is still named "Copilot," three years after the first ticket. I kind of like that. It's a fossil. It's the reminder that the thing we're building today grew out of a Jupyter notebook experiment from a Wednesday in March 2023.

Why I think this matters for customers

A lot of companies in our category started their products in 2022 or 2023, after the ChatGPT wave. We started ours in 2020. We shipped LLM features in 2023. We pivoted, rebuilt, and got to our current product through real iteration, not a deck.

When a prospect asks us why they should trust Colabra over a company with more funding, I don't lead with pedigree or customer logos. I tell them this: we've been wrong about how to build with LLMs at least twice, and we've corrected. Companies that have never been wrong have never shipped.

The things our product does well today (clause-level citations, entity risk screening across six external databases, gap analysis against a built-in diligence playbook) are not things you can build in a sprint. They're the output of four years of learning what these models can and cannot be trusted to do.

That's the whole argument.