As LLMs have become more powerful, hallucinations have proven stubbornly difficult to avoid. Errors occur even in the smartest models, and while there are ways to catch these errors, the industry is still looking for the best way to do it.
Probably, which just raised $9 million in seed funding from Andreessen Horowitz, is trying to create a more rigorous way to catch these mistakes.
As founder Peter Elias (pictured above) puts it, the company’s goal is to prevent hallucinations and simple factual errors from ever reaching the user, and to achieve the kind of 99.99% accuracy that’s common in deterministic systems but much harder to reach with AI. As it turns out, bringing LLMs to this level of precision requires rethinking many of the basic assumptions of AI engineering.
Probably’s first product is a data science tool, built to produce fast answers from complex data sets. Each result is accompanied by a report and an audit trail of how it was produced, an increasingly common practice among AI tools.
But keeping errors from creeping into those summaries required an elaborate harness that Elias describes as a “data science engineering suit.” The LLM’s first-pass responses are checked against a deterministic validation system, which bounces any results that do not match the dataset. Most importantly, the LLM is trained against the validator, and the entire system is optimized for fast and accurate responses, the company said.
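The article does not describe how Probably's validator works, so the following is only a minimal Python sketch of the general pattern: a model proposes a result, deterministic code recomputes it from the data, and anything that does not match is bounced. The `Claim` format, the function names, and the `fake_model` stand-in are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional

# Hypothetical claim format: the model asserts one aggregate over one column.
@dataclass(frozen=True)
class Claim:
    column: str
    aggregate: str  # "sum", "mean", or "count"
    value: float

# Deterministic reference implementations the validator trusts.
AGGREGATES = {"sum": sum, "mean": mean, "count": len}

def validate(claim: Claim, rows: list, tol: float = 1e-9) -> bool:
    """Recompute the aggregate from the data; accept only on a match."""
    func = AGGREGATES.get(claim.aggregate)
    if func is None:
        return False  # unknown operation: cannot be verified, so reject
    values = [row[claim.column] for row in rows if claim.column in row]
    if not values:
        return False  # column missing from the dataset
    return abs(func(values) - claim.value) <= tol

def answer_with_retries(
    propose: Callable[[int], Claim], rows: list, max_attempts: int = 3
) -> Optional[Claim]:
    """Ask the proposer for a claim and bounce anything the validator rejects."""
    for attempt in range(max_attempts):
        claim = propose(attempt)
        if validate(claim, rows):
            return claim
    return None  # return nothing rather than an unverified answer

# Stand-in for a model (assumption): first guess is wrong, second is right.
def fake_model(attempt: int) -> Claim:
    guesses = [Claim("revenue", "sum", 100.0), Claim("revenue", "sum", 60.0)]
    return guesses[min(attempt, len(guesses) - 1)]

ROWS = [{"revenue": 10.0}, {"revenue": 20.0}, {"revenue": 30.0}]
```

Calling `answer_with_retries(fake_model, ROWS)` rejects the first guess and returns the second, because only the second matches the recomputed sum. The point of the design is that the check itself involves no model at all, so a wrong number cannot pass by sounding plausible.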
“What we learned in making this was that the better the harness engineering, the weaker the model can be,” says Elias. “If you can improve the context enough, the model doesn’t have to work very hard to get it right. Basically, it’s an exercise in reducing ambiguity.”
This allows Probably’s data science tool to run on significantly smaller AI models. Elias says the current version runs on a model that’s “four orders of magnitude weaker than previous models,” meaning it can run on local hardware (i.e., a desktop instead of a data center), which eliminates much of the token cost associated with using AI.
It’s a welcome idea at a time when contract costs are rising and many customers are reassessing their AI budgets. Elias’ idea doesn’t end with data science, either: the same engine can be extended to cover use cases like accounting or medical services, or as Elias puts it, “any precision-sensitive use case.”
“I think it’s really interesting that the big AI labs haven’t even tried to do this,” says Elias. “They have an incentive not to, because they make money the more times you have to fix the model.”
