Customer support and service are among the most popular areas of voice AI right now. But building a product that sounds human and responds without noticeable lag is proving much more difficult in some markets than others — and most of the major players weren’t built with Africa and the Middle East in mind.
AethexAI, a startup founded last year to fill that gap, has raised $3 million in pre-seed funding led by 4DX Ventures, with participation from Enza Capital, Dorm Room Fund, Mojo Ventures and Stanford GSB 26 Fund. Individual investors include Stanford professors, telecom executives and artificial intelligence researchers from Anthropic.
Instead of using existing orchestration tools like Vapi and LiveKit, the company built its own small models and orchestration layer from scratch to handle the local dialects of English, French, and Arabic spoken in its target markets. That decision was driven by the particular requirements of operating in the region.
The company is also launching its platform for businesses to test its technology and subscribe to its services, along with an API and SDK for developers to experiment with its models.
The startup was founded by Mariama Diallo and Ayooluwa Odemuyiwa. CEO Diallo worked at Goldman Sachs and later joined YC-backed ModelML as a product and development recruiter. CTO Odemuyiwa graduated from Caltech, worked at Meta and attended Stanford Business School before co-founding the company. The pair wanted to build something for emerging markets and started looking for opportunities.
Businesses around the world are scrambling to adopt artificial intelligence tools to automate parts of their operations, but it doesn't always work out. In Egypt, the founders discovered, one call center had automated a significant share of its calls but scrapped the system because of poor results. Several support centers in Africa told them that finding and hiring engineers to automate calls at the right cost was a persistent headache.
“The latency and jitter we saw on automated calls in this region was outrageous. If we had become an orchestrator, we might have had to use large models hosted out of region, resulting in higher latency. We realized that for this to work, we need to use very small models and reduce latency at every step,” Odemuyiwa said of the company’s decision to build its own orchestration layer.
AI labs developing their latest models typically spend millions on training and on acquiring data. AethexAI found workarounds for both. Instead of chasing the biggest possible models, it bet that small models would be enough to solve the latency problem while maintaining accuracy, and developed its own Kora series, ranging from 300 million to 1.7 billion parameters. That is a fraction of the size of leading LLMs, which is exactly the point.
To train these models, the startup used anonymized recordings from a call center operator. It also sent hard drives to radio stations across Africa to collect more audio data. To keep costs down, it built a network of university students to annotate data and pronounce local names. Today, the startup says, it handles more than 17,000 calls a day.
On the business side, the company makes sure to guide customers new to voice AI through the process, offering on-site demos and workshops to help them identify the best use cases for automation.
“We always tell customers that we can’t be everything to everyone right now. We’re small. When we start talking to a company, we ask them to pick the one use case that’s most important to them to start with,” Diallo said.
The startup is open to working across all industries, but for now most of its use cases involve debt collection calls, customer activation, or KYC (Know Your Customer) verification, the standard identity-check process used by banks and telcos. The company is hiring forward-deployed engineers to serve local markets and is building channel partnerships with telecom providers to manage telephony for voice AI calls. Plug-and-play solutions, the company says, simply won’t work here.
Walter Baddoo, co-founder and managing partner of 4DX Ventures, argues that the African and Middle Eastern markets are fundamentally different from the markets most voice AI companies were built to serve.
“Businesses in Africa and the Middle East handle around three times the call volume of their Western counterparts as voice is still the dominant channel for customer interaction,” he said. “Existing systems were built for Western markets characterized by high-end GPU infrastructure, standard English and European speech environments, and enterprise workflows common to the US and Europe. This creates real gaps when businesses need systems that handle dialects, code switching, and informal speech patterns and that work within their existing phone infrastructure and real-world pricing.”
In other words, while companies like ElevenLabs, Deepgram, Sierra, and Cognigy are expanding globally at a rapid pace, the markets they were built for and the markets they are entering are not always the same thing. Startups like AethexAI are betting that the gaps — models specializing in local dialects, on-the-ground collaborations, infrastructure built for the region — represent a market opening that the giants have neither the incentive nor the architecture to close.
