OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve upon its previous releases.
For OpenAI, the models fit into its broader “agentic” vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of “agent” may be disputed, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business’s customers.
“We’re going to see more and more agents pop up in the coming months,” Godement told TechCrunch during a briefing. “And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate.”
OpenAI claims that its new text-to-speech model, “gpt-4o-mini-tts,” not only delivers more nuanced and realistic-sounding speech but is also more “steerable” than previous-generation speech-synthesizing models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, “speak like a mad scientist” or “use a serene voice, like a mindfulness teacher.”
Here is a sample of a “true crime-styled,” weathered voice: [audio sample]
And here is a sample of a female “professional” voice: [audio sample]
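In practice, steering the voice amounts to passing a short natural-language direction alongside the text to be spoken. Below is a minimal sketch using OpenAI’s Python SDK; the preset voice name, the `instructions` parameter, and the response handling are assumptions based on the API’s audio endpoints, not details confirmed in this article.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Synthesize speech with gpt-4o-mini-tts, steering delivery with a
# plain-English instruction instead of a fixed style preset.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # assumed preset voice name
    input="Thanks for your patience. Let me look into that order for you.",
    instructions="Sound calm and apologetic, like a customer support agent.",
)

# Save the returned audio bytes to disk.
with open("support_reply.mp3", "wb") as f:
    f.write(response.content)
```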
Jeff Harris, a member of the product staff at OpenAI, told TechCrunch that the goal is to let developers tailor both the voice “experience” and “context.”
“In different contexts, you don’t just want a flat, monotonous voice,” Harris said. “If you’re in a customer support experience and you want the voice to be apologetic because it’s made a mistake, you can actually have the voice carry that emotion … Our big belief here is that developers and users really want to control not just what is said, but how it is said.”
As for OpenAI’s new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” they effectively replace the company’s long-in-the-tooth Whisper transcription model. Trained on “diverse, high-quality audio datasets,” the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.
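For developers already on Whisper, the switch is essentially a model-name change on the same transcription endpoint. A rough sketch, again assuming the OpenAI Python SDK; the audio file name here is illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Transcribe a local recording; swapping the model string for
# "gpt-4o-mini-transcribe" (or the older "whisper-1") works the same way.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```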
The new models are also less likely to hallucinate, Harris added. Whisper notoriously tended to fabricate words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.
“[T]hese models are much improved versus Whisper on that front,” Harris said. “Accurate [in this context] means that the models are hearing the words precisely [and] aren’t filling in details that they didn’t hear.”
Your mileage may vary depending on the language being transcribed, however.
According to OpenAI’s internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a “word error rate” approaching 30% (out of 120%) for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means roughly three out of every 10 words from the model will differ from a human transcription in those languages.
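For context, word error rate is the word-level edit distance between a model’s transcript and a human reference, divided by the reference length; because insertions count against the model, the figure can exceed 100%, which is why the benchmark scale tops out above it. A minimal sketch of the standard calculation, with illustrative example sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match/substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A 30% WER means roughly 3 of every 10 reference words come out wrong:
print(word_error_rate(
    "the cat sat on the mat very quietly today now",
    "the cat sat in the hat very quietly today",
))  # 0.3
```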
In a break from tradition, OpenAI doesn’t plan to make its new transcription models openly available. The company has historically released new versions of Whisper for commercial use under an MIT license.
Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are “much bigger than Whisper” and therefore aren’t good candidates for an open release.
“[T]hey’re not the kind of model that you can just run locally on your laptop, like Whisper,” he continued. “[W]e want to make sure that if we release things in open source, we do it thoughtfully, and we have a model that’s really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models.”
Updated March 20, 2025, 11:54 a.m. PT, to clarify the language around word error rate and to update the benchmark results chart with a more recent version.