TechTost
Largest text-to-speech AI model yet shows ‘emergent abilities’

By techtost.com | 15 February 2024 | 4 Mins Read

Researchers at Amazon have trained the largest text-to-speech model to date, which they claim exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally. The breakthrough could be what the technology needs to escape the uncanny valley.

These models were always going to grow and improve, but the researchers specifically hoped to see the kind of jump in skill we saw when language models got past a certain size. For reasons unknown to us, once LLMs get past a certain point, they start to be much more robust and flexible, able to perform tasks they were not trained to do.

That doesn’t mean they gain sentience or anything, just that past a certain point their performance on certain conversational AI tasks hockey-sticks. The Amazon AGI team (it’s no secret what they’re aiming for) thought the same might happen as text-to-speech models grew, and their research suggests that this is indeed the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which they have contorted into the abbreviation BASE TTS. The largest version of the model uses 100,000 hours of public domain speech, 90% of which is in English, the rest in German, Dutch and Spanish.

With 980 million parameters, BASE-large appears to be the largest model in this class. They also trained 400M- and 150M-parameter models on 10,000 and 1,000 hours of audio respectively, for comparison. The idea is that if one of these models exhibits emergent behaviors but another does not, you have a range for where those behaviors start to emerge.
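The bracketing logic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the parameter counts come from the article, but the function name `emergence_bracket` and the True/False flags are assumptions made for the example, not anything taken from the researchers' work.

```python
from typing import Optional, Sequence, Tuple


def emergence_bracket(
    sizes: Sequence[int], shows_ability: Sequence[bool]
) -> Optional[Tuple[int, int]]:
    """Return (largest size lacking the ability, smallest size showing it).

    Returns None if no adjacent pair of sizes flips from False to True.
    """
    ordered = sorted(zip(sizes, shows_ability))
    for (small, small_flag), (big, big_flag) in zip(ordered, ordered[1:]):
        if not small_flag and big_flag:
            return small, big
    return None


# Parameter counts from the article; the flags are hypothetical placeholders.
sizes = [150_000_000, 400_000_000, 980_000_000]
flags = [False, True, True]

bracket = emergence_bracket(sizes, flags)
print(bracket)
```

With only three model sizes the bracket is coarse, which is why narrowing down the exact threshold would take further training runs between the two endpoints.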

As it turns out, the medium-sized model showed the jump in ability the team was looking for, not necessarily in ordinary speech quality (it is rated better, but only by a couple of points) but in the set of emergent abilities they observed and measured. Here are examples of tricky texts mentioned in the paper:

  • Compound nouns: The Beckhams decided to rent a charming, stone-built, quaint country cottage.
  • Emotions: “Oh my God! Are we really going to the Maldives? It’s incredible!” Jenny squealed, bouncing on her tiptoes with boundless glee.
  • Foreign words: “Mr. Henry, renowned for his mise en place, orchestrated a seven-course meal, each course a pièce de résistance.”
  • Paralinguistics (i.e. readable non-words): “Shh, Lucy, shhh, we mustn’t wake your brother,” whispered Tom, as they passed the nursery.
  • Punctuation: She received a strange message from her brother: ‘Emergency @ home? call ASAP! Mom and Dad are worried…#familymatters.’
  • Questions: But the Brexit question remains: After all the trials and tribulations, will ministers find the answers in time?
  • Syntactic complexities: The film starring De Moya, who was recently honored with a Lifetime Achievement Award in 2022, was a big hit despite mixed reviews.

“These sentences are designed to contain challenging tasks – parsing garden-path sentences, putting stress on long compound nouns, producing emotional or whispered speech, or producing the correct phonemes for foreign words like ‘qi’ or punctuation like ‘@’ – none of which BASE TTS is explicitly trained to perform,” the authors write.

Such features usually trip up text-to-speech engines, which mispronounce, skip words, use odd intonation, or make some other blunder. BASE TTS still had problems, but it fared much better than its contemporaries, models like Tortoise and VALL-E.

There are a bunch of examples of these difficult texts being spoken quite naturally by the new model on the site the researchers made for it. Of course these were selected by the researchers, so they’re necessarily cherry-picked, but it’s impressive regardless.


Because the three BASE TTS models share an architecture, the size of the model and the extent of its training data appear to be the cause of its ability to handle some of the above complexities. Bear in mind that this is still an experimental model and process, not a commercial model or anything. Further research will have to identify the inflection point for emergent ability and how to train and deploy the resulting model efficiently.

Notably, this model is “streamable,” as the name says, meaning it doesn’t need to generate entire sentences at once but goes moment by moment at a relatively low bitrate. The team also tried to package speech metadata such as emotionality, prosody and so on into a separate low-bandwidth stream that could accompany the vanilla audio.

It looks like text-to-speech models may have a breakout moment in 2024, just in time for the election! But there’s no denying the usefulness of this technology, particularly for accessibility. The team notes that it has declined to release the model’s source and other data due to the risk of it being exploited by bad actors. The cat will get out of that bag eventually, though.
