Largest text-to-speech AI model yet shows ‘emergent abilities’

By techtost.com, 15 February 2024, 4 min read

Researchers at Amazon have trained the largest text-to-speech model to date, which they claim exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally. The breakthrough could be what the technology needs to escape the uncanny valley.

These models were always going to grow and improve, but the researchers specifically hoped to see the kind of leap in ability that we observed once language models grew past a certain size. For reasons not yet understood, once LLMs grow past a certain point they become much more robust and versatile, able to perform tasks they weren’t trained to do.

That doesn’t mean they become sentient or anything, just that past a certain point their performance on certain conversational AI tasks hockey-sticks. The Amazon AGI team (it’s no secret what they’re aiming for) thought the same might happen as text-to-speech models grew, and their research suggests that this is indeed the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which the authors have contorted into the abbreviation BASE TTS. The largest version of the model was trained on 100,000 hours of public domain speech, 90% of which is in English, with the rest in German, Dutch and Spanish.

At 980 million parameters, BASE-large appears to be the largest model in this category. For comparison, the team also trained 400M- and 150M-parameter models on 10,000 and 1,000 hours of audio respectively. The idea is that if one of these models exhibits emergent behaviors but another does not, you have a range for where those behaviors begin to emerge.

As it turns out, the medium-sized model showed the jump in capability the team was looking for, not necessarily in ordinary speech quality (it was rated better, but only by two points) but in the set of emergent abilities they observed and measured. Here are examples of tricky text mentioned in the paper:

  • Compound nouns: The Beckhams decided to rent a charming, stone-built, quaint country cottage.
  • Emotions: “Oh my God! Are we really going to the Maldives? It’s incredible!” Jenny squealed, bouncing on her tiptoes with boundless glee.
  • Foreign words: “Mr. Henry, renowned for his mise en place, orchestrated a seven-course meal, each course a pièce de résistance.”
  • Paralinguistics (i.e. readable non-words): “Shh, Lucy, shhh, we mustn’t wake your brother,” whispered Tom, as they passed the nursery.
  • Punctuation: She received a strange message from her brother: ‘Emergency @ home? call ASAP! Mom and Dad are worried…#familymatters.’
  • Questions: But the Brexit question remains: After all the trials and tribulations, will ministers find the answers in time?
  • Syntactic complexities: The film starring De Moya, who was recently honored with a Lifetime Achievement Award in 2022, was a big hit despite mixed reviews.

“These sentences are designed to contain challenging tasks - parsing garden-path sentences, placing phrasal stress on long-winded compound nouns, producing emotional or whispered speech, or producing the correct phonemes for foreign words like ‘qi’ or punctuation like ‘@’ - none of which BASE TTS is explicitly trained to perform,” the authors write.

Such features usually trip up text-to-speech engines, which mispronounce, skip words, use odd intonation, or make some other blunder. BASE TTS still had trouble, but it fared far better than its contemporaries, models like Tortoise and VALL-E.

There are a bunch of examples of these difficult texts being spoken quite naturally by the new model on the site the researchers made for it. Of course these were chosen by the researchers, so they’re necessarily cherry-picked, but it’s impressive regardless.

Because the three BASE TTS models share an architecture, it seems clear that the size of the model and the extent of its training data are what enable it to handle some of the complexities above. Bear in mind that this is still an experimental model and process, not a commercial model or anything. Further research will have to pinpoint the inflection point for emergent ability and work out how to train and deploy the resulting model efficiently.
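
The bracketing logic behind training three model sizes can be sketched in a few lines of Python. This is only an illustration, not anything taken from the paper: the `ModelRun` type, the `bracket_tipping_point` helper, the names BASE-small and BASE-medium, and the true/false flags are all assumptions that encode this article's reading of the results (the medium model showed the jump, so the smallest one presumably did not).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelRun:
    name: str                       # hypothetical label, except BASE-large
    params_millions: int            # parameter count reported in the article
    training_hours: int             # hours of audio reported in the article
    shows_emergent_abilities: bool  # assumed flag, not a measured score


def bracket_tipping_point(
    runs: List[ModelRun],
) -> Optional[Tuple[ModelRun, ModelRun]]:
    """Return the adjacent (smaller, larger) pair where the ability first appears."""
    ordered = sorted(runs, key=lambda r: r.params_millions)
    for smaller, larger in zip(ordered, ordered[1:]):
        if not smaller.shows_emergent_abilities and larger.shows_emergent_abilities:
            return smaller, larger
    return None  # no transition observed within the tested sizes


runs = [
    ModelRun("BASE-small", 150, 1_000, False),
    ModelRun("BASE-medium", 400, 10_000, True),
    ModelRun("BASE-large", 980, 100_000, True),
]

bracket = bracket_tipping_point(runs)
if bracket is not None:
    low, high = bracket
    print(f"Tipping point lies between {low.params_millions}M and "
          f"{high.params_millions}M parameters")
```

Note that model size and training hours grow together across the three runs, so a bracket like this cannot say which of the two drives the change. With only three data points the range is also wide, which is why narrowing it down is left to further research.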

Notably, this model is “streamable,” as the name says, meaning it doesn’t need to generate whole sentences at once but proceeds moment by moment at a relatively low bitrate. The team has also attempted to package speech metadata such as emotionality, prosody and so on in a separate, low-bandwidth stream that could accompany the vanilla audio.
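
To make the idea of streaming concrete, here is a minimal Python sketch of chunk-by-chunk synthesis with a small metadata record traveling alongside each audio chunk. It is not BASE TTS code: `fake_synthesize_word` is a hypothetical stand-in that emits silence, and the sample rate, chunk size and metadata fields are assumptions chosen for illustration.

```python
from typing import Dict, Iterator, List, Tuple

SAMPLE_RATE = 16_000   # assumed samples per second
CHUNK_SAMPLES = 1_600  # assumed chunk size: 100 ms of audio


def fake_synthesize_word(word: str) -> List[float]:
    """Hypothetical stand-in for a model: 50 ms of silence per character."""
    return [0.0] * (len(word) * SAMPLE_RATE // 20)


def stream_speech(text: str) -> Iterator[Tuple[List[float], Dict[str, object]]]:
    """Yield (audio_chunk, metadata) pairs as soon as each chunk is ready."""
    buffer: List[float] = []
    last_word = ""
    for word in text.split():
        last_word = word
        buffer.extend(fake_synthesize_word(word))
        # Emit full chunks immediately instead of waiting for the sentence to end.
        while len(buffer) >= CHUNK_SAMPLES:
            chunk, buffer = buffer[:CHUNK_SAMPLES], buffer[CHUNK_SAMPLES:]
            yield chunk, {"word": word, "exclaimed": word.endswith("!")}
    if buffer:
        # Flush whatever is left as a final, shorter chunk.
        yield buffer, {"word": last_word, "exclaimed": last_word.endswith("!")}


for audio, meta in stream_speech("Shh, Lucy, shhh"):
    print(len(audio), meta["word"])
```

Because `stream_speech` is a generator, a player could start on the first 100 ms chunk while later words are still being synthesized. The metadata dictionary is tiny compared with the audio, which is the sense in which a side stream like this would be low-bandwidth.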

It looks like text-to-speech models may have a breakout moment in 2024, just in time for the election! But there’s no denying the usefulness of this technology, for accessibility in particular. The team notes that it declined to publish the model’s source and other data because of the risk of bad actors taking advantage of it. The cat will get out of that bag eventually, though.

bhanuprakash.cg
techtost.com