Largest text-to-speech AI model yet shows ‘emergent abilities’

By techtost.com | 15 February 2024 | 4 Mins Read

Researchers at Amazon have trained the largest text-to-speech model to date, which they claim exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally. The breakthrough could be what the technology needs to escape the uncanny valley.

These models were always going to grow and improve, but the researchers specifically hoped to see the kind of jump in skill we saw when language models got past a certain size. For reasons unknown to us, once LLMs get past a certain point, they start to be much more robust and flexible, able to perform tasks they were not trained to do.

That doesn’t mean they gain sentience or anything, just that past a certain point their performance on certain conversational AI tasks hockey-sticks. The Amazon AGI team (it’s no secret what they’re aiming for) thought the same might happen as text-to-speech models grew, and their research suggests that this is indeed the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which they have contorted into the abbreviation BASE TTS. The largest version of the model was trained on 100,000 hours of public domain speech, 90% of which is in English, the rest in German, Dutch and Spanish.

With 980 million parameters, BASE-large appears to be the largest model in this class. For comparison, the team also trained 400M- and 150M-parameter models on 10,000 and 1,000 hours of audio respectively. The idea is that if one of these models exhibits emergent behaviors but another does not, you have a range for where those behaviors begin to emerge.

As it turns out, the medium-sized model showed the jump in ability the team was looking for, not necessarily in ordinary speech quality (it was rated better, but only by two points) but in the set of emergent abilities they observed and measured. Here are examples of the tricky text mentioned in the paper:

  • Compound nouns: The Beckhams decided to rent a charming, stone-built, quaint country cottage.
  • Emotions: “Oh my God! Are we really going to the Maldives? It’s incredible!” Jenny squealed, bouncing on her tiptoes with boundless glee.
  • Foreign words: Mr. Henry, renowned for his mise en place, orchestrated a seven-course meal, each course a pièce de résistance.
  • Paralinguistics (i.e. readable non-words): “Shh, Lucy, shhh, we mustn’t wake your brother,” whispered Tom, as they passed the nursery.
  • Punctuation: She received a strange message from her brother: ‘Emergency @ home? call ASAP! Mom and Dad are worried…#familymatters.’
  • Questions: But the Brexit question remains: After all the trials and tribulations, will ministers find the answers in time?
  • Syntactic complexities: The film starring De Moya, who was recently honored with a Lifetime Achievement Award in 2022, was a big hit despite mixed reviews.

“These sentences are designed to contain challenging tasks – parsing garden-path sentences, placing stress on long compound nouns, producing emotional or whispered speech, or producing the correct phonemes for foreign words like ‘qi’ or punctuation like ‘@’ – none of which BASE TTS is explicitly trained to perform,” the authors write.

Such features usually trip up text-to-speech engines, which will mispronounce, skip words, use strange accents, or make some other blunder. BASE TTS still had trouble, but it fared much better than its contemporaries, models like Tortoise and VALL-E.

There are a bunch of examples of these difficult texts being spoken quite naturally by the new model on the site the researchers made for it. Of course these were selected by the researchers, so they’re necessarily cherry-picked, but it’s impressive regardless.


Because the three BASE TTS models share an architecture, it seems clear that the size of the model and the extent of its training data are what allow it to handle some of the above complexities. Bear in mind that this is still an experimental model and process, not a commercial product. Further research will have to identify the inflection point for emergent ability and how to train and deploy the resulting model efficiently.

Notably, this model is “streamable,” as the name says, meaning it doesn’t need to generate whole sentences at once but goes moment by moment at a relatively low bitrate. The team has also tried to package speech metadata such as emotionality, prosody and so on into a separate low-bandwidth stream that could accompany the vanilla audio.
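
For readers who want a feel for what “streamable” means in practice, here is a minimal Python sketch. It is not BASE TTS code, and nothing in it comes from the paper: the function names, the three-word chunk size and the placeholder “audio” (one zero byte per character) are all assumptions made purely for illustration. The only point is the shape of the idea, which is that audio is handed over piece by piece rather than after the whole sentence has been processed.

```python
from typing import Iterator, List


def split_into_chunks(text: str, words_per_chunk: int = 3) -> List[str]:
    """Split text into small groups of words (hypothetical chunking rule)."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def fake_synthesize(chunk: str) -> bytes:
    """Stand-in for a real TTS model: one zero byte per input character."""
    return bytes(len(chunk))


def stream_speech(text: str, words_per_chunk: int = 3) -> Iterator[bytes]:
    """Yield audio one chunk at a time instead of returning one big buffer."""
    for chunk in split_into_chunks(text, words_per_chunk):
        # Each piece is available to the caller as soon as it is produced.
        yield fake_synthesize(chunk)


if __name__ == "__main__":
    for piece in stream_speech("Shh, Lucy, shhh, we must not wake your brother"):
        print(len(piece), "bytes of placeholder audio ready to play")
```

Because `stream_speech` is a generator, a player could start on the first piece while later ones are still being produced, which is the practical appeal of a streamable model.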

It looks like text-to-speech models may have a breakout moment in 2024, just in time for the election! But there’s no denying the usefulness of this technology, particularly for accessibility. The team notes that it has declined to release the model’s source and other data due to the risk of it being exploited by bad actors. The cat will get out of that bag eventually, though.
