TechTost

Running AI models turns into a memory game

By techtost.com | 18 February 2026 | 3 min read

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs, but memory is an increasingly important part of the picture. As hyperscalers prepare to build new billion-dollar data centers, the price of DRAM chips has risen roughly sevenfold in the last year.

At the same time, there is a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. Companies that master it will be able to run the same queries with fewer tokens, which can be the difference between folding and staying in business.

Semiconductor analyst Doug O’Laughlin has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, head of AI at Weka. Both are semiconductor specialists, so the focus is more on the chips than the broader architecture, but the implications for AI software are significant too.

I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic’s prompt caching documentation:

The tell is if we go to Anthropic’s prompt caching pricing page. It started as a very simple page six or seven months ago, especially as Claude Code came out: just “use caching, it’s cheaper.” Now it’s an encyclopedia of advice on exactly how many cache writes to pre-purchase. You have 5-minute tiers, which are very common across the industry, or 1-hour tiers, and nothing above that. That is a very important tell. Then, of course, you have all kinds of arbitrage opportunities around pricing for cache reads based on the number of cache writes you’ve pre-purchased.

The question here is how long Claude keeps your prompt in cache: You can pay for a 5-minute window, or pay more for an hour-long window. It’s much cheaper to draw on data that’s still in the cache, so if you manage it right, you can save a lot. However, there’s a catch: Each new piece of data you add to the query may push something else out of the cache window.
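To make the trade-off concrete, here is a minimal Python sketch of how the two windows might compare for a shared prompt prefix. Everything in it is an assumption for illustration rather than something taken from the interview: the `CacheTier` and `prefix_cost` names are hypothetical, the price multipliers (1.25x and 2x for writes, 0.1x for reads, relative to a normal input token) are placeholders that should be checked against the current pricing page, and the model assumes each use of the prefix restarts its cache window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheTier:
    """Hypothetical description of one cache window option."""
    name: str
    ttl_minutes: float
    write_multiplier: float  # assumed cost of a cache write vs. a normal input token
    read_multiplier: float   # assumed cost of a cache read vs. a normal input token


# Illustrative multipliers only; real prices may differ.
FIVE_MIN = CacheTier("5-minute", 5, 1.25, 0.1)
ONE_HOUR = CacheTier("1-hour", 60, 2.0, 0.1)


def prefix_cost(request_times, prefix_tokens, tier):
    """Cost of the shared prefix over all requests, in base-input-token units.

    request_times are minutes since the first request.
    """
    total = 0.0
    last_use = None
    for t in sorted(request_times):
        # A hit means the prefix was used recently enough to still be cached.
        hit = last_use is not None and (t - last_use) <= tier.ttl_minutes
        multiplier = tier.read_multiplier if hit else tier.write_multiplier
        total += prefix_tokens * multiplier
        last_use = t  # assumption: every use (hit or write) restarts the window
    return total


def uncached_cost(request_times, prefix_tokens):
    """Baseline: the prefix is paid for in full on every request."""
    return float(len(request_times) * prefix_tokens)


if __name__ == "__main__":
    patterns = {
        "bursty (one request per minute)": [0, 1, 2, 3, 4],
        "sparse (one request every 20 minutes)": [0, 20, 40, 60, 80],
    }
    for label, times in patterns.items():
        print(label)
        print("  no cache:", uncached_cost(times, 10_000))
        for tier in (FIVE_MIN, ONE_HOUR):
            print(f"  {tier.name}:", prefix_cost(times, 10_000, tier))
```

Under these assumed numbers, the 5-minute window is the cheaper choice when requests arrive in a tight burst, while the 1-hour window wins when requests are spread out. With sparse traffic the 5-minute window comes out more expensive than not caching at all, because every request pays for a fresh write.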

This is complex stuff, but the bottom line is pretty simple: Memory management in AI models is going to be a huge part of AI in the future. Companies that do it well will rise to the top.

And there is much progress to be made in this new field. Back in October, I covered a startup called Tensormesh that was working on a layer in the stack known as cache optimization.


Opportunities exist elsewhere in the stack. For example, lower down the stack, there’s the question of how data centers use the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, though it’s pretty deep into the hardware.) Further up the stack, end users are figuring out how to structure their model clusters to take advantage of a shared cache.

As companies get better at orchestrating memory, they will use fewer tokens and inference will become cheaper. Meanwhile, models are becoming more efficient at processing each token, pushing costs down even further. As server costs come down, many applications that don’t seem viable now will start to edge toward profitability.
