Running AI models turns into a memory game

By techtost.com | 18 February 2026 | 3 Mins Read

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs, but memory is an increasingly important part of the picture. As hyperscalers prepare to build new billion-dollar data centers, the price of DRAM chips has risen roughly sevenfold in the last year.

At the same time, there is a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. Companies that master it will be able to make the same queries with fewer tokens, which can be the difference between folding and staying in business.

Semiconductor analyst Doug O’Laughlin takes an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, Head of AI at Weka. Both are semiconductor specialists, so the focus is more on the chips than the broader architecture, but the implications for AI software are also significant.

I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic’s prompt caching documentation:

The tell is if we go to Anthropic’s prompt caching pricing page. It started as a very simple page six or seven months ago, especially as Claude Code came out: just “use caching, it’s cheaper.” Now it’s an encyclopedia of advice on exactly how many cache writes to pre-purchase. You have 5-minute tiers, which are very common across the industry, or 1-hour tiers, and nothing beyond that. This is a very important element. Then, of course, you have all kinds of arbitrage opportunities around pricing for cache reads based on the number of cache writes you’ve pre-purchased.

The question here is how long Claude holds your prompt in its cache: You can pay for a 5-minute window, or pay more for an hour-long window. It’s much cheaper to draw on data that’s still in the cache, so if you manage it right, you can save a lot. There’s a catch, though: Each new piece of data you add to the query may bump something else out of the cache window.
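To make the trade-off between the two windows concrete, here is a minimal sketch in Python. The price multipliers are illustrative assumptions rather than quoted prices, and `CachePricing` and `session_cost` are hypothetical names. The sketch also assumes that a cache hit refreshes the window, which may not hold for every provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    """Assumed multipliers relative to the base price of one input token."""
    write_5m: float = 1.25  # assumption: premium for a 5-minute cache write
    write_1h: float = 2.0   # assumption: premium for a 1-hour cache write
    read: float = 0.1       # assumption: discount for reading a cached token


def session_cost(prefix_tokens, gaps_minutes, ttl_minutes, write_mult, read_mult):
    """Cost, in base-input-token units, of resending one prefix several times.

    gaps_minutes[i] is the idle time before follow-up request i.
    A gap within the window is a cache read; a longer gap forces a new write.
    """
    cost = prefix_tokens * write_mult  # the first request always writes the cache
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            cost += prefix_tokens * read_mult   # hit: window assumed to refresh
        else:
            cost += prefix_tokens * write_mult  # miss: entry expired, write again
    return cost


pricing = CachePricing()
gaps = [2, 10, 3]  # minutes of idle time before each follow-up request
cost_5m = session_cost(10_000, gaps, 5, pricing.write_5m, pricing.read)
cost_1h = session_cost(10_000, gaps, 60, pricing.write_1h, pricing.read)
cost_uncached = 10_000 * (len(gaps) + 1)  # every request pays full price
print(cost_5m, cost_1h, cost_uncached)
```

Under these assumed numbers, a single 10-minute pause is enough to make the more expensive hour-long window the cheaper choice overall, which is the kind of calculation the pricing page now asks customers to make.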

This is complex stuff, but the bottom line is pretty simple: Memory management in AI models is going to be a huge part of AI in the future. Companies that do it well will rise to the top.

And there is much progress to be made in this new field. Back in October, I covered a startup called Tensormesh that was working on a layer in the stack known as cache optimization.

Opportunities exist elsewhere in the stack. For example, lower down the stack, there’s the question of how data centers use the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, though it’s pretty deep into the hardware.) Further up the stack, end users are figuring out how to structure their model clusters to take advantage of the shared cache.
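One way to picture what taking advantage of a shared cache means at the user level is the sketch below. It assumes the cache matches requests on an exact shared prefix, as prompt caches commonly do, and it uses word lists as a stand-in for real tokens. The function names `build_prompt` and `shared_prefix_len` are hypothetical.

```python
def build_prompt(stable_parts, volatile_parts, stable_first=True):
    """Assemble a prompt from parts that rarely change and parts that always do."""
    if stable_first:
        return stable_parts + volatile_parts
    return volatile_parts + stable_parts


def shared_prefix_len(a, b):
    """Count the leading items two prompts have in common (the cacheable part)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Stand-ins for a system prompt, tool definitions, and reference documents.
stable = ["system", "tools", "docs"]

# Two agents asking different questions over the same stable context.
good_a = build_prompt(stable, ["question-1"])
good_b = build_prompt(stable, ["question-2"])
bad_a = build_prompt(stable, ["question-1"], stable_first=False)
bad_b = build_prompt(stable, ["question-2"], stable_first=False)

print(shared_prefix_len(good_a, good_b))  # stable parts are shared
print(shared_prefix_len(bad_a, bad_b))    # the differing question comes first
```

Putting the unchanging material first lets every request in the group reuse the same cached prefix, while leading with the part that differs leaves nothing to share.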

As companies get better at orchestrating memory, they will use fewer tokens and inference will become cheaper. Meanwhile, models are becoming more efficient at processing each token, pushing costs down even further. As server costs come down, many applications that don’t seem viable now will start to edge into profitability.
