Running AI models is turning into a memory game

By techtost.com | 18 February 2026 | 3 Mins Read

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs, but memory is an increasingly important part of the picture. As hyperscalers prepare to build new billion-dollar data centers, the price of DRAM chips has jumped roughly 7x in the last year.

At the same time, there is a growing discipline around orchestrating all that memory so that the right data gets to the right agent at the right time. Companies that master it will be able to run the same queries with fewer tokens, which can be the difference between folding and staying in business.

Semiconductor analyst Doug O’Loughlin has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, head of AI at Weka. They are both semiconductor specialists, so the focus is more on the chips than on the broader architecture, but the implications for AI software are also significant.

I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic’s prompt-caching documentation:

“The tell is if we go to Anthropic’s prompt caching pricing page. It started off as a very simple page six or seven months ago, especially as Claude Code was coming out: just ‘use caching, it’s cheaper.’ Now it’s an encyclopedia of advice on exactly how many cache writes to pre-purchase. You have 5-minute tiers, which are very common across the industry, or 1-hour tiers, and nothing above. That is a very important tell. Then, of course, you have all kinds of arbitrage opportunities around the pricing for cache reads based on how many cache writes you’ve pre-purchased.”

The question here is how long Claude keeps your prompt in cache: you can pay for a 5-minute window, or pay more for an hour-long window. It is much cheaper to draw on data that is still in the cache, so if you manage it right, you can save a lot. There is a catch, though: every new piece of data you add to the query may bump something else out of the cache window.
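
To make the trade-off between the two cache windows concrete, here is a rough sketch of the arithmetic. The multipliers are assumptions chosen for illustration (a cache write costs more than an ordinary input token, more again for the longer window, and a cache read costs a fraction of one); check the provider's current pricing page before relying on them. The names `CachePricing` and `prompt_cost` are hypothetical, and the sketch assumes that every cache hit refreshes the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    """Illustrative multipliers relative to the base input-token price."""
    write_multiplier: float  # cost of writing one token into the cache
    read_multiplier: float   # cost of reading one cached token
    ttl_seconds: int         # how long an entry survives without a hit


# Assumed values for illustration only, not quoted from any price list.
FIVE_MINUTE = CachePricing(write_multiplier=1.25, read_multiplier=0.1, ttl_seconds=300)
ONE_HOUR = CachePricing(write_multiplier=2.0, read_multiplier=0.1, ttl_seconds=3600)


def prompt_cost(prefix_tokens, request_times, pricing=None):
    """Cost, in base-input-token units, of sending the same prefix repeatedly.

    request_times: ascending timestamps in seconds.
    pricing=None models no caching, so every request pays full price.
    """
    if pricing is None:
        return float(prefix_tokens * len(request_times))
    total = 0.0
    expires_at = None
    for t in request_times:
        if expires_at is not None and t < expires_at:
            total += prefix_tokens * pricing.read_multiplier   # cache hit
        else:
            total += prefix_tokens * pricing.write_multiplier  # miss: write again
        expires_at = t + pricing.ttl_seconds  # assumed: each use refreshes the window
    return total


if __name__ == "__main__":
    # A 10,000-token prefix sent once every 10 minutes for an hour.
    times = [0, 600, 1200, 1800, 2400, 3000]
    for label, pricing in [("no cache", None), ("5-minute", FIVE_MINUTE), ("1-hour", ONE_HOUR)]:
        print(label, prompt_cost(10_000, times, pricing))
```

Under these assumed numbers, requests spaced 10 minutes apart never land inside the 5-minute window, so that tier ends up costing more than not caching at all, while the 1-hour tier pays for one write and then reads cheaply. The right window depends on how often the same prefix comes back.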

This is complex stuff, but the upshot is simple: managing memory in AI models is going to be a huge part of AI going forward. Companies that do it well will rise to the top.

And there is plenty of progress to be made in this new field. Back in October, I covered a startup called Tensormesh that was working on one layer of the stack, known as cache optimization.

Opportunities exist elsewhere in the stack, too. Lower down, there is the question of how data centers use the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, though it gets pretty deep into the hardware.) Higher up, end users are figuring out how to structure their model clusters to take advantage of the shared cache.
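
One way to picture the shared-cache point: prompt caches typically match on a prompt's leading tokens, so agents whose prompts begin with identical material can reuse the same cached prefix. The sketch below is a toy illustration under that assumption. The function names are hypothetical, and it splits on whitespace in place of a real tokenizer.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that two token lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_tokens(prompts):
    """For each prompt, count the leading tokens it shares with the
    best-matching earlier prompt (a stand-in for a prefix-cache hit)."""
    seen = []
    hits = []
    for prompt in prompts:
        tokens = prompt.split()  # whitespace split stands in for a tokenizer
        hits.append(max((shared_prefix_len(tokens, s) for s in seen), default=0))
        seen.append(tokens)
    return hits


if __name__ == "__main__":
    # Stable material first: the second prompt reuses the shared prefix.
    print(reusable_tokens(["SYSTEM DOCS one", "SYSTEM DOCS two"]))
    # Varying material first: nothing lines up, so nothing is reused.
    print(reusable_tokens(["one SYSTEM DOCS", "two SYSTEM DOCS"]))
```

The design point is ordering: putting the material every agent shares at the front of the prompt, and the part that varies at the end, is what lets a prefix-matched cache do any work.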

As companies get better at orchestrating memory, they will use fewer tokens and inference will get cheaper. Meanwhile, models are becoming more efficient at processing each token, pushing costs down even further. As server costs come down, many applications that don’t seem viable now will start to edge into profitability.
