Different AI labs have different priorities. OpenAI has traditionally focused on consumer users, for example, while competitor Anthropic tends to target enterprises. Elon Musk’s xAI, we recently learned, puts a special emphasis on video game descriptions.
On Friday, Business Insider’s Grace Kay posted a detailed and extensive report on xAI, the artificial intelligence startup recently acquired by SpaceX, with a special focus on how Musk is making life difficult for employees. But this particular anecdote stood out:
In one instance last year, the model’s launch was delayed for several days because Musk was unhappy with the way the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High-level engineers were pulled from other projects to improve responses before the launch, they said.
Of course, you can imagine the frustration of any respectable and experienced engineer who shows up to work thinking they’ll be dealing with fundamental knowledge and machine intelligence problems, only to be sidetracked into helping a 54-year-old win his video game. But the anecdote raises an even more pressing question: Did Musk finally get the gaming skills he wanted?
To answer this question, RPG enthusiast Ram Iyer put together a set of five general questions about Baldur’s Gate, which we ran through xAI’s Grok and three other major models in a kind of quasi-benchmark I’ve decided to call “BaldurBench.”
In the interest of journalistic transparency, I have made all the chat transcripts public, so you can view them here: Grok, ChatGPT, Claude, and Gemini.
First, the good news: Grok gives really good information. Its answers were a bit thick with gamer jargon (“save-wipe” instead of save and “DPS” instead of damage), but they were useful and well-informed, provided you knew what it was talking about. Grok is also very fond of tables and theorycraft, which is about what you would expect.
There are a lot of Baldur’s Gate guides out there, and the models generally drew from the same sources, so the biggest differences were stylistic. ChatGPT prefers bulleted lists and sentence fragments, while Gemini loves to emphasize important words.
The biggest surprise was Claude, which was especially concerned about giving me information that would spoil my experience of the game. When I asked about good party compositions, it closed the tutorial by saying, “Don’t stress too much and just play what you think is fun.” Thanks, Claude!
It’s important to keep in mind that this is an area where we know (thanks to Business Insider’s report) that xAI has focused specifically on achieving parity. So we shouldn’t read too much into the fact that, after the reported sprint, Grok’s advice was about the same as the other models’. Still, it’s nice to know that xAI can make it work if it tries.
