If you needed more proof that GenAI is prone to making things up, Google’s chatbot Gemini, formerly Bard, thinks the 2024 Super Bowl has already happened. It even has the (fictional) stats to back it up.
According to a Reddit thread, Gemini, powered by Google’s eponymous GenAI models, answers questions about Super Bowl LVIII as if the game ended yesterday, or even weeks ago. Like many betting companies, it seems to favor the Chiefs over the 49ers (sorry, San Francisco fans).
Gemini embellishes quite creatively, in at least one instance giving a player stat breakdown suggesting that Kansas City Chiefs quarterback Patrick Mahomes rushed for 286 yards, two touchdowns and an interception, versus Brock Purdy’s 253 rushing yards and one touchdown.
It’s not just Gemini. Microsoft’s Copilot chatbot also insists the game is over and provides false reports to back up the claim. But, perhaps reflecting a San Francisco bias, it says the 49ers, not the Chiefs, emerged victorious “by a final score of 24-21.”
Copilot is powered by a GenAI model similar, if not identical, to the model underpinning OpenAI’s ChatGPT (GPT-4). But in my testing, ChatGPT declined to make the same mistake.
It’s all rather silly, and likely resolved by now, given that this reporter had no luck reproducing the Gemini responses in the Reddit thread. (I’d be shocked if Microsoft wasn’t working on a fix as well.) But it also illustrates the major limitations of today’s GenAI, and the dangers of placing too much trust in it.
GenAI models have no real intelligence. Fed an enormous number of examples, usually sourced from the public web, they learn how likely data (e.g. text) is to occur based on patterns, including the context of any surrounding data.
This probability-based approach works remarkably well at scale. But while the range of words and their probabilities is likely to result in text that makes sense, it’s far from certain. LLMs can generate something that’s grammatically correct but nonsensical, for instance, like the claim about the Golden Gate. Or they can spout mistruths, propagating inaccuracies in their training data.
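To make that concrete, here is a deliberately simplified sketch of my own, not anything from Google or OpenAI, of what “predicting the likely next word” amounts to: the model samples from a probability distribution over continuations, and a statistically plausible completion comes out whether or not it is true. The words and numbers below are made up for illustration.

```python
# Toy illustration (not Gemini's actual architecture): a language model
# completes text by sampling the next word from a probability distribution
# conditioned on the words before it. The probabilities here are invented.
import random

# Hypothetical next-word probabilities after the prompt
# "The Super Bowl was won by the ..."
next_word_probs = {
    "Chiefs": 0.55,   # most statistically plausible continuation
    "49ers": 0.40,
    "Ravens": 0.05,
}

def sample_next_word(probs: dict[str, float]) -> str:
    """Pick a word in proportion to its probability, the way an LLM samples tokens."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

# The "model" happily completes the sentence even though the game hasn't
# been played; plausibility, not truth, drives the choice.
print("The Super Bowl was won by the", sample_next_word(next_word_probs))
```

The point of the sketch is that nothing in this loop checks the claim against reality; a fluent, confident answer and a correct one are produced by exactly the same mechanism.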
The Super Bowl misinformation certainly isn’t the most harmful example of GenAI going off the rails. That distinction probably lies with endorsing torture, reinforcing ethnic and racial stereotypes or writing persuasively about conspiracy theories. It is, however, a useful reminder to double-check statements from GenAI bots. There’s a good chance they’re not true.