How many P’s are there in Google? According to Google, there are two.
There’s also “exactly 1 ‘r’ in the word ‘evil,’” says Google’s AI Overview, as well as two “d’s” in the word journalism, even as it spelled the word out as: journalism. Google at least detected that there is a P in the U.S. president’s last name, but it spelled the name as “trpum.”
You didn’t have to be a prophet to predict that Google’s AI overhaul of Search wouldn’t go smoothly. We’ve been here before. The first time Google added AI Overviews to Search, the feature ended up citing satirical posts from The Onion and Reddit, advising people to eat rocks and put glue on their pizza.
This time around, as Google doubles down on its commitment to making generative AI the centerpiece of its 29-year-old flagship product, it’s no wonder we’re seeing it falter.
“Counting within words has been a known challenge for LLMs, and we’re working to fix this particular problem,” Google told TechCrunch in an emailed statement.
These basic spelling mistakes may look familiar. LLMs, the kind of artificial intelligence that powers chatbots and other text generators, are not built to understand spelling. It’s been a running joke for years that whenever a company unveils a new AI model, you have to ask it how many “r’s” are in the word strawberry. These AI models, which can code an app in seconds or solve problems that have puzzled mathematicians for decades, are about as good as a kindergartner at spelling.
However, the woes of Google’s AI Overviews go beyond silly misspellings. Google is already fixing an issue from last week where searching for the word “ignore” would return what looked like a dictionary definition of the word, only the definition read, “Understood. Let me know when you have a new message or question!” But the misspellings have remained a running joke because they’re so hard to fix.
As researchers have previously explained when we asked about these spelling puzzles, AI doesn’t perceive sentences as linguistic units made up of words and letters. Many LLMs are built on transformer models, which break text into tokens. Depending on the model, a token can be a full word, a syllable, or a single letter. Instead of “reading” as a human would, the AI turns the text into numerical representations, which are then contextualized to help the AI come up with a logical answer.
“LLMs are based on this transformer architecture, which mostly doesn’t read text. What happens when you enter a prompt is that it gets translated into an encoding,” Matthew Guzdial, an artificial intelligence researcher and assistant professor at the University of Alberta, told TechCrunch. “When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it doesn’t know about ‘T,’ ‘H,’ ‘E.’”
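To make that concrete, here is a minimal sketch of what a model receives in place of letters. The vocabulary, the token IDs, and the `tokenize` helper are all hypothetical, and the greedy longest-match rule is a simplification chosen for readability; production tokenizers learn far larger vocabularies and split text with other algorithms.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of
# entries from data; these three words and their IDs are made up.
VOCAB = {"the": 0, "straw": 1, "berry": 2}
# Single letters act as a fallback so any lowercase word can be encoded.
for ch in "abcdefghijklmnopqrstuvwxyz":
    VOCAB.setdefault(ch, len(VOCAB))


def tokenize(text, vocab):
    """Split text into token IDs, always taking the longest matching piece."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


print(tokenize("the", VOCAB))         # [0]
print(tokenize("strawberry", VOCAB))  # [1, 2]
print("strawberry".count("r"))        # 3
```

In this sketch, “strawberry” reaches the model as the two numbers 1 and 2, neither of which says anything about how many “r’s” the word contains, while ordinary string code that works on characters counts them trivially.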
The token-based architecture powering LLMs like the one behind Google’s AI Overviews is inherently limiting, and the researchers weren’t optimistic that the spelling problem could be solved.
“It’s kind of hard to get past the question of what exactly a ‘word’ should be for a language model, and even if we got experts to agree on a perfect vocabulary, the models would probably still find it useful to ‘chunk’ things up even further,” Sheridan Feucht, a PhD student studying the interpretability of large language models at Northeastern University, told TechCrunch. “My guess would be that there is no such thing as a perfect tokenizer because of this kind of ambiguity.”
This is not necessarily a pressing problem in the minds of researchers, as the usefulness of LLMs is not in their ability to spell. But these glaring failures help us remember that AI isn’t perfect, even if it can sometimes seem like an omniscient force beyond our understanding. We cannot blindly trust AI outputs without double-checking their accuracy.
