Google is trying to make waves with Gemini, its flagship suite of AI models, apps and services. But while Gemini looks promising in some respects, it falls short in others, as our informal review revealed.
So what is Gemini? How can you use it? And how does it measure up to the competition?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll update as new Gemini models and features are released.
What is Gemini?
Gemini is Google’s long-promised family of next-generation GenAI models, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:
- Gemini Ultra, the flagship Gemini model.
- Gemini Pro, a “lite” Gemini model.
- Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.
All Gemini models were trained to be “natively multimodal”, in other words, able to work with and use more than just words. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases, and text in different languages.
This sets Gemini apart from models like Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that’s not the case with the Gemini models.
What is the difference between Gemini apps and Gemini models?
Google, proving once again that it has no talent for branding, didn’t make it clear from the start that Gemini is separate and distinct from the Gemini web and mobile apps (formerly Bard). Gemini apps are just an interface through which some Gemini models can be accessed — think of it as a client for Google’s GenAI.
Incidentally, Gemini apps and models are also completely independent of Imagen 2, Google’s text-to-image model available in some of the company’s programming tools and environments. Don’t worry – you’re not the only one confused by this.
What can Gemini do?
Because Gemini models are multimodal, they can theoretically perform a range of multimodal tasks, from transcribing speech to captioning images and videos to creating artwork. Few of these features have made it to the product stage yet (more on that later), but Google is promising all of them—and more—at some point in the not-too-distant future.
Of course, it’s a little hard to take the company at its word.
Google seriously underdelivered with the initial release of Bard. And more recently, it ruffled feathers with a video purporting to show Gemini’s abilities that turned out to be heavily edited and a little too ambitious.
However, assuming Google is more or less honest with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:
Gemini Ultra
Google says that Gemini Ultra, thanks to its versatility, can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers.
Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says, extracting information from those papers and “updating” a chart from one of them by generating the formulas needed to recreate the chart with more recent data.
Gemini Ultra technically supports image generation, as mentioned earlier. But this capability hasn’t made it into the production version of the model yet, perhaps because the mechanism is more complex than the way apps like ChatGPT generate images. Instead of feeding prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediate step.
Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s online tool for app and platform developers. It also powers Gemini apps — but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires a subscription to the Google One AI Premium Program, priced at $20 per month.
The AI Premium plan also connects Gemini to your broader Google Workspace account: think emails in Gmail, documents in Docs, presentations in Slides and recordings in Google Meet. This is useful, for example, for summarizing emails or having Gemini take notes during a video call.
Gemini Pro
Google says Gemini Pro is an improvement over LaMDA in reasoning, planning and understanding capabilities.
An independent study by Carnegie Mellon and BerriAI researchers found that Gemini Pro is indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving many digits, and users have found plenty of examples of poor reasoning and mistakes.
Google promised improvements, however, and the first arrived in the form of Gemini 1.5 Pro.
Designed as a drop-in replacement for Gemini 1.0 Pro, Gemini 1.5 Pro (currently in preview) has improved in many areas compared to its predecessor, perhaps most significantly in the amount of data it can process. Gemini 1.5 Pro can (in a limited private preview) take in ~700,000 words or ~30,000 lines of code, 35 times the amount Gemini 1.0 Pro can handle. And because the model is multimodal, it’s not limited to text: Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of languages, albeit slowly (e.g., searching for a scene in an hour-long video takes 30 seconds to a minute of processing).
Gemini Pro is also available via API in Vertex AI, accepting text as input and generating text as output. An additional endpoint, Gemini Pro Vision, can process text and imagery, including photos and video, and output text, along the lines of OpenAI’s GPT-4 with Vision model.
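As a rough illustration, here’s what calling both endpoints might look like with the Vertex AI Python SDK. This is a minimal sketch, not Google’s canonical example; the project ID, region and Cloud Storage image path are placeholders:

```python
# Minimal sketch using the Vertex AI Python SDK (google-cloud-aiplatform).
# The project ID, region and gs:// path below are placeholders.
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="your-gcp-project", location="us-central1")

# Text in, text out with Gemini Pro.
text_model = GenerativeModel("gemini-pro")
print(text_model.generate_content("Summarize Hamlet in two sentences.").text)

# Text plus an image in, text out with Gemini Pro Vision.
vision_model = GenerativeModel("gemini-pro-vision")
image = Part.from_uri("gs://your-bucket/photo.jpg", mime_type="image/jpeg")
print(vision_model.generate_content([image, "What's in this photo?"]).text)
```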
In Vertex AI, developers can adapt Gemini Pro to specific environments and use cases using a fine-tuning or “grounding” process. Gemini Pro can also connect to external third-party APIs to perform certain actions.
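That last piece, connecting to third-party APIs, is exposed in the SDK as function declarations the model can choose to invoke. Below is a hedged sketch of how that might look; get_current_weather is a hypothetical function used purely for illustration:

```python
from vertexai.preview.generative_models import (
    FunctionDeclaration,
    GenerativeModel,
    Tool,
)

# Describe a (hypothetical) external API so the model knows it can request it.
get_weather = FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

model = GenerativeModel(
    "gemini-pro",
    tools=[Tool(function_declarations=[get_weather])],
)
response = model.generate_content("What's the weather in Paris right now?")

# The model doesn't call the API itself; it returns a structured function
# call that your own code is expected to execute and feed back.
print(response.candidates[0].content.parts[0].function_call)
```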
In AI Studio, there are workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and can adjust the model temperature to control the output’s creative range, provide examples to guide tone and style, and tune the safety settings.
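For a sense of what those knobs look like in code, here’s a minimal sketch using the google-generativeai Python package with an AI Studio API key; the prompt and parameter values are arbitrary:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_API_KEY")  # key issued by AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Write a product tagline for a solar-powered lantern.",
    # Higher temperature widens the creative range of the output.
    generation_config={"temperature": 0.9, "max_output_tokens": 128},
    # Safety settings can be tightened or relaxed per harm category.
    safety_settings=[
        {"category": "HARM_CATEGORY_HARASSMENT",
         "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    ],
)
print(response.text)
```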
Gemini Nano
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far, it powers two features on the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations and other clips. Users get these summaries even if they don’t have a signal or Wi-Fi connection available, and for privacy’s sake, no data leaves their phone in the process.
Gemini Nano is also in Gboard, Google’s keyboard app, as a developer preview. There, it powers a feature called Smart Reply, which suggests the next thing you might want to say when you’re having a conversation in a messaging app. The feature initially works only with WhatsApp, but will come to more apps in 2024, Google says.
Is Gemini better than OpenAI’s GPT-4?
Google has touted Gemini’s superiority on benchmarks several times, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says Gemini Pro, meanwhile, is more capable than GPT-3.5 at tasks like content summarization, brainstorming and writing.
But leaving aside the question of whether benchmarks really indicate a better model, the scores Google cites appear to be only marginally better than OpenAI’s corresponding results. And, as mentioned earlier, some early impressions weren’t great, with users and academics pointing out that Gemini Pro tends to get basic facts wrong, struggles with translations and gives poor coding suggestions.
How much will Gemini cost?
Gemini Pro is free to use in Gemini apps and, currently, AI Studio and Vertex AI.
Once Gemini Pro exits preview in Vertex, however, input to the model will cost $0.0025 per character, while output will cost $0.00005 per character. Vertex customers are billed per 1,000 characters (roughly 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).
Let’s say a 500-word article contains 2,000 characters. Summarizing that article with Gemini Pro would cost $5. Meanwhile, generating an article of similar length would cost $0.10.
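A quick sanity check of that arithmetic in Python, using the per-character rates above (and, like the example, ignoring the small cost on the other side of each call):

```python
# Back-of-the-envelope cost check using the per-character rates above.
INPUT_RATE = 0.0025    # dollars per input character
OUTPUT_RATE = 0.00005  # dollars per output character

article_chars = 2_000  # a ~500-word article

summarize_cost = article_chars * INPUT_RATE   # the article goes in as input
generate_cost = article_chars * OUTPUT_RATE   # the article comes out as output

print(f"Summarizing: ${summarize_cost:.2f}")  # Summarizing: $5.00
print(f"Generating:  ${generate_cost:.2f}")   # Generating:  $0.10
```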
Ultra pricing has yet to be announced.
Where can you test Gemini?
Gemini Pro
The easiest place to get to know Gemini Pro is the Gemini apps. Pro and Ultra answer queries in a range of languages.
Gemini Pro and Ultra are also accessible in preview on Vertex AI via an API. The API is free to use “within limits” for now and supports some regions, including Europe, as well as features like chat functionality and filtering.
Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate on prompts and Gemini-based chatbots, then get API keys to use them in their apps, or export the code to a more fully featured IDE.
Duet AI for Developers, Google’s suite of AI-powered assistance tools for code completion and generation, now uses Gemini models. And Google has brought Gemini models to its developer tools for Chrome and its Firebase mobile development platform.
Gemini Nano
Gemini Nano is on the Pixel 8 Pro, and it’s coming to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.