Google Gemini is a big step forward for AI models
For consumers, Google Bard and Pixel 8 Pro phones will offer a first tryst with the new model.
Google’s next-generation foundation model, Gemini, is ready for prime time. It is pitched as more capable, more flexible and better optimised for smartphones, and is believed to be the first AI model to score 90% on MMLU, or Massive Multitask Language Understanding, a benchmark that evaluates how well a model understands language and solves problems. It must, after all, live up to a family legacy: its predecessors include AlphaGo (2016), BERT (2018), LaMDA (2020), MUM (2021) and PaLM 2 (2023).
Then there’s the competition. Microsoft has announced its Copilot AI chatbot now integrates OpenAI’s latest model, GPT-4 Turbo, alongside image generator DALL-E 3. Anthropic’s Claude 2.1 can analyse as many as 1,50,000 words in a single prompt, which is claimed to be an industry first.
Gemini 1.0 will be available in three sizes, a unique development in itself, defined by levels of capability and processing power requirements. Developers, both for Android and for enterprises, will have greater flexibility to work with. Gemini Ultra will be the most powerful and capable model, meant for complex implementations. Gemini Pro is a size smaller, designed for scaling across more specific customisations. Gemini Nano is the model optimised for on-device AI tasks, with smartphones expected to be in focus.
“Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or the web before it,” said Sundar Pichai, CEO of Google and Alphabet, in a note accompanying the Gemini unveil.
Google has every reason to be confident.
“Before bringing it to the public, we ran Gemini Pro through several industry-standard benchmarks. In six out of eight benchmarks, Gemini Pro outperformed GPT-3.5, including in MMLU (Massive Multitask Language Understanding), one of the key leading standards for measuring large AI models, and GSM8K, which measures grade school math reasoning,” says Sissie Hsiao, Vice President and General Manager for Assistant and Bard, at Google.
The test results back that confidence. Gemini Ultra became the first model to score 90.0% on MMLU, a benchmark that combines 57 subjects, including math, physics, history, law, medicine and ethics, to test world knowledge and problem-solving abilities.
In comparison, OpenAI’s GPT-4, the previous MMLU leader, averaged 86.5%. The result also represents a big step forward for Google’s own AI models, since PaLM 2 scored an average of 81.2% in 2023 benchmarks.
“It was built from the ground up to be multimodal, which means it can generalise and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” says Demis Hassabis, CEO, Google DeepMind.
In a briefing for Gemini, of which HT was a part, Google demoed the model’s understanding capabilities as part of the test process. The demo indicated an ability to follow changing specifics in a video Q&A session between a human and the AI model, and the inferences it drew were undeniably correct. But it remains to be seen how well it will perform amidst the many variables of a real-world environment.
Soon, your interactions with Gemini AI
Google intends to move quickly with Gemini for consumers as well as enterprise implementations.
As you read this, the company’s Bard chatbot will immediately begin using Gemini Pro. This can only be classified as the biggest update to Bard since it rolled out a few months ago. Google insists it’ll make the chatbot respond to queries with advanced reasoning, planning and understanding. No update is too small amidst competition from its closest rivals: OpenAI’s GPT-4-based ChatGPT; Microsoft Copilot, which is getting deeper integration within Windows PCs as well as the GPT-4 Turbo upgrade; and Anthropic’s Claude.
Gemini Nano will see its first implementation in the company’s latest Pixel 8 Pro smartphone. Google confirms the Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features including Summarize in the Recorder app and Smart Reply in Gboard. The latter is rolling out with WhatsApp first, and more messaging apps will join next year.
“Gemini Nano, our most efficient model built for on-device tasks, runs directly on mobile silicon, opening support for a range of important use cases. Running on-device enables features where the data should not leave the device, such as suggesting replies to messages in an end-to-end encrypted messaging app,” says Dave Burke, VP of Engineering at Google.
Gemini Nano has evolved from the larger Gemini models, with specific optimisations to run on mobile silicon accelerators.
Experiments have already begun with the Search Generative Experience (SGE) in Google Search, while Ads, Chrome and Duet AI will also get some level of Gemini integration in the coming weeks.
Developers and enterprises will get access to Gemini Pro on December 13, while Android developers will also be able to use Gemini Nano for on-device tasks, a glimpse of which Google will give the world using AICore on the Pixel 8 Pro phone. AICore on an Android smartphone handles model management, runtimes, safety features and more.
What about Gemini Ultra? “We are currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model using fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available,” says Google, following through on the promise of responsible AI deployment, something it has repeatedly emphasised.
Once that happens, the launch of Bard Advanced is expected early next year. This will be a refreshed and sharper AI experience, which will also have access to the Gemini Ultra model. At this time, it is not clear whether Google intends to monetise the more advanced version of Bard for enterprises or consumers and, if it does, how much the subscription would cost.