Google Introduces Gemma 2: Elevating AI Performance, Speed and Accessibility for Developers

Published July 4, 2024

Dr. Tehseen Zia

Google has unveiled Gemma 2, the latest iteration of its open-source lightweight language models, available in 9 billion (9B) and 27 billion (27B) parameter sizes. The new version promises better performance and faster inference than the original Gemma. Derived from Google’s Gemini models, Gemma 2 is designed to be more accessible to researchers and developers, with substantial improvements in speed and efficiency. Unlike the multimodal and multilingual Gemini models, Gemma 2 focuses solely on language processing. In this article, we look at the standout features and advancements of Gemma 2, compare it with its predecessor and its competitors, and highlight its use cases and challenges.

Building Gemma 2

Like their predecessor, the Gemma 2 models are based on a decoder-only transformer architecture. The 27B variant is trained on 13 trillion tokens of mainly English data, the 9B model on 8 trillion tokens, and a smaller 2.6B model on 2 trillion tokens. These tokens come from a variety of sources, including web documents, code, and scientific articles. The models use the same tokenizer as Gemma 1 and Gemini, ensuring consistency in data processing.

According to Google’s technical report, the 9B and 2.6B models are pre-trained using a method called knowledge distillation, in which a smaller "student" model learns from the output probabilities of a larger, pre-trained "teacher" model rather than only from the next token in the training data. After pre-training, the models are fine-tuned through a process called instruction tuning. This starts with supervised fine-tuning (SFT) on a mix of synthetic and human-generated, English, text-only prompt-response pairs. Following this, reinforcement learning from human feedback (RLHF) is applied to improve overall performance.
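To make the idea of learning from output probabilities concrete, here is a minimal sketch of a distillation loss for a single token position. It is an illustration under stated assumptions, not Google's training code: the function names and the three-token vocabulary are hypothetical, and a real setup would average this loss over every position in a batch.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy of the student against the teacher's soft distribution.

    Hypothetical helper: the teacher's full probability vector replaces the
    one-hot next-token target used in ordinary pre-training.
    """
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    student_log_probs = log_softmax(student_logits)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

# Toy logits over a three-token vocabulary (made-up numbers).
teacher = [2.0, 1.0, 0.1]
matched = distillation_loss(teacher, teacher)            # student agrees with teacher
mismatched = distillation_loss(teacher, [0.1, 1.0, 2.0]) # student disagrees
print(f"matched: {matched:.4f}, mismatched: {mismatched:.4f}")
```

The loss is smallest when the student reproduces the teacher's distribution exactly, which is the signal the student is trained to follow.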

Gemma 2: Enhanced Performance and Efficiency Across Diverse Hardware

Gemma 2 not only outperforms Gemma 1 but also competes effectively with models twice its size. It is designed to operate efficiently across various hardware setups, including laptops, desktops, IoT devices, and mobile platforms. Specifically optimized for single GPUs and TPUs, Gemma 2 improves on the efficiency of its predecessor, especially on resource-constrained devices. For example, the 27B model can run inference on a single NVIDIA H100 Tensor Core GPU or TPU host, making it a cost-effective option for developers who need high performance without investing heavily in hardware.
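A rough back-of-the-envelope calculation shows why a single accelerator can be enough. The sketch below estimates the memory needed for the model weights alone; it assumes an 80 GB H100, nominal parameter counts of 9 and 27 billion, and ignores the KV cache and activations, so treat the numbers as a lower bound rather than a deployment guide.

```python
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

def fits_on_h100(num_params, bytes_per_param):
    """True if the weights alone fit in the assumed GPU memory."""
    return weight_memory_gb(num_params, bytes_per_param) <= H100_MEMORY_GB

# Nominal parameter counts; the exact counts differ slightly.
models = {"Gemma 2 9B": 9e9, "Gemma 2 27B": 27e9}
precisions = {"float32": 4, "bfloat16": 2, "int8": 1}

for name, params in models.items():
    for precision, nbytes in precisions.items():
        gb = weight_memory_gb(params, nbytes)
        verdict = "fits" if fits_on_h100(params, nbytes) else "does not fit"
        print(f"{name} in {precision}: {gb:.0f} GB of weights, {verdict}")
```

Under these assumptions, the 27B weights take about 54 GB in bfloat16, which leaves headroom on an 80 GB card, while the same weights in float32 would not fit.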

Additionally, Gemma 2 offers developers enhanced tuning capabilities across a wide range of platforms and tools. Whether developers use cloud-based solutions like Google Cloud or community tools like Axolotl, Gemma 2 provides extensive fine-tuning options. Integration with platforms such as Hugging Face, NVIDIA TensorRT-LLM, and Google’s JAX and Keras allows researchers and developers to tune performance and deploy efficiently across diverse hardware configurations.

Gemma 2 vs. Llama 3 70B

Gemma 2 and Llama 3 70B both stand out in the open-source language model category. Google researchers claim that Gemma 2 27B delivers performance comparable to Llama 3 70B despite being much smaller. Additionally, Gemma 2 9B consistently outperforms Llama 3 8B on benchmarks covering language understanding, coding, and math problem solving.

One notable advantage of Gemma 2 over Meta’s Llama 3 is its handling of Indic languages. Gemma 2 excels due to its tokenizer, which is specifically designed for these languages and includes a large vocabulary of 256k tokens to capture linguistic nuances. On the other hand, Llama 3, despite supporting many languages, struggles with tokenization for Indic scripts due to limited vocabulary and training data. This gives Gemma 2 an edge in tasks involving Indic languages, making it a better choice for developers and researchers working in these areas.
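The effect of vocabulary coverage on tokenization can be shown with a toy example. The sketch below is not the Gemma or Llama tokenizer: it is a hypothetical greedy longest-match tokenizer with a UTF-8 byte fallback, and both vocabularies are invented for illustration. It only demonstrates the general point that a word missing from the vocabulary breaks into many small pieces.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization with a UTF-8 byte fallback (toy)."""
    tokens = []
    max_len = max((len(entry) for entry in vocab), default=1)
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No entry matched: emit one token per UTF-8 byte of the character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens

word = "नमस्ते"  # a Hindi greeting
small_vocab = {"hello", " ", "world"}  # hypothetical: no Indic entries
large_vocab = small_vocab | {word}     # hypothetical: covers the Hindi word

print(len(tokenize(word, small_vocab)), "tokens with the small vocabulary")
print(len(tokenize(word, large_vocab)), "token with the large vocabulary")
```

Longer token sequences cost more compute per word and leave the model less context to work with, which is one reason vocabulary coverage matters for Indic scripts.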

Use Cases

Based on the specific characteristics of the Gemma 2 model and its performance in benchmarks, we have identified some practical use cases for the model.

Multilingual Assistants: Gemma 2’s specialized tokenizer for various languages, especially Indic languages, makes it an effective tool for developing multilingual assistants tailored to speakers of these languages. Whether users are seeking information in Hindi or creating educational materials in Urdu, marketing content in Arabic, or research articles in Bengali, Gemma 2 gives them effective language generation tools. A real-world example of this use case is Navarasa, a multilingual assistant built on Gemma that supports nine Indian languages. Users can produce content that resonates with regional audiences while adhering to specific linguistic norms and nuances.
Educational Tools: With its capability to solve math problems and understand complex language queries, Gemma 2 can be used to create intelligent tutoring systems and educational apps that provide personalized learning experiences.
Coding and Code Assistance: Gemma 2’s proficiency in computer coding benchmarks indicates its potential as a powerful tool for code generation, bug detection, and automated code reviews. Its ability to perform well on resource-constrained devices allows developers to integrate it seamlessly into their development environments.
Retrieval Augmented Generation (RAG): Gemma 2’s strong performance on text-based inference benchmarks makes it well-suited for developing RAG systems across various domains. It supports healthcare applications by synthesizing clinical information, assists legal AI systems in providing legal advice, enables the development of intelligent chatbots for customer support, and facilitates the creation of personalized education tools.
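The retrieval step of a RAG system can be sketched in a few lines. The example below is a simplified illustration, not a production pipeline: it ranks passages with bag-of-words cosine similarity instead of learned embeddings, the documents and helper names are hypothetical, and the final call to a language model such as Gemma 2 is left out, so the code stops at the assembled prompt.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble the prompt that would be sent to the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical customer-support knowledge base.
documents = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Shipping takes two weeks for international orders.",
]
query = "How long do refunds take?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)
```

Grounding the prompt in retrieved passages is what lets a text-only model answer from domain documents it was never trained on, although the quality of the answer still depends on the model and on what the retriever finds.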

Limitations and Challenges

While Gemma 2 showcases notable advancements, it also faces limitations and challenges, primarily related to the quality and diversity of its training data. Although its tokenizer supports various languages, Gemma 2 is not specifically trained for multilingual capabilities and requires fine-tuning to handle languages other than English effectively. The model performs well with clear, structured prompts but struggles with open-ended or complex tasks and with subtle language nuances like sarcasm or figurative expressions. Its factual accuracy isn’t always reliable: it can produce outdated or incorrect information, and it may lack common-sense reasoning in certain contexts. While efforts have been made to address hallucinations, especially in sensitive areas like medical or chemical, biological, radiological, and nuclear (CBRN) scenarios, there is still a risk of inaccurate output in domains that have received less attention, such as finance. Moreover, despite controls to prevent the generation of unethical content like hate speech or cybersecurity threats, there are ongoing risks of misuse in other domains. Lastly, Gemma 2 is solely text-based and does not support multimodal data processing.

The Bottom Line

Gemma 2 introduces notable advancements in open-source language models, enhancing performance and inference speed compared to its predecessor. It is well-suited for various hardware setups, making it accessible without significant hardware investments. However, challenges persist in handling nuanced language tasks and ensuring accuracy in complex scenarios. While beneficial for applications like legal advice and educational tools, developers should be mindful of its limitations in multilingual capabilities and potential issues with factual accuracy in sensitive contexts. Despite these considerations, Gemma 2 remains a valuable option for developers seeking reliable language processing solutions.