Gemma 3 Technical Report - arXiv

Gemma 3 Technical Report

Aishwarya Kamath, Shreya Pathak, Nino Vieillard, Ramona Merhej, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Edouard Yvinec, Michelle Casbon, Francesco Visin, Kathleen Kenealy, Anton Tsitsulin, Robert Busa-Fekete, Noveen Sachdeva, Benjamin Coleman, Basil Mustafa, Emilio Parisotto, Jan-Thorsten Peter, Danila Sinopalnikov, Surya Bhupatiraju, Rishabh Agarwal, Mehran Kazemi, Idan Brusilovsky, Andreas Steiner, Abhanshu Sharma, Abheesht Sharma, Adi Mayrav Gilady, Adrian Goedeckemeyer, Alexander Kolesnikov, Alexei Bendebury, Alvin Abdagic, András György, André Susano Pinto, Antoine Miech, Antonia Paterson, Ashish Shenoy, Ayan Chakrabarti, Bobak Shahriari, Bryce Petrini, Charline Le Lan, Christopher A. Choquette-Choo, Daniel Deutsch, Danielle Eisenbud, Dimitris Paparas, Divyashree Shivakumar Sreepathihalli, Erwin Huizenga, Eugene Kharitonov, Frederick Liu, Gagik Amirkhanyan, Glenn Cameron, Hanna Klimczak-Plucińska, Harshal Tushar Lehri, Hussein Hazimeh, Ian Ballantyne, Idan Szpektor, Jean Pouget-Abadie, Joseph Fernandez, Jyotinder Singh, Kiran Vodrahalli, Marcella Valentine, Marina Coelho, Marvin Ritter, Matthew Watson, Mayank Chaturvedi, Michael Moynihan, Nikola Momchev, Nilay Chauhan, Noveen Sachdeva, Pankil Botarda, Paul Kishan Rubenstein, Phil Culliton, Philipp Schmid, Pier Giuseppe Sessa, Piotr Stanczyk, Rakesh Shivanna, Rob Willoughby, Sertan Girgin, Shashir Reddy, Sijal Bhatnagar, Sindhu Raghuram Panyam, Trevor Yacovone, Tyler Liechty, Vincent Roseberry, Vlad Feinberg, Vlad Kolesnikov, Victor Cotruta, Erica Moreira, Luiz Gustavo Martins, Omar Sanseviero, Lucas Gonzalez, Zach Gleicher, Tris Warkentin, Vahab Mirrokni, Joelle Barral, Zoubin Ghahramani, Oriol Vinyals, Demis Hassabis, Koray Kavukcuoglu, Clement Farabet, Elena Buchatskaya, Jean-Baptiste Alayrac, Sebastian Borgeaud, Olivier Bachem, Armand Joulin, Cassidy Hardin, Robert Dadashi, Léonard Hussenot

Abstract

We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, wider language coverage, and a longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory, which tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
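To make the KV-cache argument concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It assumes that a global attention layer caches keys and values for every token in the context, while a local (sliding-window) layer caches at most its window. The function name `kv_cache_bytes` and every numeric value below (layer count, local-to-global ratio, window size, head configuration) are hypothetical and chosen only for illustration; the abstract does not state them.

```python
def kv_cache_bytes(context_len, n_layers, local_per_global, window,
                   n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size for a stack that interleaves local and global layers.

    Assumption (illustrative only): layers repeat in blocks of
    `local_per_global` local layers followed by one global layer.
    Setting local_per_global=0 models an all-global stack.
    """
    block = local_per_global + 1
    n_global = n_layers // block
    n_local = n_layers - n_global

    # Keys and values: 2 tensors per layer, one entry per KV head and head dim.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_value

    # Global layers cache the full context; local layers cache at most `window`.
    global_bytes = n_global * context_len * per_token
    local_bytes = n_local * min(context_len, window) * per_token
    return global_bytes + local_bytes


if __name__ == "__main__":
    # Hypothetical configuration, not taken from the report.
    cfg = dict(context_len=128_000, n_layers=48, window=1024,
               n_kv_heads=8, head_dim=128)
    all_global = kv_cache_bytes(local_per_global=0, **cfg)
    interleaved = kv_cache_bytes(local_per_global=5, **cfg)
    print(f"all-global stack:  {all_global / 2**30:.2f} GiB")
    print(f"5 local : 1 global: {interleaved / 2**30:.2f} GiB")
```

Under these assumptions, the global layers' share grows linearly with context length while the local layers' share is capped by the window, so raising the local-to-global ratio and shortening the window both shrink the cache at long context.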
