arXiv

OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery

Vignesh Prabhakar, Md Amirul Islam, Adam Atanas, Yao-Ting Wang, Joah Han, Aastha Jhunjhunwala, Rucha Apte, Robert Clark, Kang Xu, Zihan Wang, Kai Liu

Abstract

Large Language Models (LLMs) have demonstrated remarkable potential in advancing scientific knowledge and addressing complex challenges. In this work, we introduce OmniScience, a specialized large reasoning model for general science, developed through three key components: (1) domain-adaptive pretraining on a carefully curated corpus of scientific literature, (2) instruction tuning on a specialized dataset to guide the model in following domain-specific tasks, and (3) reasoning-based knowledge distillation through fine-tuning to significantly enhance its ability to generate contextually relevant and logically sound responses. We demonstrate the versatility of OmniScience by developing a battery agent that efficiently ranks molecules as potential electrolyte solvents or additives. Comprehensive evaluations reveal that OmniScience is competitive with state-of-the-art large reasoning models on the GPQA Diamond and domain-specific battery benchmarks, while outperforming all public reasoning and non-reasoning models with similar parameter counts. We further demonstrate via ablation experiments that domain-adaptive pretraining and reasoning-based knowledge distillation are critical to attaining our performance levels across benchmarks.
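The abstract describes component (3) only at a high level. Reasoning-based knowledge distillation through fine-tuning is commonly implemented by collecting step-by-step solutions from a stronger teacher model and using them as supervised fine-tuning targets for the smaller student. The sketch below shows one such data-preparation step under stated assumptions: the `TeacherTrace` record, the `build_distillation_set` helper, the answer-matching filter, and the `<think>` formatting are all hypothetical choices for illustration, not details taken from OmniScience.

```python
from dataclasses import dataclass


@dataclass
class TeacherTrace:
    """One teacher-generated solution (hypothetical record layout)."""
    question: str
    reasoning: str   # teacher's step-by-step reasoning
    answer: str      # teacher's final answer
    reference: str   # ground-truth answer used for filtering


def build_distillation_set(traces):
    """Turn teacher traces into prompt/completion pairs for student fine-tuning.

    Assumption: only traces whose final answer matches the reference are kept,
    and the reasoning is wrapped in <think> tags (an illustrative format).
    """
    examples = []
    for t in traces:
        # Discard traces whose final answer disagrees with the reference.
        if t.answer.strip().lower() != t.reference.strip().lower():
            continue
        completion = f"<think>\n{t.reasoning}\n</think>\n{t.answer}"
        examples.append({"prompt": t.question, "completion": completion})
    return examples


# Usage with two toy traces; the second is a flawed teacher trace and is dropped.
traces = [
    TeacherTrace(
        question="What is the oxidation state of Li in LiPF6?",
        reasoning="PF6 carries a -1 charge, so Li must be +1 for a neutral salt.",
        answer="+1",
        reference="+1",
    ),
    TeacherTrace(
        question="How many electrons does a neutral carbon atom have?",
        reasoning="Carbon has atomic number 12.",
        answer="12",
        reference="6",
    ),
]
data = build_distillation_set(traces)
print(len(data))  # 1
```

Filtering on the final answer is one simple way to keep only traces that reach a correct result; a real pipeline would likely need a more robust answer checker than exact string comparison.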