Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Abstract
Video generation models have made remarkable progress over the past year. The quality of AI-generated video continues to improve, but at the cost of larger models, more training data, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all the techniques behind this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to the world's leading video generation models, including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology and foster broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.