Multi-label classification for multi-temporal, multi-spatial coral reef condition monitoring using vision foundation model with adapter learning
Abstract
Coral reef ecosystems provide essential ecosystem services but face significant threats from climate change and human activities. Although advances in deep learning have enabled automatic classification of coral reef conditions, conventional deep models struggle to achieve high performance on complex underwater ecological images. Vision foundation models, known for their high accuracy and cross-domain generalizability, offer a promising alternative, but fine-tuning them requires substantial computational resources and results in high carbon emissions. Adapter learning methods such as Low-Rank Adaptation (LoRA) have emerged to address this cost. This study introduces an approach that integrates the DINOv2 vision foundation model with LoRA fine-tuning. The approach uses multi-temporal field images collected through underwater surveys at 15 dive sites at Koh Tao, Thailand, with all images labeled according to universal standards used in citizen science-based conservation programs. In the experiments, the DINOv2-LoRA model achieved a match ratio of 64.77%, compared with 60.34% for the best conventional model. Furthermore, incorporating LoRA reduced the number of trainable parameters from 1,100M to 5.91M. Transfer learning experiments conducted under different temporal and spatial settings highlight the exceptional generalizability of DINOv2-LoRA across different seasons and sites. This study is the first to explore the efficient adaptation of foundation models for multi-label classification of coral reef conditions under multi-temporal and multi-spatial settings. The proposed method advances the classification of coral reef conditions and provides a tool for monitoring, conserving, and managing coral reef ecosystems.
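The two quantities reported in the abstract, the match ratio and the number of trainable parameters, can be illustrated with a minimal sketch. It assumes that "match ratio" means the exact-match (subset) accuracy common in multi-label classification, which the abstract does not define, and it counts LoRA parameters for a single weight matrix only. The function names, the label vectors, the layer size of 1024, and the rank of 8 are hypothetical and are not taken from the study.

```python
from typing import Sequence


def match_ratio(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose predicted label vector equals the true one exactly.

    Assumption: this is the exact-match definition of match ratio; a sample
    counts as correct only if every label is predicted correctly.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    if not y_true:
        raise ValueError("at least one sample is required")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix.

    LoRA keeps W frozen and learns a low-rank update B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)


# Hypothetical multi-hot labels for four images and four condition classes.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
ratio = match_ratio(y_true, y_pred)  # the second sample has one wrong label

# Hypothetical 1024 x 1024 projection with LoRA rank 8.
full_params = 1024 * 1024
lora_params = lora_trainable_params(d_in=1024, d_out=1024, rank=8)

print(f"match ratio: {ratio:.2f}")
print(f"full fine-tuning: {full_params} params, LoRA: {lora_params} params")
```

Under the exact-match definition, a prediction with a single wrong label scores zero for that image, which is why match ratios tend to be lower than per-label accuracies. The parameter count shows where the saving comes from: the low-rank factors grow linearly with the layer width, while the full matrix grows quadratically.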