A large-scale image-text dataset benchmark for farmland segmentation

arXiv

Chao Tao, Dandan Zhong, Weiliang Mu, Zhuofei Du, Haiyang Wu

Abstract

The traditional deep learning paradigm, which relies solely on labeled data, is limited in its ability to represent the spatial relationships between farmland elements and their surrounding environment, and it struggles to model the dynamic temporal evolution and spatial heterogeneity of farmland. Language, as a structured carrier of knowledge, can explicitly express the spatiotemporal characteristics of farmland, such as its shape, distribution, and surrounding environment. A language-driven learning paradigm can therefore alleviate the challenges posed by the spatiotemporal heterogeneity of farmland. However, in the field of farmland remote sensing imagery, there is currently no comprehensive benchmark dataset to support this research direction. To fill this gap, we introduce language-based descriptions of farmland and construct the FarmSeg-VL dataset, the first fine-grained image-text dataset designed for spatiotemporal farmland segmentation. First, we propose a semi-automatic annotation method that accurately assigns a caption to each image, ensuring high data quality and semantic richness while improving the efficiency of dataset construction. Second, FarmSeg-VL has pronounced spatiotemporal coverage: in the temporal dimension it covers all four seasons, and in the spatial dimension it covers eight typical agricultural regions across China. In addition, the captions in FarmSeg-VL cover a rich set of spatiotemporal characteristics of farmland, including its inherent properties, phenological characteristics, spatial distribution, topographic and geomorphic features, and the distribution of the surrounding environment. Finally, we analyze the performance of vision-language models (VLMs) and of deep learning models that rely solely on labels, both trained on FarmSeg-VL, demonstrating its potential as a standard benchmark for farmland segmentation.