arXiv

Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval

Abstract

Medical image retrieval is the task of finding images in a database that are similar to a given query image, with applications such as diagnosis support. Traditional medical image retrieval relied on clinical metadata, whereas content-based medical image retrieval (CBMIR) relies on image features, which can be extracted automatically or semi-automatically. Many approaches have been proposed for CBMIR, and using pre-trained convolutional neural networks (CNNs) as feature extractors is among the most widely used. However, given recent advances in foundation models for computer vision tasks, their suitability for CBMIR also merits investigation. In this study, we used several feature extractors drawn from well-known pre-trained CNNs and pre-trained foundation models and investigated their CBMIR performance on eight types of two-dimensional (2D) and three-dimensional (3D) medical images. We also investigated the effect of image size on CBMIR performance. Our results show that, overall, foundation models outperform CNNs by a large margin on the 2D datasets, with the general-purpose self-supervised model for computational pathology (UNI) providing the best overall performance across all datasets and image sizes. On the 3D datasets, CNNs and foundation models perform more comparably, with the contrastive learning from captions for histopathology (CONCH) model achieving the best overall performance. Moreover, our findings confirm that while larger image sizes yield slightly better performance (especially for 2D datasets), competitive CBMIR performance can still be achieved with smaller image sizes. Our code to reproduce the results is available at: https://github.com/masih4/MedImageRetrieval.
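To make the retrieval step concrete, the following is a minimal sketch of how feature vectors from any pre-trained extractor could be used for CBMIR. It is not the authors' implementation: the abstract does not state the similarity measure or the evaluation metric, so cosine similarity and precision at k are assumptions here, and the feature vectors, labels, and function names are hypothetical stand-ins for embeddings produced by a CNN or foundation model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, database, k):
    """Return the indices of the k database vectors most similar to the query."""
    scored = [(cosine_similarity(query, feat), idx)
              for idx, feat in enumerate(database)]
    # Highest similarity first; ties broken by database index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def precision_at_k(retrieved, db_labels, query_label):
    """Fraction of retrieved items that share the query's class label."""
    if not retrieved:
        return 0.0
    hits = sum(1 for idx in retrieved if db_labels[idx] == query_label)
    return hits / len(retrieved)


# Hypothetical 3-dimensional embeddings; real extractors produce
# vectors with hundreds or thousands of dimensions.
db_features = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.0],
]
db_labels = ["class_a", "class_a", "class_b", "class_b"]  # hypothetical labels

query_feature = [0.95, 0.05, 0.05]
top = retrieve_top_k(query_feature, db_features, k=2)
score = precision_at_k(top, db_labels, "class_a")
print(top, score)
```

In this setup the feature extractor is a swappable component: comparing extractors amounts to recomputing `db_features` and the query vector with a different model while keeping the ranking and scoring steps fixed.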