Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure

Research

arXiv

Abstract

Super-resolution, in-painting, whole-image generation, unpaired style-transfer, and network-constrained image reconstruction each include an aspect of machine-learned image synthesis where the actual ground truth is not known at the time of use. It is generally difficult to evaluate the quality of synthetic images quantitatively and authoritatively; in mission-critical biomedical scenarios, however, robust evaluation is paramount. This work takes the view that all practical image-to-image comparisons are relative qualifications rather than absolute difference quantifications, and that meaningful evaluation of generated image quality can therefore be accomplished using the Tversky Index, a well-established measure for assessing perceptual similarity. The evaluation procedure is developed and then demonstrated using multiple image data sets, both real and simulated. The main result is that when the subjectivity and intrinsic deficiencies of any feature-encoding choice are made explicit, Tversky's method leads to intuitive results, whereas traditional methods based on summarizing distances in deep feature spaces do not.
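For readers unfamiliar with the Tversky Index, the following is a minimal sketch of its standard set-based form, S(A, B) = |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|), which tallies shared and distinctive features of two items. The feature names, the function name `tversky_index`, and the default weights α = β = 0.5 are illustrative assumptions; the abstract does not specify the feature encoding or the weights used in the work itself.

```python
def tversky_index(a, b, alpha=0.5, beta=0.5):
    """Standard set-based Tversky Index between feature collections a and b.

    alpha weights features found only in a, beta weights features found
    only in b. alpha = beta = 1 gives the Jaccard index, and
    alpha = beta = 0.5 gives the Dice coefficient.
    """
    a, b = set(a), set(b)
    common = len(a & b)   # tally of shared features
    only_a = len(a - b)   # tally of features distinctive to a
    only_b = len(b - a)   # tally of features distinctive to b
    denom = common + alpha * only_a + beta * only_b
    if denom == 0:
        # Both feature sets are empty (or all distinctive features carry
        # zero weight and nothing is shared). Returning 1.0 is a
        # convention assumed for this sketch.
        return 1.0
    return common / denom


# Hypothetical binary features for a reference image and a generated image.
reference = {"nucleus", "membrane", "sharp_edges", "uniform_background"}
generated = {"nucleus", "membrane", "ringing_artifact"}

print(tversky_index(reference, generated))            # symmetric, Dice-like weights
print(tversky_index(reference, generated, 1.0, 0.0))  # penalize only missing reference features
print(tversky_index(reference, generated, 0.0, 1.0))  # penalize only spurious generated features
```

Unequal α and β make the measure asymmetric, so the comparison can be framed relative to one image (for example, the reference) rather than as a symmetric distance.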