MIL vs. Aggregation: Evaluating Patient-Level Survival Prediction Strategies Using Graph-Based Learning
Abstract
Oncologists often rely on many kinds of data, including whole-slide images (WSIs), to guide therapeutic decisions and achieve the best patient outcome. However, predicting the prognosis of cancer patients is challenging because of tumor heterogeneity, intra-patient variability, and the complexity of analyzing WSIs. These images are extremely large, containing billions of pixels, so processing them directly is computationally expensive and specialized methods are needed to extract the relevant information. Additionally, multiple WSIs from the same patient may capture different tumor regions, some more informative than others. This raises a fundamental question: should we use all WSIs to characterize the patient, or should we identify the most representative slide for prognosis? Our work addresses this question by comparing strategies for predicting survival at the WSI level and at the patient level. The WSI-level strategy treats each WSI as an independent sample, following the approach adopted in other works. The patient-level strategies either aggregate the predictions of a patient's WSIs or automatically identify the most relevant slide using multiple-instance learning (MIL). We also evaluate different graph neural network (GNN) architectures under these strategies. We conduct our experiments on the MMIST-ccRCC dataset, which comprises patients with clear cell renal cell carcinoma (ccRCC). Our results show that MIL-based selection improves accuracy, suggesting that choosing the most representative slide benefits survival prediction.
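To make the distinction between WSI-level and patient-level prediction concrete, the following is a minimal sketch of the aggregation idea only, not the method evaluated in this work. It assumes that some model has already produced one risk score per WSI; the function name `aggregate_patient_risk`, the patient identifiers, and the scores are all hypothetical. The `"max"` option is a fixed rule that keeps a single slide's score and is only a crude stand-in for MIL, where slide relevance is learned.

```python
from collections import defaultdict
from statistics import mean


def aggregate_patient_risk(slide_risks, how="mean"):
    """Collapse slide-level risk scores into one score per patient.

    slide_risks: iterable of (patient_id, risk_score) pairs, one per WSI.
    how: "mean" averages a patient's slides; "max" keeps the highest-risk
         slide (a fixed rule, not a learned MIL selection).
    """
    reducers = {"mean": mean, "max": max}
    if how not in reducers:
        raise ValueError(f"unknown aggregation: {how!r}")

    # Group the slide scores by the patient they belong to.
    per_patient = defaultdict(list)
    for patient_id, risk in slide_risks:
        per_patient[patient_id].append(risk)

    # Reduce each patient's list of scores to a single value.
    return {pid: reducers[how](scores) for pid, scores in per_patient.items()}


# Hypothetical slide-level predictions: patient P1 has two WSIs, P2 has one.
slide_risks = [("P1", 0.25), ("P1", 0.75), ("P2", 0.5)]
mean_risk = aggregate_patient_risk(slide_risks, how="mean")
max_risk = aggregate_patient_risk(slide_risks, how="max")
```

With a single slide per patient the two rules coincide; they diverge only when a patient's slides disagree, which is the situation the question above is concerned with.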