Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data

Abstract

Accurate prediction of patient mortality enables effective risk stratification, which in turn supports personalized treatment plans and better patient outcomes. Predicting mortality in healthcare nevertheless remains a significant challenge, and existing studies often focus on specific diseases or on limited sets of predictors. This study evaluates several machine learning models for all-cause in-hospital mortality prediction on the MIMIC-III database, using a comprehensive feature engineering approach. Guided by clinical expertise and the literature, we extracted key features such as vital signs (e.g., heart rate, blood pressure), laboratory results (e.g., creatinine, glucose), and demographic information. The Random Forest model performed best, with an area under the curve (AUC) of 0.94, significantly outperforming the other machine learning and deep learning approaches. This result indicates that Random Forest is robust to high-dimensional, noisy clinical data and is a promising basis for effective clinical decision support tools. Our findings highlight the importance of careful feature engineering for accurate mortality prediction. We conclude by discussing the implications for clinical adoption and proposing future directions, including improving model robustness and tailoring prediction models to specific diseases.
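To make the AUC figure concrete: an AUC can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the labels and scores are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = died in hospital, 0 = survived (hypothetical encoding).
    scores: predicted mortality risk; higher means higher risk.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.91, 0.12, 0.67, 0.40, 0.70, 0.85, 0.05, 0.33]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 14 of 15 pairs ranked correctly -> AUC = 0.933
```

The quadratic pairwise loop is chosen for readability; on a dataset the size of MIMIC-III a rank-based computation or a library routine would be the practical choice.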

Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data - arXiv