Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes

arXiv

Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes

Abstract

Menstrual health is a critical yet often overlooked aspect of women's healthcare. Despite its clinical relevance, detailed data on menstrual characteristics is rarely available in structured medical records. To address this gap, we propose a novel Natural Language Processing pipeline to extract key menstrual cycle attributes: dysmenorrhea, regularity, flow volume, and intermenstrual bleeding. Our approach utilizes the GatorTron model with Multi-Task Prompt-based Learning, enhanced by a hybrid retrieval preprocessing step to identify relevant text segments. It outperforms baseline methods, achieving an average F1-score of 90% across all menstrual characteristics, despite being trained on fewer than 100 annotated clinical notes. The retrieval step consistently improves performance across all approaches, allowing the model to focus on the most relevant segments of lengthy clinical notes. These results show that combining multi-task learning with retrieval improves generalization and performance across menstrual characteristics, advancing automated extraction from clinical notes and supporting women's health research.
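
To make the retrieval preprocessing idea concrete, the following is a minimal sketch of how a note might be reduced to its most relevant segments before classification. The abstract does not say how the hybrid retrieval is built, so everything here is an assumption for illustration: "hybrid" is taken to mean a weighted mix of a keyword signal and a similarity signal, the similarity signal is a simple bag-of-words cosine standing in for whatever retriever the authors use, and the names KEYWORDS, retrieve, top_k, and alpha are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lexicon; the paper's actual term list is not given.
KEYWORDS = {"menstrual", "menses", "period", "periods", "dysmenorrhea",
            "cramps", "spotting", "bleeding", "cycle", "cycles", "flow"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def split_segments(note):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def keyword_score(tokens):
    """Fraction of tokens that appear in the keyword lexicon."""
    if not tokens:
        return 0.0
    return sum(t in KEYWORDS for t in tokens) / len(tokens)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(note, query, top_k=2, alpha=0.5):
    """Return the top_k segments by mixed score, in original note order.

    alpha weights the keyword signal; (1 - alpha) weights the similarity
    signal. Both the weighting scheme and the defaults are assumptions.
    """
    query_bag = Counter(tokenize(query))
    scored = []
    for idx, segment in enumerate(split_segments(note)):
        tokens = tokenize(segment)
        score = (alpha * keyword_score(tokens)
                 + (1 - alpha) * cosine(Counter(tokens), query_bag))
        scored.append((score, idx, segment))
    # Highest score first; ties broken by position in the note.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:top_k]
    return [segment for _, _, segment in sorted(top, key=lambda item: item[1])]


note = ("Patient presents for annual exam. Reports regular periods with "
        "heavy flow. No chest pain. Denies spotting between cycles.")
query = "menstrual periods flow regularity bleeding"
selected = retrieve(note, query)
```

In this toy note, the two sentences that mention menstrual terms are kept and the unrelated ones are dropped, which is the effect the abstract attributes to the retrieval step: the downstream model sees only the relevant part of a long note.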