Evaluation of respiratory disease hospitalisation forecasts using synthetic outbreak data

arXiv

Abstract

Forecasts of hospitalisations due to infectious diseases play an important role in allocating healthcare resources during epidemics and pandemics. Large-scale analysis of model forecasts during the COVID-19 pandemic has shown that the distribution of model ranks with respect to accuracy is heterogeneous and that ensemble forecasts have the highest average accuracy. Building on that work, we generated a maximally diverse synthetic dataset of 324 different hospitalisation time series that correspond to different disease characteristics and public health responses. We evaluated forecasts from 14 component models and 6 different ensembles. Our results show that component model accuracy was heterogeneous and varied depending on the current rate of disease transmission. Going from 7-day to 14-day forecasts, mechanistic models improved in accuracy relative to statistical models. A novel adaptive ensemble method outperformed all other ensembles, but was closely followed by a median ensemble. We also investigated the relationship between ensemble error and the variability of component forecasts, and show that the coefficient of variation is predictive of future error. Lastly, we validated the results on data from the COVID-19 pandemic in Sweden. Our findings have the potential to improve epidemic forecasting, in particular the ability to assign confidence to ensemble forecasts at the time of prediction based on component forecast variability.
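
As a rough illustration of two quantities named in the abstract, the sketch below computes a median ensemble and the coefficient of variation of a set of component point forecasts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the forecast values are hypothetical, the forecasts are treated as point forecasts, and the coefficient of variation is taken as the population standard deviation divided by the mean, which may differ from the definition used in the paper. The adaptive ensemble is not sketched because the abstract does not describe it.

```python
from statistics import mean, median, pstdev


def median_ensemble(component_forecasts):
    """Combine component point forecasts by taking their median."""
    return median(component_forecasts)


def coefficient_of_variation(component_forecasts):
    """Population standard deviation divided by the mean (an assumed definition)."""
    m = mean(component_forecasts)
    if m == 0:
        return float("nan")  # undefined when the mean forecast is zero
    return pstdev(component_forecasts) / m


# Hypothetical 7-day-ahead hospitalisation forecasts from five component models.
forecasts = [120.0, 135.0, 128.0, 190.0, 131.0]

ensemble = median_ensemble(forecasts)
cv = coefficient_of_variation(forecasts)
print(f"median ensemble: {ensemble:.1f}, coefficient of variation: {cv:.3f}")
```

In this toy example, the single outlying forecast (190) barely moves the median but raises the coefficient of variation. This is the kind of disagreement among components that the abstract reports as predictive of future ensemble error.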