Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data

Abstract

Estimating treatment effects over time is relevant in many real-world applications, such as precision medicine, epidemiology, economics, and marketing. Many state-of-the-art methods either assume that all confounders are observed or seek to infer the unobserved ones. We take a different perspective by assuming unobserved risk factors, i.e., adjustment variables that affect only the sequence of outcomes. Under unconfoundedness, we target Individual Treatment Effect (ITE) estimation when the treatment response exhibits unobserved heterogeneity due to missing risk factors. We address the challenges posed by time-varying effects and unobserved adjustment variables. Guided by theoretical results on the validity of the learned adjustment variables and by generalization bounds on the treatment effect, we devise the Causal Dynamic Variational Autoencoder (CDVAE). This model combines a Dynamic Variational Autoencoder (DVAE) framework with a propensity-score weighting strategy to estimate counterfactual responses. The CDVAE model allows for accurate estimation of the ITE and captures the underlying heterogeneity in longitudinal data. Evaluations of our model show superior performance over state-of-the-art models.