Adoption and implications of the Biased-Annotator Competence Estimation (BACE) model for COVID-19 vaccine Twitter data: Human annotation of latent message features
Abstract
The traditional approach to quantitative content analysis (human coding) has weaknesses, such as assuming that all human coders are equally accurate once intercoder reliability during training reaches a threshold score. We applied the Biased-Annotator Competence Estimation (BACE) model (Tyler, 2021), which draws on Bayesian modeling to improve human coding. An important contribution of this model is that it takes each coder's potential biases and reliability into consideration and treats the "true" label of each message as a latent parameter with quantifiable estimation uncertainty. In contrast, in conventional human coding, each message receives a fixed label with no estimate of measurement uncertainty. In this extended abstract, we first summarize the weaknesses of conventional human coding, then apply the BACE model to COVID-19 vaccine Twitter data and compare it with other statistical models, and finally discuss how the BACE model can be applied to improve human coding of latent message features.
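To make the contrast with fixed labels concrete, the following is a minimal sketch of the general idea of treating a message's true label as latent. It is not the BACE model of Tyler (2021): it assumes a single binary feature, a known prior, and known per-coder sensitivity and specificity, whereas a full Bayesian model would estimate these quantities from the data. The function name and all numbers are hypothetical.

```python
from math import prod


def posterior_positive(labels, sensitivity, specificity, prior=0.5):
    """Posterior probability that a message's latent true label is 1.

    labels[j]      : binary label (0 or 1) given by coder j
    sensitivity[j] : assumed P(coder j says 1 | true label is 1)
    specificity[j] : assumed P(coder j says 0 | true label is 0)
    prior          : assumed prior P(true label is 1)
    Coders are assumed to be conditionally independent given the true label.
    """
    # Likelihood of the observed labels if the true label is 1.
    like_pos = prod(s if y == 1 else 1 - s for y, s in zip(labels, sensitivity))
    # Likelihood of the observed labels if the true label is 0.
    like_neg = prod(1 - s if y == 1 else s for y, s in zip(labels, specificity))
    numerator = prior * like_pos
    return numerator / (numerator + (1 - prior) * like_neg)


# Hypothetical coders: one highly reliable, two only slightly better than chance.
sens = [0.9, 0.6, 0.6]
spec = [0.9, 0.6, 0.6]
labels = [1, 0, 0]  # the reliable coder disagrees with the other two

p = posterior_positive(labels, sens, spec)
majority = int(sum(labels) > len(labels) / 2)
print(f"posterior P(label = 1) = {p:.2f}; majority vote = {majority}")
```

Under these assumed values, majority voting assigns the fixed label 0, while the posterior probability of label 1 is about 0.80, because the reliable coder's judgment is weighted more heavily. The posterior also serves as a per-message measure of uncertainty, which a fixed label does not provide.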