The realization of tones in spontaneous spoken Taiwan Mandarin: a corpus-based survey and theory-driven computational modeling
Abstract
A growing body of literature shows that semantics can co-determine fine phonetic detail. However, the complex interplay between phonetic realization and semantics remains understudied, particularly for pitch. The current study investigates the tonal realization of Mandarin disyllabic words across all 20 possible two-tone combinations, as attested in a corpus of spontaneous Taiwan Mandarin speech. We used Generalized Additive Mixed Models (GAMs) to model f0 contours as a function of a series of predictors, including gender, tonal context, tone pattern, speech rate, word position, bigram probability, speaker, and word. In the GAM analysis, word and sense emerged as crucial predictors of f0 contours, with effect sizes exceeding that of tone pattern. For each word token in our dataset, we then obtained a contextualized embedding by applying the GPT-2 large language model to the context of that token in the corpus. We show that the pitch contours of word tokens can be predicted to a considerable extent from these contextualized embeddings, which approximate token-specific meanings in contexts of use. Our corpus study shows that meaning in context and phonetic realization are far more entangled than standard linguistic theory predicts.
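To make the count of 20 tone patterns concrete, the following minimal sketch enumerates them. It rests on an assumption not spelled out in the abstract: that the first syllable of a disyllabic word carries one of the four lexical tones, while the second syllable carries either a lexical tone or the neutral tone. The labels `T0` to `T4` and the function name `tone_patterns` are illustrative only.

```python
from itertools import product

# Assumption (not stated in the abstract): the first syllable carries one of
# the four lexical tones, and the second syllable carries a lexical tone or
# the neutral tone, labelled T0 here for illustration.
FIRST_SYLLABLE_TONES = ("T1", "T2", "T3", "T4")
SECOND_SYLLABLE_TONES = ("T1", "T2", "T3", "T4", "T0")


def tone_patterns():
    """Return every first-tone/second-tone combination as a label like 'T2-T4'."""
    return [
        f"{first}-{second}"
        for first, second in product(FIRST_SYLLABLE_TONES, SECOND_SYLLABLE_TONES)
    ]


if __name__ == "__main__":
    patterns = tone_patterns()
    # Four first-syllable tones times five second-syllable tones.
    print(len(patterns))
    print(patterns)
```

Under this assumption, the tone pattern predictor in the statistical analysis would be a single categorical factor whose levels are these 20 labels.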