Abstract
Prior results for tRNA and 5S rRNA demonstrated that secondary structure prediction accuracy can be significantly improved by modifying the parameters of the multibranch loop entropic penalty function. However, for reasons not well understood at the time, the improvement achievable for both families jointly was well below what was achievable for each family considered separately. We resolve this discrepancy here by showing that each family has a characteristic target region geometry, which is distinct from that of the other family and significantly different from that of its own dinucleotide shuffles. Establishing this required a much more efficient approach to computing the necessary information from the branching parameter space, as well as a new theoretical characterization of the region geometries. The insights gained point strongly to considering multiple possible secondary structures generated by varying the multiloop parameters. We provide proof-of-principle results showing that this approach significantly improves prediction accuracy across all 8 additional families in the Archive II benchmarking dataset.