Coconvex characters on collections of phylogenetic trees

Abstract

In phylogenetics, a key problem is to construct evolutionary trees from collections of characters where, for a set X of species, a character is simply a function from X onto a set of states. In this context, a key concept is convexity: a character is convex on a tree with leaf set X if the subtrees spanned by the leaves that share a state are pairwise disjoint. Although collections of convex characters on a single tree have been studied extensively over the past few decades, very little is known about coconvex characters, that is, characters that are simultaneously convex on every tree in a collection. As a starting point for understanding coconvexity, in this paper we prove a number of extremal results for the following question: what is the minimum number of coconvex characters on a collection of n-leaved trees, taken over all collections of size t ≥ 2, and what is this minimum if we restrict to coconvex characters that map onto k states? As an application of coconvexity, we introduce a new one-parameter family of tree metrics, which range from the coarse Robinson-Foulds distance to the much finer quartet distance. We show that bounds on the quantities in the above question translate into bounds on the diameter of tree space under the new distances. Our results open up several new directions and questions, with potential applications to, for example, tree spaces and phylogenomics.
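
The definitions of convexity and coconvexity given in the abstract can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper: it assumes a tree is stored as an undirected adjacency dictionary, that a character is a dictionary from leaf labels to states, and that "disjoint" means the spanned subtrees share no vertex. The names `tree_from_edges`, `spanned_vertices`, `is_convex`, and `is_coconvex` are hypothetical helpers introduced for this example.

```python
from collections import deque
from itertools import combinations


def tree_from_edges(edges):
    """Build an undirected adjacency dict from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def spanned_vertices(adj, leaves):
    """Vertex set of the minimal subtree of `adj` that connects `leaves`."""
    root = next(iter(leaves))
    # Breadth-first search from one chosen leaf gives parent pointers.
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    # The subtree is the union of the paths from every leaf back to `root`.
    covered = {root}
    for leaf in leaves:
        v = leaf
        while v not in covered:
            covered.add(v)
            v = parent[v]
    return covered


def is_convex(adj, character):
    """True if the subtrees spanned by each state class are vertex-disjoint."""
    by_state = {}
    for leaf, state in character.items():
        by_state.setdefault(state, set()).add(leaf)
    subtrees = [spanned_vertices(adj, leaves) for leaves in by_state.values()]
    return all(a.isdisjoint(b) for a, b in combinations(subtrees, 2))


def is_coconvex(trees, character):
    """True if the character is convex on every tree in the collection."""
    return all(is_convex(adj, character) for adj in trees)


# Two quartet trees on leaves a, b, c, d with internal vertices u and v.
T1 = tree_from_edges([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")])  # ab|cd
T2 = tree_from_edges([("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")])  # ac|bd

split_ab_cd = {"a": 0, "b": 0, "c": 1, "d": 1}
single_a = {"a": 0, "b": 1, "c": 1, "d": 1}
```

Under these assumptions, `split_ab_cd` is convex on `T1` but not on `T2`, because on `T2` both state classes need the internal edge between `u` and `v`; it is therefore not coconvex on the pair. The character `single_a`, which separates one leaf from the rest, is convex on both trees and hence coconvex.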