Squared families: Searching beyond regular probability models

Abstract

We introduce squared families, which are families of probability densities obtained by squaring a linear transformation of a statistic. Squared families are singular; however, their singularity is easily handled so that they form regular models. After handling the singularity, squared families possess many convenient properties. Their Fisher information is a conformal transformation of the Hessian metric induced by a Bregman generator. The Bregman generator is the normalising constant, and yields a statistical divergence on the family. The normalising constant admits a helpful parameter-integral factorisation, meaning that only one parameter-independent integral needs to be computed for all normalising constants in the family, unlike in exponential families. Finally, the squared family kernel is the only integral that needs to be computed for the Fisher information, statistical divergence and normalising constant. We then describe how squared families are special in the broader class of $g$-families, which are obtained by applying a sufficiently regular function $g$ to a linear transformation of a statistic. After removing special singularities, positively homogeneous families and exponential families are the only $g$-families for which the Fisher information is a conformal transformation of the Hessian metric, where the generator depends on the parameter only through the normalising constant. Even-order monomial families also admit parameter-integral factorisations, unlike exponential families. We study parameter estimation and density estimation in squared families, in both the well-specified and misspecified settings. We use a universal approximation property to show that squared families can learn sufficiently well-behaved target densities at a rate of $\mathcal{O}(N^{-1/2})+C n^{-1/4}$, where $N$ is the number of datapoints, $n$ is the number of parameters, and $C$ is some constant.
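
The parameter-integral factorisation can be illustrated with a small numerical sketch. It assumes a density of the form $p(x;\theta) = (\theta^\top t(x))^2 / Z(\theta)$ relative to a base measure $\mu$, so that $Z(\theta) = \theta^\top K \theta$ with $K = \int t(x)\,t(x)^\top \,d\mu(x)$. The statistic $t(x) = (1, x, x^2)$, the standard normal base measure, and all function names below are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def statistic(x):
    """Hypothetical statistic t(x) = (1, x, x^2), stacked along the last axis."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)


# Gauss-Hermite rule for the weight exp(-x^2 / 2), rescaled so that the
# weights integrate against the standard normal base measure mu (assumed).
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)

# The single parameter-independent integral: K = E_mu[t(x) t(x)^T].
T = statistic(nodes)                  # shape (20, 3)
K = (T * weights[:, None]).T @ T      # shape (3, 3)


def normalising_constant(theta):
    """Z(theta) = theta^T K theta; no new integral is needed per parameter."""
    theta = np.asarray(theta, dtype=float)
    return float(theta @ K @ theta)


def density(x, theta):
    """Squared-family density relative to mu: (theta^T t(x))^2 / Z(theta)."""
    theta = np.asarray(theta, dtype=float)
    return (statistic(x) @ theta) ** 2 / normalising_constant(theta)


theta = np.array([1.0, 0.5, -0.2])
total_mass = float(np.sum(weights * density(nodes, theta)))
```

Once `K` is stored, `normalising_constant` is a quadratic form in `theta`, so evaluating it for a new parameter costs a matrix-vector product rather than a fresh integral. Here `total_mass` checks that the density integrates to one against the assumed base measure.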

Squared families: Searching beyond regular probability models - arXiv