Mean Estimation in Banach Spaces Under Infinite Variance and Martingale Dependence

Abstract

We consider estimating the shared mean of a sequence of heavy-tailed random variables taking values in a Banach space. In particular, we revisit and extend a simple truncation-based mean estimator first proposed by Catoni and Giulini. While existing truncation-based approaches require a bound on the raw (non-central) second moment of observations, our results hold under a bound on either the central or non-central $p$th moment for some $p \in (1,2]$. Our analysis thus handles distributions with infinite variance. The main contributions of the paper follow from exploiting connections between truncation-based mean estimation and the concentration of martingales in smooth Banach spaces. We prove two types of time-uniform bounds on the distance between the estimator and the unknown mean: line-crossing inequalities, which can be optimized for a fixed sample size $n$, and iterated logarithm inequalities, which match the tightness of line-crossing inequalities at all points in time up to a doubly logarithmic factor in $n$. Our results do not depend on the dimension of the Banach space and hold under martingale dependence, and all constants in the inequalities are known and small.
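To make the idea of truncation-based mean estimation concrete, the following is a minimal sketch and not necessarily the exact estimator analyzed in the paper. It assumes a finite-dimensional space with the Euclidean norm standing in for the Banach-space norm, and it shrinks every observation whose norm exceeds $1/\lambda$ back onto the ball of radius $1/\lambda$ before averaging. The function names `truncate` and `truncated_mean`, the parameter `lam`, and the value of `lam` used in the demo are illustrative assumptions; the abstract does not specify how the truncation level is tuned.

```python
import math
import random


def truncate(x, lam):
    """Shrink x so that its Euclidean norm is at most 1/lam, keeping its direction.

    Assumption: the Euclidean norm is a stand-in for a general Banach-space norm.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return list(x)
    scale = min(1.0, 1.0 / (lam * norm))
    return [scale * v for v in x]


def truncated_mean(samples, lam):
    """Average of the norm-truncated samples (hypothetical helper for illustration)."""
    n = len(samples)
    dim = len(samples[0])
    total = [0.0] * dim
    for x in samples:
        for j, v in enumerate(truncate(x, lam)):
            total[j] += v
    return [t / n for t in total]


if __name__ == "__main__":
    # Pareto draws with shape 1.5 have a finite mean (equal to 3) but infinite variance.
    rng = random.Random(0)
    data = [[rng.paretovariate(1.5), rng.paretovariate(1.5)] for _ in range(10_000)]
    lam = 0.01  # illustrative truncation level, not a tuned value
    print(truncated_mean(data, lam))
```

Truncation biases the estimate toward the origin, so a smaller `lam` (a larger radius) reduces bias at the cost of more sensitivity to extreme observations; balancing the two is exactly what a moment bound of order $p$ is used for.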