
arXiv

Sequential Outlier Hypothesis Testing under Universality Constraints

Abstract

We revisit sequential outlier hypothesis testing and derive bounds on the achievable exponents when both the nominal and anomalous distributions are \emph{unknown}. The task of outlier hypothesis testing is to identify the set of outliers, i.e., the sequences generated from an anomalous distribution, among all observed sequences, where the remaining majority are generated from a nominal distribution. In the sequential setting, one sample is obtained from each sequence per unit time until a reliable decision can be made. For the case with exactly one outlier, our exponent bounds are tight, providing an exact large deviations characterization of sequential tests and strengthening a previous result of Li, Nitinawarat and Veeravalli (2017). In particular, the average sample size of our sequential test is bounded universally under any pair of nominal and anomalous distributions, and our sequential test achieves a larger Bayesian exponent than the fixed-length test, which could not be guaranteed by the sequential test of Li, Nitinawarat and Veeravalli (2017). For the case with at most one outlier, we propose a threshold-based test that has a bounded expected stopping time under mild conditions, and we bound the error exponents under each non-null hypothesis and under the null hypothesis. Our sequential test resolves the tradeoff among error exponents that arises in the fixed-length test of Zhou, Wei and Hero (TIT 2022). Finally, as a further step toward practical applications, we generalize our results to the case of multiple outliers and show that there is a penalty in the error exponents when the number of outliers is unknown.
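The following is a minimal sketch of the sequential setting in the abstract. It draws one sample per sequence per step and stops once the evidence for one candidate outlier is strong enough. The sketch assumes a finite alphabet and exactly one outlier. For each candidate i, it uses a generalized likelihood (GLR) score S_i: the sum of KL divergences between the empirical distributions of the other sequences and their own average. The candidate with the smallest score is the maximum-likelihood guess when both distributions are unknown. The sketch stops when n times the gap between the smallest and second-smallest scores reaches a threshold.

This stopping rule is an illustrative assumption, not the test proposed in the paper. The names `scores`, `sequential_test`, `threshold`, and `max_steps` are hypothetical.

```python
import math
import random
from collections import Counter


def empirical(samples, alphabet):
    """Empirical distribution (type) of a sequence over a finite alphabet."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[a] / n for a in alphabet]


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p(a) = 0 contribute 0."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)


def scores(seqs, alphabet):
    """Hypothetical GLR score for 'sequence i is the outlier' (lower = more likely).

    S_i = sum_{j != i} D(T_j || average of T_k over k != i), where T_j is the
    empirical distribution of sequence j. The average is positive wherever
    T_j is, so the divergence is always finite.
    """
    types = [empirical(s, alphabet) for s in seqs]
    m = len(types)
    result = []
    for i in range(m):
        others = [types[j] for j in range(m) if j != i]
        avg = [sum(t[a] for t in others) / (m - 1) for a in range(len(alphabet))]
        result.append(sum(kl(t, avg) for t in others))
    return result


def sequential_test(sources, alphabet, threshold, max_steps=10_000):
    """Sample every source once per step; stop when the GLR gap is large enough.

    sources: one iterator of samples per sequence (at least 2 sequences).
    Returns (declared outlier index, stopping time n).
    Assumption: stop when n * (second-smallest score - smallest score) >= threshold.
    """
    seqs = [[] for _ in sources]
    order = list(range(len(seqs)))
    for n in range(1, max_steps + 1):
        for seq, it in zip(seqs, sources):
            seq.append(next(it))
        s = scores(seqs, alphabet)
        order = sorted(range(len(s)), key=lambda i: s[i])
        if n * (s[order[1]] - s[order[0]]) >= threshold:
            return order[0], n
    # Truncation fallback: return the current best guess.
    return order[0], max_steps


if __name__ == "__main__":
    rng = random.Random(0)

    def bernoulli(p):
        """Infinite i.i.d. Bernoulli(p) sample stream."""
        while True:
            yield 1 if rng.random() < p else 0

    # Hypothetical setup: 5 sequences, nominal Bernoulli(0.2), outlier Bernoulli(0.7) at index 3.
    srcs = [bernoulli(0.7) if k == 3 else bernoulli(0.2) for k in range(5)]
    decision, stop = sequential_test(srcs, [0, 1], threshold=10.0)
    print(f"declared outlier: {decision}, stopping time: {stop}")
```

Raising `threshold` trades a longer stopping time for a smaller error probability. This is the same tradeoff between expected sample size and error exponents that the abstract quantifies.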