PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees (Technical Report)

Abstract

After decades of research in approximate query processing (AQP), its adoption in industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems (DBMSs). To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126X speedups when running with a 5% guaranteed error.
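The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of a two-stage, block-level estimate, not the TAQA or BSAP procedures themselves. It assumes the table is already available as a list of in-memory blocks, uses a textbook normal-approximation sample-size rule, and treats each block's sum as one sampling unit so that correlation within a block does not bias the variance estimate. The function name `two_stage_block_sum` and all of its parameters are hypothetical. Because the pilot variance is itself an estimate, this sketch does not deliver the a priori guarantee that the paper claims for its methods.

```python
import math
import random
from statistics import NormalDist, mean, variance


def two_stage_block_sum(blocks, rel_error=0.05, confidence=0.95,
                        pilot_rate=0.05, seed=0):
    """Estimate the sum of all values by sampling whole blocks in two stages.

    Illustrative only: a generic pilot-then-final cluster-sampling scheme,
    NOT the TAQA/BSAP algorithms from the paper.
    """
    n_blocks = len(blocks)
    if n_blocks < 2:
        raise ValueError("need at least two blocks to estimate a variance")
    rng = random.Random(seed)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    # Stage 1 (pilot): sample a few blocks and see how much block sums vary.
    n_pilot = min(n_blocks, max(2, math.ceil(pilot_rate * n_blocks)))
    pilot_sums = [sum(blocks[i]) for i in rng.sample(range(n_blocks), n_pilot)]
    mu, var = mean(pilot_sums), variance(pilot_sums)

    # Choose the number of blocks so that z * stderr <= rel_error * |mu|.
    if mu == 0:
        n_final = n_blocks  # relative error is undefined; read everything
    else:
        needed = math.ceil(z * z * var / (rel_error * mu) ** 2)
        n_final = min(n_blocks, max(2, needed))

    # Stage 2 (final): sample that many blocks and scale up the mean block sum.
    final_sums = [sum(blocks[i]) for i in rng.sample(range(n_blocks), n_final)]
    return n_blocks * mean(final_sums), n_final
```

Sampling whole blocks mirrors the reason the abstract gives for block-level sampling: a system can skip unsampled blocks entirely, whereas row-level sampling may still touch most of the table.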
