随机SketchRefine: 将不确定条件下的数据库内决策扩展至数百万元组

Research

arXiv

Stochastic SketchRefine: Scaling In-Database Decision-Making under Uncertainty to Millions of Tuples

Anh L. Mai ,

摘要 Abstract

在不确定性条件下进行决策通常需要选择包（即元组的集合），这些包在整体上优化期望结果并限制风险。处理随机包查询（SPQs）涉及在不确定数据上解决非常大的优化问题。蒙特卡洛方法生成大量的场景（即所有元组的随机属性的样本实现），并在这些场景中生成具有最优目标值的包。为了获得准确的近似解（即使用先前方法时优化问题的规模），所需的场景数量随着数据方差的增加而增加，并且优化问题的搜索空间随关系中元组数量的增加呈指数增长。现有的求解器在处理包含高方差随机属性的大关系上的SPQs时需要花费数小时。除了丰富SPaQL语言以捕捉更广泛的风脸规范外，我们针对可扩展的SPQ处理提出了两项根本性的贡献。首先，为了解决高方差问题，我们提出风险约束线性化（RCL），它将SPQs转换为整数线性规划（ILPs），其规模独立于所使用的场景数量。求解这些ILPs可以得到可行且接近最优的包。其次，我们提出了随机SketchRefine，这是一种分而治之的框架，将一个大规模的随机优化问题分解为涉及较小元组子集的子问题。我们的实验表明，结合RCL和随机SketchRefine可以在比现有技术快几个数量级的运行时间内产生高质量的包。

Decision making under uncertainty often requires choosing packages, or bags of tuples, that collectively optimize expected outcomes while limiting risks. Processing Stochastic Package Queries (SPQs) involves solving very large optimization problems on uncertain data. Monte Carlo methods create numerous scenarios, or sample realizations of the stochastic attributes of all the tuples, and generate packages with optimal objective values across these scenarios. The number of scenarios needed for accurate approximation - and hence the size of the optimization problem when using prior methods - increases with variance in the data, and the search space of the optimization problem increases exponentially with the number of tuples in the relation. Existing solvers take hours to process SPQs on large relations containing stochastic attributes with high variance. Besides enriching the SPaQL language to capture a broader class of risk specifications, we make two fundamental contributions towards scalable SPQ processing. First, to handle high variance, we propose risk-constraint linearization (RCL), which converts SPQs into Integer Linear Programs (ILPs) whose size is independent of the number of scenarios used. Solving these ILPs gives us feasible and near-optimal packages. Second, we propose Stochastic SketchRefine, a divide and conquer framework that breaks down a large stochastic optimization problem into subproblems involving smaller subsets of tuples. Our experiments show that, together, RCL and Stochastic SketchRefine produce high-quality packages in orders of magnitude lower runtime than the state of the art.