LHCb Stripping Project: Continuing to Fully and Efficiently Utilize Legacy Data

Abstract

The LHCb collaboration continues to make heavy use of the Run 1 and Run 2 legacy datasets well into Run 3. As the operational focus shifts from the legacy data to the live Run 3 samples, it is vital that a sustainable and efficient system is in place so that analysts can continue to profit from the legacy datasets. The LHCb Stripping project is the user-facing offline data-processing stage, which allows analysts to select their physics candidates of interest simply by using a Python-configurable architecture. Once the physics selections have been written and validated, the full legacy datasets are reprocessed in short time windows known as Stripping campaigns. Stripping campaigns at LHCb are characterized by a short development window in which a large number of collaborators, often junior researchers, directly develop a wide variety of physics selections; the most recent campaign involved over 900 physics selections. Modern organizational tools, such as GitLab Milestones, are used to track all of the developments and to ensure that developers across the physics working groups adhere to the tight schedule. Additionally, continuous integration is implemented within GitLab to run functional tests of the physics selections and to monitor the rates and timing of the different algorithms, ensuring operational conformity. Outside of these large campaigns, the project is also subject to nightly builds, which ensure that the software remains maintainable while parallel developments happen elsewhere.
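
To illustrate the kind of rate and timing check that such a continuous-integration test might perform, the following is a minimal sketch in plain Python. It does not use the LHCb software stack: the `LineStats` structure, the function names, and the threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineStats:
    """Counters for one physics selection in a test job (hypothetical structure)."""
    name: str
    events_processed: int
    events_accepted: int
    cpu_seconds: float


def check_line(stats, max_rate, max_ms_per_event):
    """Return a list of problems for one selection; an empty list means it passes."""
    if stats.events_processed <= 0:
        return [f"{stats.name}: no events processed"]

    problems = []
    # Retention rate: fraction of input events kept by the selection.
    rate = stats.events_accepted / stats.events_processed
    # Average CPU time spent per input event, in milliseconds.
    ms_per_event = 1000.0 * stats.cpu_seconds / stats.events_processed

    if rate > max_rate:
        problems.append(
            f"{stats.name}: retention rate {rate:.4%} exceeds limit {max_rate:.4%}"
        )
    if ms_per_event > max_ms_per_event:
        problems.append(
            f"{stats.name}: {ms_per_event:.3f} ms/event exceeds limit "
            f"{max_ms_per_event:.3f} ms/event"
        )
    return problems


def check_all(all_stats, max_rate=0.0005, max_ms_per_event=1.0):
    """Check every selection against the same (illustrative) limits."""
    problems = []
    for stats in all_stats:
        problems.extend(check_line(stats, max_rate, max_ms_per_event))
    return problems


if __name__ == "__main__":
    # Made-up numbers for two made-up selections.
    sample = [
        LineStats("ExampleLineA", 100_000, 20, 30.0),
        LineStats("ExampleLineB", 100_000, 200, 250.0),
    ]
    for message in check_all(sample):
        print(message)
```

In a CI job, a non-empty list of problems would typically be turned into a failing exit code so that the merge request is flagged before the selection enters a campaign.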