FACETS: Efficient Once-for-all Object Detection via Constrained Iterative Search

Abstract

Neural Architecture Search (NAS) for deep learning object detection frameworks typically involves multiple modules, each performing a distinct task. Together, these modules create a vast search space, so a search can take several GPU hours or even days, depending on the complexity of that space. This makes joint optimization both challenging and computationally expensive. Satisfying target device constraints across modules adds further complexity to the optimization process. To address these challenges, we propose \textbf{FACETS}, e\textbf{\underline{F}}ficient Once-for-\textbf{\underline{A}}ll Object Detection via \textbf{\underline{C}}onstrained it\textbf{\underline{E}}ra\textbf{\underline{T}}ive \textbf{\underline{S}}earch, a novel unified iterative NAS method that refines the architecture of all modules in a cyclical manner. FACETS leverages feedback from previous iterations, alternating between fixing one module's architecture and optimizing the others. This approach reduces the overall search space while preserving the interdependencies among modules, and it incorporates constraints based on the target device's computational budget. In a controlled comparison against progressive and single-module search strategies, FACETS finds architectures with up to $4.75\%$ higher accuracy and, in earlier stages, does so twice as fast as progressive search strategies, while still being able to reach a global optimum. Moreover, FACETS iteratively refines the search space itself, producing better-performing architectures over time. Candidates randomly sampled from the refined search space have a mean accuracy up to $27\%$ higher than those from global search and $5\%$ higher than those from progressive search methods.
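To make the cyclical scheme concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical two-module detector (`backbone` and `head`), made-up cost and quality numbers, a made-up `score` function standing in for measured accuracy, and a hypothetical `BUDGET` standing in for the target device's computational budget. It only illustrates the idea of holding one module fixed while searching the others under a constraint, then rotating which module is fixed.

```python
import itertools

# Hypothetical per-module choices as (name, cost, quality).
# These values are illustrative assumptions and do not come from the paper.
SEARCH_SPACE = {
    "backbone": [("small", 2, 0.50), ("medium", 4, 0.62), ("large", 7, 0.70)],
    "head": [("light", 1, 0.05), ("standard", 3, 0.12), ("heavy", 5, 0.15)],
}
BUDGET = 9  # hypothetical compute budget of the target device


def cost(arch):
    """Total cost of an architecture given as {module: choice index}."""
    return sum(SEARCH_SPACE[m][i][1] for m, i in arch.items())


def score(arch):
    """Toy accuracy proxy; the product term makes the modules interdependent."""
    b = SEARCH_SPACE["backbone"][arch["backbone"]][2]
    h = SEARCH_SPACE["head"][arch["head"]][2]
    return b + h + b * h


def iterative_search(start, rounds=3):
    """Cycle over modules: fix one, search the others, keep feasible gains."""
    best = dict(start)
    for _ in range(rounds):
        for fixed in SEARCH_SPACE:
            free = [m for m in SEARCH_SPACE if m != fixed]
            ranges = (range(len(SEARCH_SPACE[m])) for m in free)
            for combo in itertools.product(*ranges):
                cand = dict(best)
                cand.update(zip(free, combo))
                # Accept only candidates that respect the budget and improve
                # on the best architecture found in earlier steps.
                if cost(cand) <= BUDGET and score(cand) > score(best):
                    best = cand
    return best


if __name__ == "__main__":
    found = iterative_search({"backbone": 0, "head": 0})
    names = {m: SEARCH_SPACE[m][i][0] for m, i in found.items()}
    print(names, cost(found), round(score(found), 4))
```

In this sketch each step searches only the free modules rather than the full joint space, which is the sense in which the overall search space shrinks. A real NAS pipeline would replace `score` with training or evaluating candidate detectors, which is where the actual cost lies.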