The Backfiring Effect of Weak AI Safety Regulation

Abstract

Recent policy proposals aim to improve the safety of general-purpose AI, but the efficacy of different regulatory approaches to AI safety is poorly understood. We present a strategic model of the interactions between a regulator, the creators of general-purpose AI technology (generalists), and domain specialists, who adapt the AI for specific applications. Our analysis examines how regulatory measures that target different parts of the development chain affect the outcome of the development process. In particular, we assume that AI technology is described by two key attributes: safety and performance. The regulator first sets a minimum safety standard that applies to one or both players, with strict penalties for non-compliance. The generalist then develops the technology, establishing its initial safety and performance levels. Next, the domain specialist refines the AI for a specific use case, and the resulting revenue is divided between the specialist and the generalist through an ex-ante bargaining process. Our analysis of this game reveals two key insights. First, weak safety regulation imposed only on domain specialists can backfire. While it might seem logical to regulate use cases rather than the general-purpose technology, our analysis shows that weak regulation targeting domain specialists alone can unintentionally reduce safety. This effect persists across a wide range of settings. Second, in sharp contrast to the first finding, stronger, well-placed regulation can in fact benefit all players subject to it. When the regulator imposes appropriate safety standards on both AI creators and domain specialists, the regulation functions as a commitment mechanism, yielding safety and performance gains beyond what is achieved with no regulation or with regulation of only one player.
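The order of moves described in the abstract (regulator, then generalist, then specialist) can be illustrated with a small backward-induction sketch. This is a toy under assumptions made here, not the paper's model: the revenue function, the quadratic costs, the fixed revenue share standing in for the bargaining outcome, the discrete grid, and all function and parameter names are hypothetical. Strict penalties are approximated by ruling out non-compliant choices. The sketch is not intended to reproduce the paper's results.

```python
from itertools import product

# Hypothetical discrete effort grid: 0.0, 0.1, ..., 2.0 (assumption for illustration).
GRID = [round(0.1 * i, 1) for i in range(21)]
EPS = 1e-9  # tolerance for floating-point comparisons against a safety floor


def revenue(safety, performance):
    """Hypothetical revenue function; the abstract does not specify its form."""
    return safety + performance + safety * performance


def specialist_best_response(s0, p0, floor, share_g=0.5, cost=1.0):
    """Last mover: the specialist adds (ds, dp) to the inherited (s0, p0).

    A floor on final safety is treated as a hard constraint, which stands in
    for a penalty strict enough that non-compliance is never chosen.
    Assumes floor <= s0 + max(GRID), so at least one compliant choice exists.
    """
    best = None
    for ds, dp in product(GRID, GRID):
        s, p = s0 + ds, p0 + dp
        if s < floor - EPS:
            continue  # non-compliant choice is ruled out
        payoff = (1 - share_g) * revenue(s, p) - cost * (ds ** 2 + dp ** 2)
        if best is None or payoff > best[0]:
            best = (payoff, s, p)
    return best


def solve(floor_generalist=0.0, floor_specialist=0.0, share_g=0.5, cost=1.0):
    """First mover: the generalist picks (s0, p0), anticipating the specialist."""
    best = None
    for s0, p0 in product(GRID, GRID):
        if s0 < floor_generalist - EPS:
            continue  # generalist must meet its own floor, if one applies
        _, s, p = specialist_best_response(s0, p0, floor_specialist, share_g, cost)
        payoff = share_g * revenue(s, p) - cost * (s0 ** 2 + p0 ** 2)
        if best is None or payoff > best[0]:
            best = (payoff, s, p)
    return {"generalist_payoff": best[0], "safety": best[1], "performance": best[2]}
```

Calling `solve()` with no arguments gives the unregulated outcome of this toy game, `solve(floor_specialist=1.0)` applies a floor to the specialist only, and `solve(floor_generalist=0.5, floor_specialist=1.0)` applies floors to both players. Comparing the resulting `safety` values across regimes mirrors the kind of comparison the abstract describes, although whether this toy shows the backfiring effect depends entirely on the assumed functional forms.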

The Backfiring Effect of Weak AI Safety Regulation - arXiv