Abstract
Smart contracts are small programs that run autonomously on the blockchain, using it as their persistent memory. The predominant platform for smart contracts is the Ethereum VM (EVM). In EVM smart contracts, a problem with significant applications is to identify data structures (in blockchain state, a.k.a. "storage"), given only the deployed smart contract code. The problem has been highly challenging and has often been considered nearly impossible to address satisfactorily. (For reference, the latest state-of-the-art research tool fails to recover nearly all complex data structures and scales to under 50% of contracts.) Much of the complication is that the main on-chain data structures (mappings and arrays) have their locations derived dynamically through code execution. We propose sophisticated static analysis techniques to solve the identification of on-chain data structures with extremely high fidelity and completeness. Our analysis scales nearly universally and recovers deep data structures. Our techniques are able to identify the exact types of data structures with 98.6% precision and at least 92.6% recall, compared to a state-of-the-art tool managing 80.8% and 68.2% respectively. Strikingly, the analysis is often more complete than the storage description that the compiler itself produces, with full access to the source code.
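To make concrete what "locations derived dynamically through code execution" means, the following is a minimal sketch of the slot arithmetic that Solidity-compiled contracts use for mappings and dynamic arrays. It is an illustration only and not the analysis proposed here. The helper names (`h`, `word`, `mapping_slot`, `array_element_slot`) are hypothetical, and `hashlib.sha3_256` is used as a stand-in hash: the EVM uses the original Keccak-256, which pads differently, so the numeric slots below will not match real on-chain values.

```python
import hashlib

WORD = 32  # EVM storage keys and values are 32-byte words


def h(data: bytes) -> int:
    # Stand-in hash (assumption): NIST SHA3-256 from the standard library.
    # The EVM uses original Keccak-256, so real slot numbers differ.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def word(n: int) -> bytes:
    # Encode an integer as a big-endian 32-byte word.
    return n.to_bytes(WORD, "big")


def mapping_slot(key: int, base_slot: int) -> int:
    # For a mapping declared at slot p, the value for key k
    # lives at hash(k . p), where "." is concatenation.
    return h(word(key) + word(base_slot))


def array_element_slot(index: int, base_slot: int, slots_per_elem: int = 1) -> int:
    # For a dynamic array declared at slot p, slot p holds the length
    # and the elements start at hash(p).
    return (h(word(base_slot)) + index * slots_per_elem) % 2**256


# Nested structure, e.g. mapping(address => uint256[]) declared at slot 3:
# first resolve the mapping entry, then index into the array stored there.
inner = mapping_slot(0xABCD, 3)
elem = array_element_slot(2, inner)
print(hex(elem))
```

Because every such location is the output of hashing and arithmetic performed at run time, the declared structure never appears explicitly in the deployed bytecode. It has to be inferred from how these computations are composed.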