Adaptive Integrated Layered Attention (AILA)

Abstract

We propose Adaptive Integrated Layered Attention (AILA), a neural network architecture that combines dense skip connections with different mechanisms for adaptive feature reuse across network layers. We evaluate AILA on three challenging tasks: price forecasting for various commodities and indices (S&P 500, gold, US dollar futures, coffee, wheat), image recognition on the CIFAR-10 dataset, and sentiment analysis on the IMDB movie review dataset. In all cases, AILA matches strong deep learning baselines (LSTMs, Transformers, and ResNets) while requiring only a fraction of their training and inference time. Notably, we implement and test two versions of the model: AILA-Architecture 1, which uses simple linear layers as the connection mechanism between layers, and AILA-Architecture 2, which uses an attention mechanism to selectively focus on the outputs of previous layers. Both architectures are applied in a single-task learning setting, with a separate model trained for each task. The results confirm that AILA's adaptive inter-layer connections yield robust gains by flexibly reusing pertinent features at multiple network depths. AILA thus extends existing architectures, improving long-range sequence modeling, image recognition with optimised computational speed, and state-of-the-art (SOTA) classification performance in practice.
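
To make the distinction between the two connection mechanisms concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the Architecture 1 style connection is a single linear map over the concatenated outputs of earlier layers, and that the Architecture 2 style connection is scaled dot-product attention in which the most recent layer output acts as the query. The function names (linear_mix, attention_mix), the dimensions, and the random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_mix(prev_outputs, weights):
    """Architecture-1-style sketch (assumed form): a linear map applied to
    the concatenation of all earlier layer outputs."""
    concat = [x for out in prev_outputs for x in out]
    return [dot(row, concat) for row in weights]


def attention_mix(query, prev_outputs):
    """Architecture-2-style sketch (assumed form): scaled dot-product
    attention of the current state over earlier layer outputs."""
    scale = math.sqrt(len(query))
    scores = [dot(query, out) / scale for out in prev_outputs]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    mixed = [sum(a * out[i] for a, out in zip(attn, prev_outputs))
             for i in range(len(query))]
    return mixed, attn


# Hypothetical toy setup: three earlier layers, each with a 4-dim output.
random.seed(0)
dim, depth = 4, 3
prev_outputs = [[random.uniform(-1, 1) for _ in range(dim)]
                for _ in range(depth)]
weights = [[random.uniform(-0.5, 0.5) for _ in range(dim * depth)]
           for _ in range(dim)]

linear_out = linear_mix(prev_outputs, weights)
attn_out, attn_weights = attention_mix(prev_outputs[-1], prev_outputs)
print(len(linear_out), len(attn_out), attn_weights)
```

In this sketch the linear variant weights every earlier feature with fixed coefficients, whereas the attention variant recomputes its weights from the current input, which is one plausible reading of "selectively focus on outputs from previous layers".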

Adaptive Integrated Layered Attention (AILA) - arXiv