A Dimensionality-Reduction Method for Image Classification Based on the Attention Mechanism

doi:10.16088/j.issn.1001-6600.2020090704

Abstract

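Abstract (translated from the Chinese): The convolution operator is the core building block of a convolutional neural network. Within a given receptive field, it fuses information across the layers and channels of the network to extract features from the original image. However, adjacent pixels in an image tend to have similar values, so the output of a convolutional layer contains a large amount of redundant information. To reduce this redundancy and speed up model inference, pooling layers are added to the network to reduce the dimensionality of the information. Compared with traditional dimensionality-reduction methods, pooling is inherently invariant to translation and rotation, reduces the dimensionality of image features more effectively, and keeps the model end-to-end. Exploiting these properties, this paper proposes a dimensionality-reduction method based on the attention mechanism. During feature extraction, the reduced feature information from each layer of the network is reused nonlinearly so that the network can learn the latent relationships among these features. In addition, during dimensionality reduction the method attends first to the main texture of the target in the image and then fuses it with the target's weak-texture information to obtain the reduced features. Using the DLA-34 (deep layer aggregation) network, several groups of comparative experiments on the CIFAR10 and CIFAR100 datasets compare the proposed method with max-based and average-based pooling, demonstrating its effectiveness.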

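Keywords (translated from the Chinese): deep learning, image classification, convolutional neural network, residual network, attention mechanism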

Abstract: Convolution operators are the core building blocks of convolutional neural networks. They enable the network to fuse spatial and channel information across layers within a given receptive field and to extract features from that information. However, adjacent pixels in an image often have similar values, so the output of a convolutional layer contains a large amount of redundant information. To reduce redundant information and speed up model inference, pooling layers are added to the convolutional neural network to reduce the dimensionality of the information. Compared with traditional dimensionality-reduction methods, pooling is invariant to translation and rotation, gives a better dimensionality-reduction effect on image features, and keeps the model end-to-end. Therefore, a dimensionality-reduction method based on the attention mechanism is proposed that builds on the pooling layer. During feature extraction, the dimensionality-reduced information from each layer is reused nonlinearly, so that the potential connections among the reduced information from different layers can be learned. To obtain the characteristics of the input information, the proposed method first focuses on the main texture of the target in the image and then combines it with the target's low-texture and background information. Based on the DLA-34 (deep layer aggregation) neural network, the proposed method is compared with dimensionality-reduction methods based on the maximum value and the average value in multiple sets of experiments on the CIFAR10 and CIFAR100 datasets, which demonstrates the effectiveness of the new method.

Key words: deep learning, image classification, convolutional neural network, residual network, attention mechanism
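
The abstract contrasts max-based and average-based pooling with an attention-based reduction but does not state the formulation. As a rough illustration only, the sketch below applies three reductions to non-overlapping 2x2 windows of a toy feature map. The softmax-weighted reduction is an assumption chosen to show the general idea of letting strong activations dominate while weaker ones still contribute; it is not the authors' method, which also reuses reduced features across layers. The names `pool2x2`, `softmax_weighted_reduce`, and `temperature` are hypothetical.

```python
import math


def pool2x2(feature_map, reduce_window):
    """Downsample a 2-D feature map over non-overlapping 2x2 windows.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            pooled_row.append(reduce_window(window))
        pooled.append(pooled_row)
    return pooled


def max_reduce(window):
    # Max pooling: keep only the strongest activation in the window.
    return max(window)


def mean_reduce(window):
    # Average pooling: every activation contributes equally.
    return sum(window) / len(window)


def softmax_weighted_reduce(window, temperature=1.0):
    # Assumed attention-style reduction (not from the paper): softmax weights
    # favour strong activations but keep a contribution from weak ones.
    peak = max(window)  # subtract the peak for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in window]
    total = sum(exps)
    return sum(v * e / total for v, e in zip(window, exps))


# Toy 4x4 feature map, made up for illustration.
feature_map = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
]

max_pooled = pool2x2(feature_map, max_reduce)
mean_pooled = pool2x2(feature_map, mean_reduce)
attn_pooled = pool2x2(feature_map, softmax_weighted_reduce)
```

Each value in `attn_pooled` lies between the corresponding average-pooled and max-pooled values. In this sketch, a small `temperature` pushes the result toward max pooling and a large one toward average pooling.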

CLC number: TP391.4

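Deng Wenxuan (邓文轩), Yang Hang (杨航), Jin Ting (靳婷). A dimensionality-reduction method for image classification based on the attention mechanism [J]. Journal of Guangxi Normal University (Natural Science Edition) (广西师范大学学报（自然科学版）), 2021, 39(2): 32-40.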

DENG Wenxuan, YANG Hang, JIN Ting. A Dimensionality-reduction Method Based on Attention Mechanism on Image Classification[J]. Journal of Guangxi Normal University (Natural Science Edition), 2021, 39(2): 32-40.

