广西师范大学学报(自然科学版) ›› 2021, Vol. 39 ›› Issue (2): 32-40.doi: 10.16088/j.issn.1001-6600.2020090704

• CCIR2020 •

基于注意力机制的图像分类降维方法

邓文轩, 杨航, 靳婷*   

  1. 海南大学 计算机与网络空间安全学院, 海南 海口 570228
  • 收稿日期:2020-09-07 修回日期:2020-09-30 出版日期:2021-03-25 发布日期:2021-04-15
  • 通讯作者: 靳婷(1982—),女,天津人,海南大学副教授,博士。E-mail:tingj@fudan.edu.cn

A Dimensionality-Reduction Method Based on Attention Mechanism for Image Classification

DENG Wenxuan, YANG Hang, JIN Ting*   

  1. School of Computer Science and Cyberspace Security, Hainan University, Haikou Hainan 570228, China
  • Received:2020-09-07 Revised:2020-09-30 Online:2021-03-25 Published:2021-04-15

摘要: 卷积算子是卷积神经网络的核心构造块,它根据一定的感受野,融合卷积神经网络各层与不同通道之间的信息,提取出原始图像特征。然而图像中的相邻像素往往具有相似的值,导致卷积层的输出包含大量冗余信息。为了减少冗余信息、加快模型推理速度,神经网络中会加入池化层进行信息降维。与传统降维方法相比,池化本身具有平移和旋转不变性,对图像特征的降维效果更好,并能保持模型端到端的结构。利用这样的特性,本文提出一种基于注意力机制的降维方法:在特征提取过程中非线性地复用神经网络各层降维后的特征信息,使网络能学习到它们之间的潜在联系;另外,在降维时优先关注图像中目标的主要纹理,并与该目标的弱纹理信息进行融合,从而得到降维后的特征信息。基于DLA-34(deep layer aggregation)神经网络,在CIFAR10与CIFAR100数据集上设计多组对比实验,将本文提出的降维方法与基于最大值、基于均值等池化方法进行比较,证明了该方法的有效性。
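摘要中提到的最大值池化与均值池化,可用如下 NumPy 代码说明(假设性的简化示意,非论文原始实现):池化将特征图按 2×2 不重叠窗口聚合,空间分辨率减半,从而去除相邻像素带来的冗余信息。

```python
import numpy as np

def pool2x2(feat, mode="max"):
    """按 2x2 不重叠窗口对 H x W 特征图降维(示意实现)。"""
    h, w = feat.shape
    assert h % 2 == 0 and w % 2 == 0
    # 将每个 2x2 窗口整理成长度为 4 的向量
    win = (feat.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))
    # 最大值池化保留窗口内最强响应;均值池化保留窗口平均响应
    return win.max(axis=-1) if mode == "max" else win.mean(axis=-1)

feat = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(feat, "max"))  # [[5, 7], [13, 15]]
print(pool2x2(feat, "avg"))  # [[2.5, 4.5], [10.5, 12.5]]
```

只要最强响应不移出窗口,窗口内的小幅平移不会改变最大值池化的输出,这正是摘要所说的平移不变性。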

关键词: 深度学习, 图像分类, 卷积神经网络, 残差网络, 注意力机制

Abstract: Convolution operators are the core building blocks of convolutional neural networks: within a given receptive field, they fuse information across layers and channels and extract features from the original image. However, adjacent pixels in an image often have similar values, so the output of a convolutional layer contains a large amount of redundant information. To reduce this redundancy and speed up model inference, pooling layers are added to the network to reduce the dimensionality of the information. Compared with traditional dimensionality-reduction methods, pooling is invariant to translation and rotation, reduces the dimensionality of image features more effectively, and keeps the model end-to-end. Exploiting these properties, a dimensionality-reduction method based on the attention mechanism is proposed. During feature extraction, the dimensionality-reduced features of each layer are reused nonlinearly, so that the network can learn the latent connections among them. In addition, the method first attends to the main texture of the target in the image and then fuses in the weak texture information of the target to obtain the dimensionality-reduced features. Based on the DLA-34 (deep layer aggregation) network, multiple sets of comparative experiments on the CIFAR10 and CIFAR100 datasets compare the proposed method with max pooling, average pooling, and other pooling methods, demonstrating the effectiveness of the new method.
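As a rough illustration of the idea (a hypothetical sketch, not the paper's implementation), the following NumPy code replaces the hard maximum of a pooling window with softmax attention weights: the strongest response (main texture) dominates the output, while weaker responses (weak texture) are fused in rather than discarded as in max pooling.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool2x2(feat):
    """Downsample an H x W feature map by 2 using softmax attention
    weights over each non-overlapping 2x2 window (illustrative sketch)."""
    h, w = feat.shape
    assert h % 2 == 0 and w % 2 == 0
    # Gather each non-overlapping 2x2 window as a length-4 vector
    win = (feat.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))
    # Attention weights sum to 1 per window: the strongest value leads,
    # but weak-texture values still contribute to the fused output
    return (softmax(win) * win).sum(axis=-1)
```

On a window such as [4, 0, 0, 0], the result (about 3.8) lies between the average (1.0) and the maximum (4.0), which is the texture-fusion behaviour the abstract describes.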

Key words: deep learning, image classification, convolutional neural network, residual network, attention mechanism

中图分类号: TP391.4
[1] RUSSAKOVSKY O,DENG J,SU H,et al.ImageNet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115(3):211-252.DOI: 10.1007/s11263-015-0816-y.
[2] 崔雍浩,商聪,陈锶奇,等.人工智能综述:AI的发展[J].无线电通信技术,2019,45(3):225-231.DOI: 10.3969/j.issn.1003-3114.2019.03.01.
[3] 郭丽丽,丁世飞.深度学习研究进展[J].计算机科学,2015,42(5):28-33.DOI: 10.11896/j.issn.1002-137X.2015.05.006.
[4] 毛其超,贾瑞生,左羚群,等.基于深度学习的交通监控视频车辆检测算法[J].计算机应用与软件,2020,37(9):111-117,164.DOI: 10.3969/j.issn.1000-386x.2020.09.019.
[5] 周玲,胡月,刘红,等.循证医学和深度学习在社区新型冠状病毒肺炎疫情防控管理中的应用[J].护理研究,2020,34(6):947-949.DOI: 10.12102/j.issn.1009-6493.2020.06.035.
[6] 陈德鑫,占袁圆,杨兵.深度学习技术在教育大数据挖掘领域的应用分析[J].电化教育研究,2019,40(2):68-76.DOI: 10.13811/j.cnki.eer.2019.02.009.
[7] 张晓海,操新文.基于深度学习的军事辅助决策研究[J].火力与指挥控制,2020,45(3):1-6.DOI: 10.3969/j.issn.1002-0640.2020.03.001.
[8] 王青松,赵西安,马超.特征提取和特征匹配改进方法的研究[J].测绘科学技术学报,2014,31(4):377-382.DOI: 10.3969/j.issn.1673-6338.2014.04.011.
[9] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25:26th Annual Conference on Neural Information Processing Systems 2012.La Jolla,CA:Neural Information Processing Systems,2012:1097-1105.
[10] 王伟男,杨朝红.基于图像处理技术的目标识别方法综述[J].电脑与信息技术,2019,27(6):9-15.DOI: 10.19414/j.cnki.1005-1228.2019.06.003.
[11] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[EB/OL].(2014-09-04)[2020-02-07].https://arxiv.org/pdf/1409.1556.pdf.
[12] SZEGEDY C,LIU W,JIA Y Q,et al.Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2015:1-9.DOI: 10.1109/CVPR.2015.7298594.
[13] SZEGEDY C,VANHOUCKE V,IOFFE S,et al.Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2016:2818-2826.DOI: 10.1109/cvpr.2016.308.
[14] HE K M,ZHANG X Y,REN S Q,et al.Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2016:770-778.DOI: 10.1109/CVPR.2016.90.
[15] XIE S N,GIRSHICK R,DOLLÁR P,et al.Aggregated residual transformations for deep neural networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2017:5987-5995.DOI: 10.1109/CVPR.2017.634.
[16] HUANG G,LIU Z,van der MAATEN L,et al.Densely connected convolutional networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2017:2261-2269.DOI: 10.1109/CVPR.2017.243.
[17] ELFIKY N M,KHAN F S,van de WEIJER J,et al.Discriminative compact pyramids for object and scene recognition[J].Pattern Recognition,2012,45(4):1627-1636.DOI: 10.1016/j.patcog.2011.09.020.
[18] KE Y,SUKTHANKAR R.PCA-SIFT:a more distinctive representation for local image descriptors[C]//Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition:Volume II.Los Alamitos,CA:IEEE Computer Society,2004:506-513.DOI: 10.1109/CVPR.2004.1315206.
[19] HU J,SHEN L,SUN G.Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2018:7132-7141.DOI: 10.1109/cvpr.2018.00745.
[20] YU F,WANG D Q,SHELHAMER E,et al.Deep layer aggregation[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2018:2403-2412.DOI: 10.1109/cvpr.2018.00255.
[21] RONNEBERGER O,FISCHER P,BROX T.U-Net:convolutional networks for biomedical image segmentation[C]//Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015.Berlin:Springer,2015:234-241.DOI: 10.1007/978-3-319-24574-4_28.
[22] LIN T Y,DOLLÁR P,GIRSHICK R,et al.Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2017:936-944.DOI: 10.1109/CVPR.2017.106.
[23] HOWARD A G,ZHU M L,CHEN B,et al.MobileNets:efficient convolutional neural networks for mobile vision applications[EB/OL].(2017-04-17)[2020-02-07].https://arxiv.org/pdf/1704.04861.pdf.
[24] HE T,ZHANG Z,ZHANG H,et al.Bag of tricks for image classification with convolutional neural networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2019:558-567.DOI: 10.1109/cvpr.2019.00065.
[25] ZHANG X Y,ZHOU X Y,LIN M X,et al.ShuffleNet:an extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Los Alamitos,CA:IEEE Computer Society,2018:6848-6856.DOI: 10.1109/cvpr.2018.00716.
版权所有 © 广西师范大学学报(自然科学版)编辑部
地址:广西桂林市三里店育才路15号 邮编:541004
电话:0773-5857325 E-mail: gxsdzkb@mailbox.gxnu.edu.cn