Journal of Guangxi Normal University (Natural Science Edition), 2025, Vol. 43, Issue 4: 69-82. DOI: 10.16088/j.issn.1001-6600.2024102502

• Intelligence Information Processing •

Fine-grained Image Classification Combining Adaptive Spatial Mutual Attention and Feature Pair Integration Discrimination

LI Zhixin1,2*, KUANG Wenlan1,2   

  1. Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education (Guangxi Normal University), Guilin, Guangxi 541004, China;
  2. Guangxi Key Lab of Multi-source Information Mining and Security (Guangxi Normal University), Guilin, Guangxi 541004, China
  • Received: 2024-10-25  Revised: 2025-01-15  Online: 2025-07-05  Published: 2025-07-14

Abstract: Because fine-grained images exhibit small inter-class differences and large intra-class variations, many studies have used the Vision Transformer to mine critical region features and improve fine-grained image classification accuracy. However, two major problems remain. First, background regions are also considered when the network mines critical classification cues, introducing additional noise into the model. Second, the local embedding features of the input image lack spatial connections, so the model has little awareness of object structure, which leads to inaccurate category features. To address these problems, this paper proposes two modules: an adaptive spatial mutual attention module and a feature pair integration discrimination module. The adaptive spatial mutual attention module first learns mutual attention weights across different embedding layers to select more discriminative regions, and adaptively learns the neighbor relationships among regions through a graph convolutional network. The feature pair integration discrimination module then models the cue interactions between image pairs to reduce confusion between fine-grained images. The final predictions are obtained with a token feature enhancement strategy. The proposed method achieves accuracies of 92.5%, 93.3% and 91.8% on three benchmark datasets, CUB-200-2011, Stanford Dogs and NABirds, respectively, outperforming many existing state-of-the-art methods.
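To make the two modules described above more concrete, the following minimal sketch illustrates the general ideas in PyTorch. It is not the authors' implementation: the module names, the ViT-style patch-token shapes (B, N, D), the single graph-convolution step, and the sigmoid gate used for the pairwise interaction are all illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's code), assuming PyTorch and
# ViT-style patch tokens of shape (B, N, D): (1) cross-layer mutual attention
# weights patch tokens and a single graph-convolution step propagates
# information between related regions; (2) a pairwise gate exchanges cues
# between the two images of a pair before classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MutualAttentionGCN(nn.Module):
    """Weights patch tokens by cross-layer mutual attention, then applies one
    GCN step over an adaptively learned token adjacency."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.gcn = nn.Linear(dim, dim)        # shared GCN weight
        self.adj_proj = nn.Linear(dim, dim)   # used to build the adjacency

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # shallow, deep: patch tokens from two transformer layers, (B, N, D).
        # Mutual attention: how strongly deep-layer tokens attend to shallow ones.
        attn = torch.softmax(
            self.q(deep) @ self.k(shallow).transpose(1, 2) / deep.size(-1) ** 0.5,
            dim=-1,
        )                                                        # (B, N, N)
        token_weight = attn.mean(dim=1, keepdim=True).transpose(1, 2)  # (B, N, 1)
        weighted = deep * token_weight  # intended to down-weight background tokens

        # Adaptive adjacency over tokens, then one graph-convolution step.
        adj = torch.softmax(self.adj_proj(weighted) @ weighted.transpose(1, 2), dim=-1)
        return F.relu(self.gcn(adj @ weighted))                  # (B, N, D)


def pairwise_interaction_logits(feat_a, feat_b, classifier: nn.Linear):
    """Sketch of a feature-pair discrimination step: each image's pooled
    feature is enriched by the cues it shares with its paired image."""
    gate = torch.sigmoid(feat_a * feat_b)  # mutual cues between the pair
    return classifier(feat_a + gate * feat_b), classifier(feat_b + gate * feat_a)


if __name__ == "__main__":
    B, N, D, C = 2, 196, 768, 200
    block = MutualAttentionGCN(D)
    tokens = block(torch.randn(B, N, D), torch.randn(B, N, D))
    head = nn.Linear(D, C)
    logits_a, logits_b = pairwise_interaction_logits(
        tokens.mean(dim=1), torch.randn(B, D), head
    )
    print(logits_a.shape, logits_b.shape)  # torch.Size([2, 200]) for each image
```

In this sketch, the mutual-attention weight suppresses less informative patches before the graph convolution relates neighboring regions, and the pairwise gate mixes cues between the paired images, loosely mirroring the roles the abstract assigns to the two proposed modules.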

Key words: fine-grained image classification, adaptive spatial mutual attention, feature pair integration discrimination, graph convolutional network, token feature enhancement

CLC Number:  TP391.41
[1] LI Z X, ZHANG J, WU J L, et al. Image semantic segmentation based on semi-supervised adversarial learning[J]. Journal of Image and Graphics, 2022, 27(7): 2157-2170. DOI: 10.11834/jig.200600.
[2] CAO J L, LI Y L, SUN H Q, et al. A survey of deep learning-based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722. DOI: 10.11834/jig.220069.
[3] LIU Y, PANG Y L, ZHANG W D, et al. Image classification based on active learning: status and future[J]. Acta Electronica Sinica, 2023, 51(10): 2960-2984. DOI: 10.12263/DZXB.20230397.
[4] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03)[2024-10-25]. https://arxiv.org/abs/2010.11929. DOI: 10.48550/arXiv.2010.11929.
[5] LI Z X, HOU C W, XIE X M. Unsupervised cross-modal hashing retrieval fusing multiple instance relations[J]. Journal of Software, 2023, 34(11): 4973-4988. DOI: 10.13328/j.cnki.jos.006742.
[6] ZHUO Y Q, WEI J H, LI Z X. Research on image caption generation based on dual attention model[J]. Acta Electronica Sinica, 2022, 50(5): 1123-1130. DOI: 10.12263/DZXB.20210696.
[7] LI Z X, SU Q. Knowledge-assisted image caption generation[J]. Journal of Guangxi Normal University (Natural Science Edition), 2022, 40(5): 418-432. DOI: 10.16088/j.issn.1001-6600.2022013101.
[8] XIANG J W, CHEN M R, YANG B B. Fine-grained image classification combining Swin Transformer and multi-scale feature fusion[J]. Computer Engineering and Applications, 2023, 59(20): 147-157. DOI: 10.3778/j.issn.1002-8331.2211-0456.
[9] HU Y Q, JIN X, ZHANG Y, et al. RAMS-trans: recurrent attention multi-scale transformer for fine-grained image recognition[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York, NY: Association for Computing Machinery, 2021: 4239-4248. DOI: 10.1145/3474085.3475561.
[10] XU Q, WANG J H, JIANG B, et al. Fine-grained visual classification via internal ensemble learning transformer[J]. IEEE Transactions on Multimedia, 2023, 25: 9015-9028. DOI: 10.1109/TMM.2023.3244340.
[11] KE X, CAI Y H, CHEN B T, et al. Granularity-aware distillation and structure modeling region proposal network for fine-grained image classification[J]. Pattern Recognition, 2023, 137: 109305. DOI: 10.1016/j.patcog.2023.109305.
[12] ZHENG S J, WANG G C, YUAN Y J, et al. Fine-grained image classification based on TinyVit object location and graph convolution network[J]. Journal of Visual Communication and Image Representation, 2024, 100: 104120. DOI: 10.1016/j.jvcir.2024.104120.
[13] XIE J J, ZHONG Y J, ZHANG J G, et al. A weakly supervised spatial group attention network for fine-grained visual recognition[J]. Applied Intelligence, 2023, 53(20): 23301-23315. DOI: 10.1007/s10489-023-04627-z.
[14] HE X J, LIN J F. Fine-grained few-shot learning fusing weakly supervised object localization[J]. Journal of Image and Graphics, 2022, 27(7): 2226-2239. DOI: 10.11834/jig.200849.
[15] HUANG C, ZENG Z G, ZHU W Q, et al. Fine-grained image recognition based on weakly supervised multi-attention fusion network[J]. Modern Information Technology, 2022, 6(21): 78-82, 87. DOI: 10.19850/j.cnki.2096-4706.2022.21.019.
[16] GAO Y, HAN X T, WANG X, et al. Channel interaction networks for fine-grained image categorization[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 10818-10825. DOI: 10.1609/aaai.v34i07.6712.
[17] ZHU Q X, KUANG W L, LI Z X. A collaborative gated attention network for fine-grained visual classification[J]. Displays, 2023, 79: 102468. DOI: 10.1016/j.displa.2023.102468.
[18] WANG Q, WANG J J, DENG H Y, et al. AA-trans: core attention aggregating transformer with information entropy selector for fine-grained visual classification[J]. Pattern Recognition, 2023, 140: 109547. DOI: 10.1016/j.patcog.2023.109547.
[19] XU Y, WU S S, WANG B Q, et al. Two-stage fine-grained image classification model based on multi-granularity feature fusion[J]. Pattern Recognition, 2024, 146: 110042. DOI: 10.1016/j.patcog.2023.110042.
[20] WANG Z Q, LI Y, ZHANG R, et al. A survey of few-shot SAR image classification methods[J]. Journal of Image and Graphics, 2024, 29(7): 1902-1920. DOI: 10.11834/jig.230359.
[21] YANG C G, CHEN L M, ZHAO E H, et al. Image classification method based on graph representation knowledge distillation[J]. Acta Electronica Sinica, 2024, 52(10): 3435-3447. DOI: 10.12263/DZXB.20230976.
[22] SONG Y, WANG Y. Image classification with multi-stage attention capsule network[J]. Acta Automatica Sinica, 2024, 50(9): 1804-1817. DOI: 10.16383/j.aas.c210012.
[23] WU H P, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Los Alamitos, CA: IEEE Computer Society, 2021: 22-31. DOI: 10.1109/ICCV48922.2021.00009.
[24] SHAO R, BI X J, CHEN Z. Hybrid ViT-CNN network for fine-grained image classification[J]. IEEE Signal Processing Letters, 2024, 31: 1109-1113. DOI: 10.1109/LSP.2024.3386112.
[25] HE J, CHEN J N, LIU S, et al. TransFG: a transformer architecture for fine-grained recognition[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(1): 852-860. DOI: 10.1609/aaai.v36i1.19967.
[26] DU R Y, XIE J Y, MA Z Y, et al. Progressive learning of category-consistent multi-granularity features for fine-grained visual classification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9521-9535. DOI: 10.1109/TPAMI.2021.3126668.
[27] CHEN T H, LI Y Y, QIAO Q H. Fine-grained bird image classification based on counterfactual method of vision transformer model[J]. The Journal of Supercomputing, 2024, 80(5): 6221-6239. DOI: 10.1007/s11227-023-05701-6.
[28] JI R Y, LI J Y, ZHANG L B, et al. Dual transformer with multi-grained assembly for fine-grained visual classification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(9): 5009-5021. DOI: 10.1109/TCSVT.2023.3248791.
[29] CHEN H Z, ZHANG H M, LIU C, et al. FET-FGVC: feature-enhanced transformer for fine-grained visual classification[J]. Pattern Recognition, 2024, 149: 110265. DOI: 10.1016/j.patcog.2024.110265.
[30] SERRANO S, SMITH N A. Is attention interpretable?[EB/OL]. (2019-06-09)[2024-10-25]. https://arxiv.org/abs/1906.03731. DOI: 10.48550/arXiv.1906.03731.
[31] ABNAR S, ZUIDEMA W. Quantifying attention flow in transformers[EB/OL]. (2020-05-31)[2024-10-25]. https://arxiv.org/abs/2005.00928. DOI: 10.48550/arXiv.2005.00928.
[32] ZHOU M H, BAI Y L, ZHANG W, et al. Look-into-object: self-supervised structure modeling for object recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society, 2020: 11771-11780. DOI: 10.1109/CVPR42600.2020.01179.
[33] WANG J G, LI J, YAU W Y, et al. Boosting dense SIFT descriptors and shape contexts of face images for gender recognition[C]// 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops. Los Alamitos, CA: IEEE Computer Society, 2010: 96-102. DOI: 10.1109/CVPRW.2010.5543238.
[34] ZHUANG P Q, WANG Y L, QIAO Y. Learning attentive pairwise interaction for fine-grained classification[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 13130-13137. DOI: 10.1609/aaai.v34i07.7016.
[35] WAH C, BRANSON S, WELINDER P, et al. Caltech-UCSD birds-200-2011 (CUB-200-2011): CNS-TR-2011-001[DS/OL]. (2011-07-30)[2024-10-25]. https://www.vision.caltech.edu/datasets/cub_200_2011/.
[36] KHOSLA A, JAYADEVAPRAKASH N, YAO B P, et al. Stanford dogs dataset[DS/OL]. (2012-11-21)[2024-10-25]. http://vision.stanford.edu/aditya86/ImageNetDogs/main.html.
[37] VAN HORN G, BRANSON S, FARRELL R, et al. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society, 2015: 595-604. DOI: 10.1109/CVPR.2015.7298658.
[38] LUO W, YANG X T, MO X J, et al. Cross-X learning for fine-grained visual categorization[C]// IEEE/CVF International Conference on Computer Vision (ICCV). Los Alamitos, CA: IEEE Computer Society, 2019: 8241-8250. DOI: 10.1109/ICCV.2019.00833.
[39] LIANG Y Z, ZHU L C, WANG X H, et al. Penalizing the hard example but not too much: a strong baseline for fine-grained visual classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(5): 7048-7059. DOI: 10.1109/TNNLS.2022.3213563.
[40] HUANG S L, WANG X C, TAO D C. Stochastic partial swap: enhanced model generalization and interpretability for fine-grained recognition[C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Los Alamitos, CA: IEEE Computer Society, 2021: 600-609. DOI: 10.1109/ICCV48922.2021.00066.
[41] ZHANG L B, HUANG S L, LIU W. Learning sequentially diversified representations for fine-grained categorization[J]. Pattern Recognition, 2022, 121: 108219. DOI: 10.1016/j.patcog.2021.108219.
[42] ZHU Q X, LI Z X, KUANG W L, et al. A multichannel location-aware interaction network for visual classification[J]. Applied Intelligence, 2023, 53(20): 23049-23066. DOI: 10.1007/s10489-023-04734-x.
[43] PU Y F, HAN Y Z, WANG Y L, et al. Fine-grained recognition with learnable semantic data augmentation[J]. IEEE Transactions on Image Processing, 2024, 33: 3130-3144. DOI: 10.1109/TIP.2024.3364500.
[44] HU X B, ZHU S N, PENG T L. Hierarchical attention vision transformer for fine-grained visual classification[J]. Journal of Visual Communication and Image Representation, 2023, 91: 103755. DOI: 10.1016/j.jvcir.2023.103755.
[45] LIU X D, WANG L L, HAN X G. Transformer with peak suppression and knowledge guidance for fine-grained image recognition[J]. Neurocomputing, 2022, 492: 137-149. DOI: 10.1016/j.neucom.2022.04.037.
[46] YE S, YU S J, WANG Y, et al. R2-trans: fine-grained visual categorization with redundancy reduction[J]. Image and Vision Computing, 2024, 143: 104923. DOI: 10.1016/j.imavis.2024.104923.
[47] ZHANG Z C, CHEN Z D, WANG Y X, et al. A vision transformer for fine-grained classification by reducing noise and enhancing discriminative information[J]. Pattern Recognition, 2024, 145: 109979. DOI: 10.1016/j.patcog.2023.109979.