Journal of Guangxi Normal University (Natural Science Edition) ›› 2025, Vol. 43 ›› Issue (5): 104-113. DOI: 10.16088/j.issn.1001-6600.2024120301

• Intelligent Information Processing •

基于注意力引导的遮挡感知面部表情识别

黎丰玮1,2, 谭玉枚1,2, 宋树祥1,2, 夏海英1,2*

  1.广西类脑计算与智能芯片重点实验室(广西师范大学), 广西 桂林 541004;
    2.广西高校集成电路与微系统重点实验室(广西师范大学), 广西 桂林 541004
  • 收稿日期:2024-12-03 修回日期:2025-03-21 出版日期:2025-09-05 发布日期:2025-08-05
  • Corresponding author: XIA Haiying (b. 1983), female, a native of Liaocheng, Shandong; professor, Ph.D., Guangxi Normal University. E-mail: xhy22@gxnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62106054, 62366006); Guangxi Graduate Education Innovation Program (XYCSR2024095); Systematic Research Project of the Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education (EBME24-07)

Occlusion-Aware Facial Expression Recognition Based on Attention Guidance

LI Fengwei1,2, TAN Yumei1,2, SONG Shuxiang1,2, XIA Haiying1,2*   

  1. Guangxi Key Laboratory of Brain-inspired Computing and Intelligent Chips (Guangxi Normal University), Guilin, Guangxi 541004, China;
    2. Guangxi Universities Key Laboratory of Integrated Circuits and Microsystems (Guangxi Normal University), Guilin, Guangxi 541004, China
  • Received:2024-12-03 Revised:2025-03-21 Online:2025-09-05 Published:2025-08-05

摘要: 遮挡和姿态变化是自然场景中影响面部表情识别的主要干扰因素。现有大多数方法采用注意力来增强与表情相关的信息,减少遮挡和姿态变化对表情识别性能的影响。然而,这些方法在网络中不同位置使用相同的注意力机制,忽视浅层和深层特征张量在空间和通道维度上的差异,影响了特征表达的准确性。为此,本文提出一种粒度感知多维自适应注意力网络(GA-MDA)。首先,设计跨粒度空间感知注意力模块(CSA),用于增强浅层网络的特征表达能力;其次,引入多维度自适应注意力模块(MAA),自适应地优化不同维度的空间与通道特征表示,以进一步提升模型的特征表达能力。实验结果表明,GA-MDA在RAF-DB和FERPlus数据集上表情识别准确率分别达到92.01%和90.36%,与目前先进方法HANet和GE-LA相比,识别性能分别提升0.09和0.43个百分点,模型参数量分别减少2.963×10⁷和6.341×10⁷。

关键词: 表情识别, 注意力机制, 遮挡, 鲁棒性

Abstract: Occlusion and pose variation are the main distractors affecting facial expression recognition in natural scenes. Most existing methods use attention to enhance expression-related information and to reduce the impact of occlusion and pose variation on recognition performance. However, these methods apply the same attention mechanism at different positions in the network, ignoring the differences between shallow and deep feature tensors in the spatial and channel dimensions, which limits the accuracy of feature representation. To address this, a Granularity-Aware Multi-Dimensional Adaptive Attention network (GA-MDA) is proposed. First, a Cross-granularity Spatial-Aware Attention module (CSA) is designed to enhance the feature representation ability of the shallow network. Then, a Multi-Dimensional Adaptive Attention module (MAA) is introduced to adaptively optimize the spatial and channel feature representations in different dimensions, further strengthening the model's representational ability. Experimental results show that GA-MDA achieves recognition accuracies of 92.01% and 90.36% on the RAF-DB and FERPlus datasets, improving recognition performance by 0.09 and 0.43 percentage points while reducing the number of model parameters by 2.963×10⁷ and 6.341×10⁷, respectively, compared with the state-of-the-art methods HANet and GE-LA.
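The abstract does not disclose the internals of the CSA and MAA modules. As a rough, hypothetical illustration of the kind of channel and spatial gating that such attention builds on (in the spirit of CBAM [18]), the NumPy sketch below gates a (C, H, W) feature map; the function names and pooling choices are assumptions for illustration, not the authors' design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W). Squeeze the spatial dims by average pooling,
    # then gate each channel with a weight in (0, 1).
    w = sigmoid(x.mean(axis=(1, 2)))            # shape (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels (mean and max), then gate each
    # spatial position with a weight in (0, 1).
    m = sigmoid(0.5 * (x.mean(axis=0) + x.max(axis=0)))  # shape (H, W)
    return x * m[None, :, :]

def attention_block(x):
    # Channel gating followed by spatial gating, CBAM-style.
    return spatial_attention(channel_attention(x))
```

The paper's stated point is that one such fixed block should not be reused everywhere: shallow and deep stages would get differently parameterized (granularity-aware, dimension-adaptive) variants rather than the single block sketched here.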

Key words: expression recognition, attention mechanism, occlusion, robustness

CLC number: TP391.41

[1] LUCEY P, COHN J F, KANADE T, et al. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression[C]// 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops. Los Alamitos, CA: IEEE Computer Society, 2010: 94-101. DOI: 10.1109/CVPRW.2010.5543262.
[2] ZHAO G Y, HUANG X H, TAINI M, et al. Facial expression recognition from near-infrared videos[J]. Image and Vision Computing, 2011, 29(9): 607-619. DOI: 10.1016/j.imavis.2011.07.002.
[3] LI S, DENG W H, DU J P. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society, 2017: 2584-2593. DOI: 10.1109/CVPR.2017.277.
[4] GOODFELLOW I J, ERHAN D, CARRIER P L, et al. Challenges in representation learning: a report on three machine learning contests[C]// Neural Information Processing: LNCS Volume 8228. Berlin: Springer, 2013: 117-124. DOI: 10.1007/978-3-642-42051-1_16.
[5] XIE S Y, HU H F, WU Y B. Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition[J]. Pattern Recognition, 2019, 92: 177-191. DOI: 10.1016/j.patcog.2019.03.019.
[6] ZHAO S C, MA Y S, GU Y, et al. An end-to-end visual-audio attention network for emotion recognition in user-generated videos[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(1): 303-311. DOI: 10.1609/aaai.v34i01.5364.
[7] LI Y, ZENG J B, SHAN S G, et al. Occlusion aware facial expression recognition using CNN with attention mechanism[J]. IEEE Transactions on Image Processing, 2019, 28(5): 2439-2450. DOI: 10.1109/TIP.2018.2886767.
[8] WANG Z N, ZENG F W, LIU S C, et al. OAENet: oriented attention ensemble for accurate facial expression recognition[J]. Pattern Recognition, 2021, 112: 107694. DOI: 10.1016/j.patcog.2020.107694.
[9] WANG K, PENG X J, YANG J F, et al. Region attention networks for pose and occlusion robust facial expression recognition[J]. IEEE Transactions on Image Processing, 2020, 29: 4057-4069. DOI: 10.1109/TIP.2019.2956143.
[10] TAN Y M, XIA H Y, SONG S X. Learning informative and discriminative semantic features for robust facial expression recognition[J]. Journal of Visual Communication and Image Representation, 2024, 98: 104062. DOI: 10.1016/j.jvcir.2024.104062.
[11] 卢莉丹, 夏海英, 谭玉枚, 等. 注意力引导局部特征联合学习的人脸表情识别[J]. 中国图象图形学报, 2024, 29(8): 2377-2387. DOI: 10.11834/jig.230410.
[12] 苏春海, 夏海英. 抗噪声双约束网络的面部表情识别[J]. 广西师范大学学报(自然科学版), 2025, 43(2): 70-82. DOI: 10.16088/j.issn.1001-6600.2024021601.
[13] 胡琨, 董爱华, 黄荣. 基于双流特征和注意力机制的人脸表情识别[J/OL]. 东华大学学报(自然科学版)[2025-03-21]. https://doi.org/10.19886/j.cnki.dhdz.2024.0311.
[14] 于霞, 武家逸, 杨畅, 等. 基于特征增强和多头注意力融合的表情识别[J]. 沈阳大学学报(自然科学版), 2025, 37(1): 44-52, 93. DOI: 10.16103/j.cnki.21-1583/n.2025.01.007.
[15] 郭胜, 蔡姗, 邹雪, 等. 基于加权多头并行注意力的局部遮挡面部表情识别[J]. 计算机系统应用, 2024, 33(1): 254-262. DOI: 10.15888/j.cnki.csa.009352.
[16] 罗思诗, 李茂军, 陈满. 多尺度融合注意力机制的人脸表情识别网络[J]. 计算机工程与应用, 2023, 59(1): 199-206. DOI: 10.3778/j.issn.1002-8331.2203-0170.
[17] ZHAO Z Q, LIU Q S, WANG S M. Learning deep global multi-scale and local attention features for facial expression recognition in the wild[J]. IEEE Transactions on Image Processing, 2021, 30: 6544-6556. DOI: 10.1109/TIP.2021.3093397.
[18] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Computer Vision-ECCV 2018. Cham: Springer, 2018: 3-19. DOI: 10.1007/978-3-030-01234-2_1.
[19] LIU H W, CAI H L, LIN Q C, et al. Adaptive multilayer perceptual attention network for facial expression recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6253-6266. DOI: 10.1109/TCSVT.2022.3165321.
[20] ZHANG Z Y, TIAN X, ZHANG Y, et al. Enhanced discriminative global-local feature learning with priority for facial expression recognition[J]. Information Sciences, 2023, 630: 370-384. DOI: 10.1016/j.ins.2023.02.056.
[21] TAO H J, DUAN Q Y. Hierarchical attention network with progressive feature fusion for facial expression recognition[J]. Neural Networks, 2024, 170: 337-348. DOI: 10.1016/j.neunet.2023.11.033.
[22] BARSOUM E, ZHANG C, FERRER C C, et al. Training deep networks for facial expression recognition with crowd-sourced label distribution[C]// ICMI'16: Proceedings of the 18th ACM International Conference on Multimodal Interaction. New York, NY: Association for Computing Machinery, 2016: 279-283. DOI: 10.1145/2993148.2993165.
[23] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 8024-8035.
[24] GUO Y D, ZHANG L, HU Y X, et al. MS-Celeb-1M: a dataset and benchmark for large-scale face recognition[C]// Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016: 87-102. DOI: 10.1007/978-3-319-46487-9_6.
[25] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
[26] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30)[2024-12-03]. https://arxiv.org/abs/1412.6980. DOI: 10.48550/arXiv.1412.6980.
[27] CHEN D L, WEN G H, LI H H, et al. Multi-relations aware network for in-the-wild facial expression recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(8): 3848-3859. DOI: 10.1109/TCSVT.2023.3234312.
[28] XIA H Y, LU L D, SONG S X. Feature fusion of multi-granularity and multi-scale for facial expression recognition[J]. The Visual Computer, 2024, 40(3): 2035-2047. DOI: 10.1007/s00371-023-02900-3.
[29] 刘娟, 王颖, 胡敏, 等. 融合全局增强-局部注意特征的表情识别网络[J]. 计算机科学与探索, 2024, 18(9): 2487-2500. DOI: 10.3778/j.issn.1673-9418.2307013.
[30] MA F Y, SUN B, LI S T. Facial expression recognition with visual transformers and attentional selective fusion[J]. IEEE Transactions on Affective Computing, 2023, 14(2): 1236-1248. DOI: 10.1109/TAFFC.2021.3122146.
[31] 南亚会, 华庆一. 局部加全局视角遮挡人脸表情识别方法[J]. 计算机工程与应用, 2024, 60(13): 180-189. DOI: 10.3778/j.issn.1002-8331.2309-0213.
[32] GERA D, BALASUBRAMANIAN S. Landmark guidance independent spatio-channel attention and complementary context information based facial expression recognition[J]. Pattern Recognition Letters, 2021, 145: 58-66. DOI: 10.1016/j.patrec.2021.01.029.
[33] LIU P, LIN Y W, MENG Z B, et al. Point adversarial self-mining: a simple method for facial expression recognition[J]. IEEE Transactions on Cybernetics, 2022, 52(12): 12649-12660. DOI: 10.1109/TCYB.2021.3085744.
[34] CAI J, MENG Z B, KHAN A S, et al. Probabilistic attribute tree structured convolutional neural networks for facial expression recognition in the wild[J]. IEEE Transactions on Affective Computing, 2023, 14(3): 1927-1941. DOI: 10.1109/TAFFC.2022.3156920.
[35] LIU Y, ZHANG X M, KAUTTONEN J, et al. Uncertain facial expression recognition via multi-task assisted correction[J]. IEEE Transactions on Multimedia, 2024, 26: 2531-2543. DOI: 10.1109/TMM.2023.3301209.
[36] 侯海燕, 谭玉枚, 宋树祥, 等. 头部姿态鲁棒的面部表情识别[J]. 广西师范大学学报(自然科学版), 2024, 42(6): 126-137. DOI: 10.16088/j.issn.1001-6600.2023121801.
[37] LI H H, XIAO X L, LIU X Y, et al. Heuristic objective for facial expression recognition[J]. The Visual Computer, 2023, 39(10): 4709-4720. DOI: 10.1007/s00371-022-02619-7.
[38] CHATTOPADHAY A, SARKAR A, HOWLADER P, et al. Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks[C]// 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Los Alamitos, CA: IEEE Computer Society, 2018: 839-847. DOI: 10.1109/WACV.2018.00097.