Head Pose-Robust Facial Expression Recognition

doi:10.16088/j.issn.1001-6600.2023121801

Abstract

Abstract: To address the drop in recognition performance caused by head pose interference in facial expression recognition, this paper proposes a dual-branch feature fusion (DFF) method that improves robustness to head pose. First, in the expression branch, coarse high-level expression features are extracted with a ResNet18 backbone, and a spatial feature enhancement (SFE) module then strengthens the interaction among these features at the spatial level, improving the branch's ability to extract expression features. In parallel, in the head pose branch, a head pose feature extraction (HPFE) module, pre-trained on the 300W_LP head pose dataset and kept with fixed weights, extracts head pose features from the facial expression image. Finally, the expression features and the head pose features are fused by element-wise multiplication so that the two kinds of features complement each other, yielding an emotional representation that is robust to head pose. The method is evaluated on the pose variation test sets of the RAF-DB and FERPlus datasets under two head pose conditions, Pose(>30°) and Pose(>45°): recognition accuracy is 89.98% and 89.96%, respectively, on RAF-DB, and 89.20% and 87.94%, respectively, on FERPlus. The experimental results show that the proposed method improves expression recognition accuracy for facial images affected by head pose interference and contributes to research on facial expression recognition in natural environments.

Key words: expression recognition, head pose, feature extraction, robustness, deep learning
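
The fusion step described in the abstract (multiplying expression features and head pose features element by element before classification) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `fuse_elementwise`, `linear_logits`, and `predict`, the toy three-dimensional vectors, and the plain linear classifier are hypothetical and are not the authors' implementation. The ResNet18 backbone, the SFE module, and the frozen HPFE module are not reproduced; the sketch simply assumes that both branches have already produced feature vectors of equal length.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_elementwise(expr_feat: Sequence[float], pose_feat: Sequence[float]) -> Vector:
    """Fuse two same-length feature vectors by element-wise multiplication."""
    if len(expr_feat) != len(pose_feat):
        raise ValueError("feature vectors must have the same length")
    return [e * p for e, p in zip(expr_feat, pose_feat)]


def linear_logits(features: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> Vector:
    """Hypothetical linear classifier head: one weight row and one bias per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def predict(expr_feat: Sequence[float],
            pose_feat: Sequence[float],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> int:
    """Return the index of the class with the highest logit on the fused features."""
    fused = fuse_elementwise(expr_feat, pose_feat)
    logits = linear_logits(fused, weights, bias)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy inputs (assumed values, not taken from the paper).
expr_feat = [0.5, 2.0, 1.0]   # stand-in for expression-branch features
pose_feat = [1.0, 0.5, 0.0]   # stand-in for head-pose-branch features
weights = [[1.0, 0.0, 0.0],   # class 0 looks only at fused[0]
           [0.0, 1.0, 0.0]]   # class 1 looks only at fused[1]
bias = [0.0, 0.0]

fused = fuse_elementwise(expr_feat, pose_feat)
predicted_class = predict(expr_feat, pose_feat, weights, bias)
```

In this toy setting the pose features act as a per-dimension gate on the expression features: a pose value of 0.0 suppresses the corresponding expression dimension entirely, while a value of 1.0 passes it through unchanged. Whether the fused representation in the actual DFF model behaves like such a gate is not stated in the abstract.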

CLC number: TP391.41

HOU Haiyan, TAN Yumei, SONG Shuxiang, XIA Haiying. Head Pose-Robust Facial Expression Recognition[J]. Journal of Guangxi Normal University (Natural Science Edition), 2024, 42(6): 126-137.

