Journal of Guangxi Normal University (Natural Science Edition) ›› 2022, Vol. 40 ›› Issue (3): 95-103. doi: 10.16088/j.issn.1001-6600.2021070911


Fusion Algorithm of Face Detection and Head Pose Estimation Based on YOLOv3 Model

LI Yongjie1,2, ZHOU Guihong1,2*, LIU Bo1,2   

  1. School of Information Science and Technology, Hebei Agricultural University, Baoding Hebei 071001, China;
    2. Hebei Key Laboratory of Agricultural Big Data (Hebei Agricultural University), Baoding Hebei 071001, China
  • Received:2021-07-09 Revised:2021-11-11 Online:2022-05-25 Published:2022-05-27

Abstract: To address the difficulty of learning face detection boxes, as well as the complex processing, high coupling, and serious error accumulation of two-step serial models, a fusion algorithm for face detection and head pose estimation based on the YOLOv3 model is proposed. K-means clustering is applied to the face-region sizes in the training dataset, yielding 9 sets of results that approximate the sizes and scales of face regions under real conditions. The YOLOv3 model is extended so that face detection and head pose estimation are performed simultaneously on feature maps at three different levels, realizing multi-scale detection. The new algorithm makes use of the information in the feature maps and is trained end to end, which simplifies the processing flow of the head pose estimation task. A recognition accuracy of 99.23% is achieved on the pose subset of CAS-PEAL-R1, and mean absolute errors of 3.79° and 4.24° are achieved in the pitch and yaw directions, respectively, on the Pointing'04 dataset. The results show that the model can complete face region detection and head pose estimation while meeting real-time requirements, which demonstrates the reliability and practicality of the proposed algorithm.
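The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that K-means is applied to face-region sizes to obtain 9 sets of results. The distance 1 - IoU, the split of the 9 anchors into three groups of three by area, the function names `iou_wh`, `kmeans_anchors`, and `split_by_scale`, and the synthetic box sizes are all assumptions, chosen to follow the usual YOLO anchor convention.

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors, using 1 - IoU as distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU (smallest 1 - IoU).
        groups = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            groups[best].append(box)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = []
        for centroid, group in zip(centroids, groups):
            if group:
                w = sum(b[0] for b in group) / len(group)
                h = sum(b[1] for b in group) / len(group)
                updated.append((w, h))
            else:
                updated.append(centroid)
        if updated == centroids:
            break
        centroids = updated
    # Sort by area so the anchors can be split across detection levels.
    return sorted(centroids, key=lambda c: c[0] * c[1])

def split_by_scale(anchors, levels=3):
    """Split area-sorted anchors into equal groups, one per detection level."""
    per_level = len(anchors) // levels
    return [anchors[i * per_level:(i + 1) * per_level] for i in range(levels)]

# Synthetic face-box sizes in pixels (hypothetical data, not from the paper).
data_rng = random.Random(42)
face_sizes = [(data_rng.uniform(20, 300), data_rng.uniform(20, 300)) for _ in range(500)]
anchors = kmeans_anchors(face_sizes, k=9)
small, medium, large = split_by_scale(anchors)
```

In a YOLOv3-style detector the smallest anchors are typically assigned to the highest-resolution feature map and the largest to the coarsest one. Whether the paper assigns its 9 clusters in exactly this way is not stated in the abstract.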

Key words: head pose estimation, YOLOv3 model, K-means, multi-scale detection, deep learning

CLC Number: 

  • TP391.41