Journal of Guangxi Normal University (Natural Science Edition), 2023, Vol. 41, Issue (5): 14-25. DOI: 10.16088/j.issn.1001-6600.2023022302


Semantic Enhancement-Based Multimodal Sentiment Analysis

GUO Jialiang, JIN Ting*   

School of Computer Science and Technology, Hainan University, Haikou, Hainan 570100, China
Received: 2023-02-23; Revised: 2023-03-31; Published: 2023-10-09

Abstract: Multimodal sentiment analysis is an important task in natural language processing, and modality fusion is its core problem. Previous research has treated all modalities equally, neither distinguishing the primary and secondary roles each modality plays in sentiment analysis nor accounting for the gaps in quality and performance between modalities. Existing work shows that the text modality tends to dominate sentiment analysis, yet the non-text modalities carry key features that are essential for identifying the correct sentiment. This paper therefore proposes a text-centered modality fusion strategy. An encoder-decoder network with an attention mechanism separates the semantics shared across modalities from the semantics private to each; the shared and private semantics of the non-text modalities then serve as two kinds of semantic enhancement that complement the text features, yielding a robust joint representation of all modalities for the final sentiment prediction. Experiments on the CMU-MOSI and CMU-MOSEI video sentiment analysis datasets show that the method reaches accuracies of 87.3% and 86.2%, respectively, outperforming many existing state-of-the-art methods.
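To make the fusion strategy in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code: the module name TextCenteredFusion, the feature dimensions, the plain linear shared/private encoders, and the use of multi-head cross-attention from text queries onto non-text semantics are all assumptions; the paper's actual encoder-decoder and attention design may differ.

# Illustrative sketch of a text-centered fusion step (assumed design, NOT the paper's code).
# Assumptions: each modality arrives as a fixed-size utterance-level vector; the
# shared- and private-semantics encoders are plain linear layers; enhancement is
# cross-attention from the text feature onto the non-text shared/private semantics.
import torch
import torch.nn as nn

class TextCenteredFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # One shared-semantics and one private-semantics encoder per modality
        # (t = text, a = audio, v = vision).
        self.shared = nn.ModuleDict({m: nn.Linear(dim, dim) for m in ("t", "a", "v")})
        self.private = nn.ModuleDict({m: nn.Linear(dim, dim) for m in ("t", "a", "v")})
        # Text queries attend over the non-text shared/private semantics.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # regression score, as used on CMU-MOSI/MOSEI

    def forward(self, t, a, v):
        # t, a, v: (batch, dim) text/audio/vision features.
        enh = torch.stack(
            [self.shared["a"](a), self.private["a"](a),
             self.shared["v"](v), self.private["v"](v)], dim=1)  # (B, 4, dim)
        q = self.shared["t"](t).unsqueeze(1)                     # (B, 1, dim)
        ctx, _ = self.attn(q, enh, enh)                          # non-text enhances text
        joint = q.squeeze(1) + ctx.squeeze(1) + self.private["t"](t)
        return self.head(joint)                                  # sentiment score

model = TextCenteredFusion()
score = model(torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 128))
print(score.shape)  # torch.Size([2, 1])

Note that this sketch only shows the fusion pathway; actually driving the shared and private encoders to capture different semantics would in practice require auxiliary objectives (e.g., similarity and orthogonality losses), which are omitted here.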

Key words: sentiment analysis, modality fusion, attention mechanism, shared semantics, private semantics, enhanced complementation

CLC Number:  TP391.1