广西师范大学学报(自然科学版) ›› 2023, Vol. 41 ›› Issue (5): 14-25. DOI: 10.16088/j.issn.1001-6600.2023022302

• 研究论文 •

基于语义增强的多模态情感分析

郭嘉梁, 靳婷*   

  1. 海南大学 计算机科学与技术学院,海南 海口 570100
  • 收稿日期:2023-02-23 修回日期:2023-03-31 发布日期:2023-10-09
  • 通讯作者: 靳婷(1982—), 女, 河北赵县人, 海南大学正高级实验师, 博导。E-mail: jinting@hainanu.edu.cn
  • 基金资助:
    国家自然科学基金(61862021); 海南省自然科学基金(620RC565)

Semantic Enhancement-Based Multimodal Sentiment Analysis

GUO Jialiang, JIN Ting*   

  1. School of Computer Science and Technology, Hainan University, Haikou Hainan 570100, China
  • Received: 2023-02-23  Revised: 2023-03-31  Published: 2023-10-09

摘要: 多模态情感分析是自然语言处理领域的重要任务,模态融合是其核心问题。以往的研究没有区分各个模态在情感分析中的主次地位,没有考虑到不同模态之间的质量和性能差距,平等地对待各个模态。现有研究表明文本模态往往在情感分析中占据主导地位,但非文本模态包含识别正确情感必不可少的关键特征信息。因此,本文提出一种以文本模态为中心的模态融合策略,通过带有注意力机制的编解码器网络区分不同模态之间的共有语义和私有语义,利用非文本模态相对于文本模态的2种语义增强补充文本特征,实现多模态的联合鲁棒表示,并最终实现情感预测。在CMU-MOSI和CMU-MOSEI视频情感分析数据集上的实验显示,本方法的准确率分别达到87.3%和86.2%,优于许多现有的先进方法。

关键词: 情感分析, 模态融合, 注意力机制, 共同语义, 私有语义, 增强补充

Abstract: Multimodal sentiment analysis is an important task in natural language processing, and modality fusion is its core problem. Previous studies have treated all modalities equally, neither distinguishing their primary or secondary roles in sentiment analysis nor accounting for the gaps in quality and performance between them. Existing research shows that the text modality tends to dominate sentiment analysis, yet the non-text modalities contain key feature information that is essential for identifying the correct sentiment. This paper therefore proposes a modality fusion strategy centered on the text modality: an encoder-decoder network with an attention mechanism separates the semantics shared across modalities from the semantics private to each, and these two kinds of semantics from the non-text modalities are used to enhance and complement the text features, yielding a robust joint multimodal representation and, finally, the sentiment prediction. Experiments on the CMU-MOSI and CMU-MOSEI video sentiment analysis datasets show that the proposed method reaches accuracies of 87.3% and 86.2%, respectively, outperforming many existing state-of-the-art methods.
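The following is a minimal PyTorch sketch of the text-centric fusion idea summarized in the abstract: attention with the text modality as the query extracts the semantics each non-text modality shares with the text, a small decoder's reconstruction residual stands in for the private semantics, and both are concatenated with the text features for sentiment regression. This is an illustration rather than the authors' implementation; the module choices, the feature dimensions (BERT-, COVAREP- and Facet-sized inputs), the residual definition of private semantics, and the assumption of equal word-aligned sequence lengths are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class TextCentricFusion(nn.Module):
    """Illustrative sketch of text-centric modality fusion (not the paper's released code)."""

    def __init__(self, d_text=768, d_audio=74, d_vision=35, d_model=128):
        super().__init__()
        # Project every modality into a common feature space.
        self.text_proj = nn.Linear(d_text, d_model)
        self.audio_proj = nn.Linear(d_audio, d_model)
        self.vision_proj = nn.Linear(d_vision, d_model)
        # Attention with text as the query extracts the semantics a non-text
        # modality shares with the text modality.
        self.shared_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # A decoder reconstructs the non-text features from their shared part;
        # the reconstruction residual serves here as the private semantics.
        self.audio_dec = nn.Linear(d_model, d_model)
        self.vision_dec = nn.Linear(d_model, d_model)
        # Regression head over text features enhanced by shared and private
        # non-text semantics (CMU-MOSI/MOSEI use sentiment scores in [-3, 3]).
        self.head = nn.Sequential(
            nn.Linear(5 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def split(self, text, modality, decoder):
        # Shared semantics: what the non-text modality expresses about the text.
        shared, _ = self.shared_attn(text, modality, modality)
        # Private semantics: residual left after removing the shared part
        # (equal, word-aligned sequence lengths are assumed here).
        private = modality - decoder(shared)
        return shared.mean(dim=1), private.mean(dim=1)

    def forward(self, text, audio, vision):
        # Inputs: (batch, seq_len, dim) utterance-level feature sequences.
        t = self.text_proj(text)
        a_sh, a_pr = self.split(t, self.audio_proj(audio), self.audio_dec)
        v_sh, v_pr = self.split(t, self.vision_proj(vision), self.vision_dec)
        fused = torch.cat([t.mean(dim=1), a_sh, a_pr, v_sh, v_pr], dim=-1)
        return self.head(fused)


# Toy usage with randomly generated BERT/COVAREP/Facet-sized features.
model = TextCentricFusion()
out = model(torch.randn(2, 20, 768), torch.randn(2, 20, 74), torch.randn(2, 20, 35))
print(out.shape)  # torch.Size([2, 1])
```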

Key words: sentiment analysis, modality fusion, attention mechanism, shared semantics, private semantics, enhancement and complementation

中图分类号:  TP391.1
