Journal of Guangxi Normal University (Natural Science Edition) ›› 2026, Vol. 44 ›› Issue (1): 91-101. DOI: 10.16088/j.issn.1001-6600.2025040903
WANG Xuyang*, MA Jin
Abstract: In multimodal sentiment analysis, sentiment information is difficult to fuse effectively across modalities because non-verbal modality information is underused, cross-modal interaction lacks fine-grained association modeling, and hierarchical semantic fusion mechanisms remain incomplete. To address this, this paper proposes a multimodal sentiment analysis method combining cross-modal feature enhancement with hierarchical MLP communication. The method builds a progressive fusion architecture: it first enhances the non-verbal modalities through a cross-modal attention mechanism, capturing many-to-many fine-grained cross-modal interactions; it then applies a hierarchical MLP communication module, in which parallel and stacked MLP blocks are designed along the modality-fusion and temporal-modeling dimensions respectively, achieving hierarchical feature interaction in both the horizontal and vertical directions and effectively improving the accuracy and expressiveness of sentiment understanding. Experimental results show that on CMU-MOSI the proposed model improves Acc2 and F1 over the next-best model by 0.89 and 0.77 percentage points respectively, and that on CMU-MOSEI it outperforms the baseline models on all metrics in the comparative experiments, reaching an Acc2 of 86.34% and an F1 of 86.25%.
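To make the described architecture concrete, below is a minimal PyTorch sketch of the two stages the abstract summarizes: cross-modal attention that enhances a non-verbal stream with text, followed by MLP communication blocks that mix features horizontally (across the fused feature dimension) and vertically (across time), in the spirit of MLP-Mixer-style models. All class names, dimensions, and the exact mixing order are illustrative assumptions for exposition, not the authors' implementation.

import torch
import torch.nn as nn

class CrossModalEnhance(nn.Module):
    """Enhance a non-verbal stream (query) with text features (key/value)."""
    def __init__(self, d: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, nonverbal: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Many-to-many interaction: every non-verbal step attends to every text step.
        enhanced, _ = self.attn(query=nonverbal, key=text, value=text)
        return self.norm(nonverbal + enhanced)  # residual keeps the original signal

class HierarchicalMLP(nn.Module):
    """Horizontal mixing over the feature (modality-fusion) dimension,
    then vertical mixing over the temporal dimension."""
    def __init__(self, d: int, t: int):
        super().__init__()
        # Horizontal: channel-mixing MLP applied position-wise.
        self.channel_mlp = nn.Sequential(nn.Linear(d, 2 * d), nn.GELU(), nn.Linear(2 * d, d))
        # Vertical: time-step-mixing MLP applied channel-wise.
        self.time_mlp = nn.Sequential(nn.Linear(t, 2 * t), nn.GELU(), nn.Linear(2 * t, t))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, t, d)
        x = x + self.channel_mlp(self.norm1(x))           # horizontal interaction
        y = self.norm2(x).transpose(1, 2)                 # (batch, d, t)
        return x + self.time_mlp(y).transpose(1, 2)       # vertical interaction

# Toy usage with hypothetical shapes: 8 clips, 20 time steps, 128-dim features.
text, audio, vision = (torch.randn(8, 20, 128) for _ in range(3))
enhance = CrossModalEnhance(d=128)
mixer = HierarchicalMLP(d=128, t=20)
fused = mixer(enhance(audio, text) + enhance(vision, text))
print(fused.shape)  # torch.Size([8, 20, 128])

In this sketch the channel-mixing MLP stands in for the parallel blocks on the modality-fusion dimension and the time-mixing MLP for the stacked blocks on the temporal-modeling dimension; the actual layer counts, fusion order, and prediction head of the proposed model are not specified by the abstract alone.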
CLC number: TP391.1