Journal of Guangxi Normal University (Natural Science Edition) ›› 2025, Vol. 43 ›› Issue (4): 97-107. DOI: 10.16088/j.issn.1001-6600.2024081301

• Intelligent Information Processing •

Temporal Multimodal Sentiment Analysis with Cross-Modal Augmentation Networks

WANG Xuyang*, ZHANG Jiayu

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
  • Received: 2024-08-13 Revised: 2024-12-20 Online: 2025-07-05 Published: 2025-07-14
  • Corresponding author: WANG Xuyang (1974—), female, from Lanzhou, Gansu; professor at Lanzhou University of Technology. E-mail: wxuyang126@126.com
  • Supported by: National Natural Science Foundation of China (62161019)


Abstract: To address the problems of weak inter-modal interaction, insufficient modeling of temporal order, and the varying importance of modalities in multimodal sentiment analysis, this paper proposes a temporal multimodal sentiment analysis framework based on a cross-modal augmentation network (TCAN-SA). First, an inter-modal interaction module strengthens information exchange between modalities. Second, a bidirectional temporal convolutional network (BiTCN) layer captures the temporal characteristics of each modality. Finally, a multimodal gating module balances the differing importance of the modalities. Experimental results on the public CMU-MOSI and CMU-MOSEI datasets show that the proposed framework outperforms existing models.

Key words: time-domain convolution, multimodal sentiment analysis, multimodal fusion, gated unit, Transformer

CLC number: TP391
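
To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of the three stages: cross-modal interaction, bidirectional temporal convolution, and gated multimodal fusion. All class names, feature dimensions, and the specific wiring (text attending to the audio and visual streams, mean-pooled utterance-level fusion) are illustrative assumptions and do not reproduce the authors' TCAN-SA implementation.

```python
# Minimal, illustrative sketch of the three stages named in the abstract:
# cross-modal interaction, BiTCN, and gated fusion. Names, dimensions, and
# wiring are assumptions for illustration, not the paper's released code.
from typing import List

import torch
import torch.nn as nn


class CrossModalInteraction(nn.Module):
    """One modality (query) attends to another (key/value) to exchange information."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query, context, context)
        return self.norm(query + out)  # residual connection


class BiTCN(nn.Module):
    """Bidirectional temporal convolution: causal Conv1d run forward and backward over time."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.fwd = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size - 1)
        self.bwd = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        x_t = x.transpose(1, 2)                           # (batch, dim, seq)
        seq_len = x_t.size(-1)
        fwd = self.fwd(x_t)[..., :seq_len]                # trim causal padding
        bwd = self.bwd(x_t.flip(-1))[..., :seq_len].flip(-1)
        return (fwd + bwd).transpose(1, 2)


class GatedFusion(nn.Module):
    """Learned sigmoid gates weight each modality before fusion and regression."""
    def __init__(self, dim: int, num_modalities: int = 3):
        super().__init__()
        self.gates = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_modalities)])
        self.head = nn.Linear(dim * num_modalities, 1)    # sentiment score

    def forward(self, feats: List[torch.Tensor]) -> torch.Tensor:
        gated = [torch.sigmoid(g(f)) * f for g, f in zip(self.gates, feats)]
        return self.head(torch.cat(gated, dim=-1))


if __name__ == "__main__":
    batch, seq, dim = 8, 50, 128
    text = torch.randn(batch, seq, dim)    # e.g. BERT features
    audio = torch.randn(batch, seq, dim)   # e.g. COVAREP features
    visual = torch.randn(batch, seq, dim)  # e.g. facial-expression features

    interact, tcn, fuse = CrossModalInteraction(dim), BiTCN(dim), GatedFusion(dim)
    # Stage 1: text attends to the audio and visual streams (one possible wiring).
    t = interact(text, audio) + interact(text, visual)
    # Stage 2: capture temporal structure within each stream.
    t, a, v = tcn(t), tcn(audio), tcn(visual)
    # Stage 3: gate and fuse mean-pooled utterance-level features.
    score = fuse([t.mean(1), a.mean(1), v.mean(1)])
    print(score.shape)  # torch.Size([8, 1])
```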
