Journal of Guangxi Normal University (Natural Science Edition), 2025, Vol. 43, Issue 4: 97-107. DOI: 10.16088/j.issn.1001-6600.2024081301

• Intelligence Information Processing •

Temporal Multimodal Sentiment Analysis with Cross-Modal Augmentation Networks

WANG Xuyang*, ZHANG Jiayu   

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
  • Received: 2024-08-13  Revised: 2024-12-20  Online: 2025-07-05  Published: 2025-07-14

Abstract: To address weak inter-modal interaction, insufficient consideration of temporal order, and the unequal importance of different modalities in multimodal sentiment analysis, this paper proposes a temporal multimodal sentiment analysis framework based on a cross-modal augmentation network (TCAN-SA). First, an inter-modal interaction module strengthens the exchange of information between modalities. Second, a bidirectional temporal convolutional network (BiTCN) layer captures the temporal characteristics of each modality. Finally, a multimodal gating module balances the differing importance of the modalities. Experiments on the two public datasets CMU-MOSI and CMU-MOSEI show that the framework performs well and outperforms existing models.
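As a rough illustration of the three components the abstract names (cross-modal interaction, bidirectional temporal convolution, and modality gating), the following PyTorch sketch shows how such a pipeline could be wired together. It is not the published TCAN-SA implementation: the module names (BiTemporalConv, GatedFusion), dimensions, attention settings, and layer choices are assumptions made only for this example.

    # Illustrative sketch only -- not the authors' TCAN-SA code.
    import torch
    import torch.nn as nn

    class BiTemporalConv(nn.Module):
        """Bidirectional temporal convolution: a 1-D conv run over the sequence
        forward and backward, with the two outputs summed."""
        def __init__(self, dim, kernel_size=3):
            super().__init__()
            pad = kernel_size - 1
            # Pad symmetrically, then trim the tail so each output step only sees
            # current and past inputs; the reversed pass supplies future context.
            self.fwd = nn.Conv1d(dim, dim, kernel_size, padding=pad)
            self.bwd = nn.Conv1d(dim, dim, kernel_size, padding=pad)

        def forward(self, x):                       # x: (batch, seq, dim)
            x = x.transpose(1, 2)                   # -> (batch, dim, seq)
            seq = x.size(-1)
            out_f = self.fwd(x)[..., :seq]
            out_b = self.bwd(x.flip(-1))[..., :seq].flip(-1)
            return (out_f + out_b).transpose(1, 2)  # -> (batch, seq, dim)

    class GatedFusion(nn.Module):
        """Cross-modal attention from text to audio/vision, temporal convolution,
        then a softmax gate that weights the three modality streams."""
        def __init__(self, dim):
            super().__init__()
            self.cross_att = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.temporal = BiTemporalConv(dim)
            self.gate = nn.Linear(3 * dim, 3)       # one weight per modality stream

        def forward(self, text, audio, vision):     # each: (batch, seq, dim)
            # Cross-modal interaction: text queries attend to the other modalities.
            a2t, _ = self.cross_att(text, audio, audio)
            v2t, _ = self.cross_att(text, vision, vision)
            # Temporal modelling of each stream, pooled over time.
            streams = [self.temporal(s).mean(dim=1) for s in (text, a2t, v2t)]
            # Gating: softmax weights balance the importance of the modalities.
            weights = torch.softmax(self.gate(torch.cat(streams, dim=-1)), dim=-1)
            return sum(w.unsqueeze(-1) * s for w, s in zip(weights.unbind(-1), streams))

    if __name__ == "__main__":
        fuse = GatedFusion(dim=64)
        t, a, v = (torch.randn(2, 20, 64) for _ in range(3))
        print(fuse(t, a, v).shape)                  # torch.Size([2, 64])

A fused vector of this kind would typically feed a small regression or classification head that predicts the sentiment score; the actual TCAN-SA modules differ in detail and are described in the paper itself.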

Key words: time-domain convolution, multimodal sentiment analysis, multimodal fusion, gated unit, Transformer

CLC Number:  TP391
[1] LIU J, SONG H, CHEN D P, et al. Multimodal sentiment analysis model with nonverbal information enhancement and contrastive learning[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3372-3381. DOI: 10.11999/JEIT231274.
[2] WANG X Y, DONG S, SHI J. Multimodal sentiment analysis with composite hierarchical fusion[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(1): 198-208. DOI: 10.3778/j.issn.1673-9418.2111004.
[3] ZHANG Y Z, RONG L, SONG D W, et al. Survey of multimodal sentiment analysis[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 426-438. DOI: 10.16451/j.cnki.issn1003-6059.202005005.
[4] LU C, GUO J J, TAN K W, et al. Multimodal sentiment analysis based on text-guided hierarchical adaptive fusion[J]. Journal of Shandong University (Natural Science), 2023, 58(12): 31-40, 51. DOI: 10.6040/j.issn.1671-9352.1.2022.421.
[5] CHAUHAN D S, AKHTAR M S, EKBAL A, et al. Context-aware interactive attention for multi-modal sentiment and emotion analysis[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: Association for Computational Linguistics, 2019: 5647-5657. DOI: 10.18653/v1/D19-1566.
[6] XU G X, MENG Y T, QIU X Y, et al. Sentiment analysis of comment texts based on BiLSTM[J]. IEEE Access, 2019, 7: 51522-51532. DOI: 10.1109/ACCESS.2019.2909919.
[7] BAI Z W, CHEN X H, ZHOU M L, et al. Low-rank multimodal fusion algorithm based on context modeling[J]. Journal of Internet Technology, 2021, 22(4): 913-921. DOI: 10.53106/160792642021072204018.
[8] ZHAO X M, YANG Y J, ZHANG S Q. Research progress of multimodal emotion recognition based on deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1479-1503. DOI: 10.3778/j.issn.1673-9418.2112081.
[9] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 5642-5649. DOI: 10.1609/aaai.v32i1.12024.
[10] LI X, CHEN M P. Multimodal sentiment analysis with multi-perspective fusion network focusing on sense attentive language[C]// Chinese Computational Linguistics. Cham: Springer International Publishing, 2020: 359-373. DOI: 10.1007/978-3-030-63031-7_26.
[11] TIAN C N, HE Y Z, WANG D, et al. Transformer-based multi-subspace multimodal sentiment analysis[J]. Journal of Northwest University (Natural Science Edition), 2024, 54(2): 156-167. DOI: 10.16152/j.cnki.xdxbzr.2024-02-002.
[12] HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]// Proceedings of the 2021 International Conference on Multimodal Interaction. New York, NY: Association for Computing Machinery, 2021: 6-15. DOI: 10.1145/3462244.3479919.
[13] WANG X Y, WANG C R, ZHANG J F, et al. Multimodal sentiment analysis method based on cross-modal cross-attention network[J]. Journal of Guangxi Normal University (Natural Science Edition), 2024, 42(2): 84-93. DOI: 10.16088/j.issn.1001-6600.2023052701.
[14] ZADEH A, CHEN M H, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2017: 1103-1114. DOI: 10.18653/v1/D17-1115.
[15] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2247-2256. DOI: 10.18653/v1/P18-1209.
[16] CHEN Y S, ZHANG L, ZHANG L H, et al. Multimodal sentiment analysis method based on cross-modal attention and gated unit fusion network[J]. Data Analysis and Knowledge Discovery, 2024, 8(7): 67-76. DOI: 10.11925/infotech.2096-3467.2023.0591.
[17] MIAO Y Q, YANG S, LIU T L, et al. Multimodal sentiment analysis based on cross-modal gating mechanism and improved fusion method[J]. Application Research of Computers, 2023, 40(7): 2025-2030, 2038. DOI: 10.19734/j.issn.1001-3695.2022.12.0766.
[18] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2020: 1122-1131. DOI: 10.1145/3394171.3413678.
[19] TSAI Y H H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6558-6569. DOI: 10.18653/v1/P19-1656.
[20] YU W M, XU H, YUAN Z Q, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(12): 10790-10797. DOI: 10.1609/aaai.v35i12.17289.
[21] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186. DOI: 10.18653/v1/N19-1423.
[22] DEGOTTEX G, KANE J, DRUGMAN T, et al. COVAREP: a collaborative voice analysis repository for speech technologies[C]// 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Los Alamitos, CA: IEEE Computer Society, 2014: 960-964. DOI: 10.1109/ICASSP.2014.6853739.
[23] EKMAN P, ROSENBERG E L. What the face reveals: basic and applied studies of spontaneous expression using the facial action coding system (FACS)[M]. 2nd ed. New York: Oxford University Press, 2005. DOI: 10.1093/acprof:oso/9780195179644.001.0001.
[24] SUN H, LIU J Q, CHEN Y W, et al. Modality-invariant temporal representation learning for multimodal sentiment classification[J]. Information Fusion, 2023, 91: 504-514. DOI: 10.1016/j.inffus.2022.10.031.
[25] YU Z, YU J, FAN J P, et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering[C]// 2017 IEEE International Conference on Computer Vision (ICCV). Los Alamitos, CA: IEEE Computer Society, 2017: 1839-1848. DOI: 10.1109/ICCV.2017.202.
[26] BAI S J, KOLTER J Z, KOLTUN V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[EB/OL]. (2018-04-19)[2024-08-13]. https://arxiv.org/abs/1803.01271. DOI: 10.48550/arXiv.1803.01271.
[27] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[EB/OL]. (2016-08-12)[2024-08-13]. https://arxiv.org/abs/1606.06259. DOI: 10.48550/arXiv.1606.06259.
[28] ZADEH A A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2236-2246. DOI: 10.18653/v1/P18-1208.
[29] WANG Y S, SHEN Y, LIU Z, et al. Words can shift: dynamically adjusting word representations using nonverbal behaviors[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 7216-7223. DOI: 10.1609/aaai.v33i01.33017216.
[30] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2021: 9180-9192. DOI: 10.18653/v1/2021.emnlp-main.723.
[31] SUN H, WANG H Y, LIU J Q, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]// Proceedings of the 30th ACM International Conference on Multimedia. New York, NY: Association for Computing Machinery, 2022: 3722-3729. DOI: 10.1145/3503161.3548025.
[32] ZHU C B, CHEN M, ZHANG S, et al. SKEAFN: sentiment knowledge enhanced attention fusion network for multimodal sentiment analysis[J]. Information Fusion, 2023, 100: 101958. DOI: 10.1016/j.inffus.2023.101958.