Journal of Guangxi Normal University (Natural Science Edition) ›› 2024, Vol. 42 ›› Issue (2): 84-93. DOI: 10.16088/j.issn.1001-6600.2023052701


Multimodal Sentiment Analysis Based on Cross-Modal Cross-Attention Network

WANG Xuyang1*, WANG Changrui1, ZHANG Jinfeng1, XING Mengyi2   

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China;
  2. School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
  Received: 2023-05-27    Revised: 2023-08-22    Published: 2024-04-22

Abstract: Exploiting intra-modal and inter-modal information helps improve the performance of multimodal sentiment analysis. Therefore, a multimodal sentiment analysis model based on a cross-modal cross-attention network is proposed. First, the VGG-16 network is used to map the multimodal data into a global feature space, while the Swin Transformer network maps the multimodal data into a local feature space, and intra-modal self-attention and inter-modal cross-attention features are constructed. Then, a cross-modal cross-attention fusion module is designed to achieve deep fusion of the intra-modal and inter-modal features, enhancing the reliability of the multimodal feature representation. Finally, the softmax function is used to obtain the sentiment analysis results. Experimental results on two open-source datasets, CMU-MOSI and CMU-MOSEI, show that the proposed model achieves accuracies of 45.9% and 54.1%, respectively, on the seven-class classification task. Compared with the classical MCGMF model, the accuracy of the proposed model improves by 0.66% and 2.46%, and the overall performance improvement is significant.
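The abstract only outlines the architecture; the following is a minimal, hypothetical PyTorch sketch of the kind of cross-modal cross-attention fusion it describes, not the authors' implementation. The module name, feature dimension (256), number of heads (4), mean pooling, and concatenation-based fusion are all assumptions; random tensors stand in for the VGG-16 (global) and Swin Transformer (local) encoder outputs.

    # Minimal sketch (assumed design, not the paper's code): intra-modal
    # self-attention on each feature stream, inter-modal cross-attention
    # between the streams, then softmax classification over 7 classes.
    import torch
    import torch.nn as nn

    class CrossModalCrossAttentionFusion(nn.Module):
        def __init__(self, dim=256, heads=4, num_classes=7):
            super().__init__()
            # intra-modal self-attention for the global and local streams
            self.self_attn_g = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.self_attn_l = nn.MultiheadAttention(dim, heads, batch_first=True)
            # inter-modal cross-attention: each stream queries the other
            self.cross_g2l = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.cross_l2g = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.classifier = nn.Linear(2 * dim, num_classes)

        def forward(self, feat_global, feat_local):
            # feat_global, feat_local: (batch, seq_len, dim)
            g, _ = self.self_attn_g(feat_global, feat_global, feat_global)
            l, _ = self.self_attn_l(feat_local, feat_local, feat_local)
            # global features attend to local ones and vice versa
            g2l, _ = self.cross_g2l(g, l, l)
            l2g, _ = self.cross_l2g(l, g, g)
            # pool over the sequence dimension and fuse by concatenation
            fused = torch.cat([g2l.mean(dim=1), l2g.mean(dim=1)], dim=-1)
            return torch.softmax(self.classifier(fused), dim=-1)

    # toy usage with random features in place of VGG-16 / Swin Transformer outputs
    model = CrossModalCrossAttentionFusion()
    probs = model(torch.randn(2, 10, 256), torch.randn(2, 10, 256))
    print(probs.shape)  # torch.Size([2, 7])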

Key words: sentiment analysis, multimodal, cross-modal cross-attention, self-attention, global and local features

CLC Number:  TP391.41