Journal of Guangxi Normal University (Natural Science Edition) ›› 2025, Vol. 43 ›› Issue (3): 57-71. DOI: 10.16088/j.issn.1001-6600.2024071702

• Intelligence Information Processing •

A Dissimilarity Feature-Driven Decoupled Multimodal Sentiment Analysis

LI Zhixin1,2*, LIU Mingqi1,2   

  1. Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education (Guangxi Normal University), Guilin, Guangxi 541004, China;
    2. Guangxi Key Lab of Multi-source Information Mining & Security (Guangxi Normal University), Guilin, Guangxi 541004, China
  Received: 2024-07-17; Revised: 2024-11-07; Online: 2025-05-05; Published: 2025-05-14

Abstract: Feature decomposition methods decompose the features of different modalities into similarity and dissimilarity features. Because the decoupled dissimilarity features carry both the diversity and the unique information of each modality, they exhibit evident distribution discrepancies. Previous feature decomposition methods have overlooked the inherent contradictions in dissimilarity features, which degrades prediction accuracy. To address this issue, a dissimilarity feature-driven decomposition network (DFDDN) for multimodal sentiment analysis is proposed. First, a feature extraction module extracts and amplifies features, which not only suppresses visual and audio noise but also facilitates the capture of complementary information between modalities. Second, separate encoders decouple the features, and a multimodal Transformer mitigates the distribution differences among dissimilarity features. Finally, loss functions are used for optimization. Extensive experiments on two widely used multimodal sentiment analysis datasets demonstrate the accuracy and robustness of the model, which surpasses state-of-the-art performance.
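As an illustration of the decomposition idea described above, the following minimal PyTorch sketch decouples each modality into shared (similarity) and private (dissimilarity) features, passes the private features through a Transformer encoder to reconcile their distributions, and combines a regression loss with simple similarity/orthogonality terms. The module names, feature dimensions (text 768, audio 74, vision 35), Transformer settings, and loss weights are assumptions made for illustration only; this is not the authors' DFDDN implementation.

# Illustrative sketch only: a minimal decomposition-style model for utterance-level
# multimodal sentiment regression. All dimensions and loss weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityDecomposer(nn.Module):
    """Decouple one modality into similarity (shared) and dissimilarity (private) features."""
    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.extract = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # feature extraction/amplification
        self.shared_enc = nn.Linear(hidden, hidden)    # similarity encoder
        self.private_enc = nn.Linear(hidden, hidden)   # dissimilarity encoder

    def forward(self, x):
        h = self.extract(x)
        return self.shared_enc(h), self.private_enc(h)

class DecoupledFusionSketch(nn.Module):
    def __init__(self, dims=(768, 74, 35), hidden: int = 128):
        super().__init__()
        self.branches = nn.ModuleList(ModalityDecomposer(d, hidden) for d in dims)  # text, audio, vision
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.cross_modal = nn.TransformerEncoder(layer, num_layers=2)  # reconciles dissimilarity features
        self.regressor = nn.Linear(hidden * 6, 1)  # 3 shared + 3 private vectors -> sentiment score

    def forward(self, feats):
        shared, private = zip(*[b(x) for b, x in zip(self.branches, feats)])
        tokens = torch.stack(private, dim=1)               # (batch, 3, hidden)
        private = self.cross_modal(tokens).unbind(dim=1)   # mitigate cross-modal discrepancy
        fused = torch.cat(list(shared) + list(private), dim=-1)
        return self.regressor(fused), shared, private

def decomposition_losses(shared, private, pred, target):
    # Pull shared features of different modalities together, keep shared/private
    # features apart, and regress the sentiment label; 0.1 weights are placeholders.
    sim = sum(1 - F.cosine_similarity(shared[i], shared[j]).mean()
              for i in range(3) for j in range(i + 1, 3))
    orth = sum(F.cosine_similarity(s, p).abs().mean() for s, p in zip(shared, private))
    task = F.l1_loss(pred.squeeze(-1), target)
    return task + 0.1 * sim + 0.1 * orth

if __name__ == "__main__":
    feats = [torch.randn(8, d) for d in (768, 74, 35)]  # dummy text/audio/vision features
    model = DecoupledFusionSketch()
    pred, shared, private = model(feats)
    loss = decomposition_losses(shared, private, pred, torch.randn(8))
    loss.backward()
    print(pred.shape, float(loss))

The sketch routes only the private (dissimilarity) features through the shared Transformer encoder, so their distributions are aligned before fusion; this mirrors the role the abstract assigns to the multimodal Transformer, while the shared features are constrained by the similarity term of the loss.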

Key words: multimodal sentiment analysis, feature decomposition, pre-trained BERT, representation learning, contrastive learning

CLC Number:  TP391