广西师范大学学报(自然科学版) ›› 2025, Vol. 43 ›› Issue (3): 57-71. DOI: 10.16088/j.issn.1001-6600.2024071702
李志欣1,2*, 刘鸣琦1,2
LI Zhixin1,2*, LIU Mingqi1,2
Abstract: Feature decoupling separates the features of different modalities into similarity features and difference features, thereby mitigating the disparity in each modality's contribution. However, because difference features contain not only complementary information but also consistent information, they exhibit significant distribution discrepancies. Traditional feature decoupling methods overlook this intrinsic conflict within difference features, which leads to inaccurate predictions. To address this problem, this paper proposes a difference-feature-oriented decoupled multimodal sentiment analysis method that draws on feature representation learning and contrastive learning to extract more effective features and to enlarge the gaps between difference features. First, a feature extraction module is deployed that applies a distinct extraction method to each of the three modalities to obtain more effective features. Second, a shared encoder and modality-specific encoders are used to decouple the features of the three modalities, and a multimodal Transformer performs feature fusion. Finally, loss functions are designed for optimization to enlarge the gaps between difference features. Experiments on two large-scale benchmark datasets, with comparisons against multiple state-of-the-art methods, show that the proposed method outperforms them on most metrics, verifying its effectiveness and robustness.
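To make the decoupling pipeline described above concrete, the following is a minimal PyTorch sketch. It assumes utterance-level text/audio/vision feature vectors with illustrative dimensions; the module names, the exact loss forms, and all hyperparameters are assumptions for illustration, not the paper's actual implementation.

```python
# Sketch of the decoupling idea: a shared encoder yields similarity features,
# per-modality private encoders yield difference features, a Transformer
# encoder fuses them, and an auxiliary loss pushes the difference features of
# different modalities apart. All names and dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledMSA(nn.Module):
    def __init__(self, dims=(768, 74, 35), hidden=128):
        super().__init__()
        # Project text / audio / vision features to a common hidden size.
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        # Shared encoder: modality-invariant (similarity) features.
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        # Private encoders: modality-specific (difference) features.
        self.private = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in dims]
        )
        # Transformer over the 6 decoupled tokens (3 similarity + 3 difference).
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, 1)  # sentiment regression score

    def forward(self, text, audio, vision):
        feats = [p(x) for p, x in zip(self.proj, (text, audio, vision))]
        sim = [self.shared(h) for h in feats]                    # similarity features
        diff = [enc(h) for enc, h in zip(self.private, feats)]   # difference features
        tokens = torch.stack(sim + diff, dim=1)                  # (batch, 6, hidden)
        fused = self.fusion(tokens).mean(dim=1)
        return self.head(fused).squeeze(-1), sim, diff


def decoupling_losses(sim, diff):
    """Pull similarity features together and push difference features of
    different modalities apart (a simple stand-in for a contrastive objective)."""
    sim_loss = sum(F.mse_loss(sim[i], sim[j]) for i in range(3) for j in range(i + 1, 3))
    # Higher cosine similarity between difference features is penalised.
    diff_loss = sum(
        F.cosine_similarity(diff[i], diff[j], dim=-1).mean()
        for i in range(3) for j in range(i + 1, 3)
    )
    return sim_loss + diff_loss


# Usage with random stand-in features (batch of 8 utterance-level vectors).
model = DecoupledMSA()
text, audio, vision = torch.randn(8, 768), torch.randn(8, 74), torch.randn(8, 35)
pred, sim, diff = model(text, audio, vision)
loss = F.l1_loss(pred, torch.zeros(8)) + 0.1 * decoupling_losses(sim, diff)
loss.backward()
```

The auxiliary term weight (0.1 here) and the choice of cosine-similarity penalty are only placeholders for the optimization losses the paper designs to enlarge the gaps between difference features.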
CLC number: TP391