A Detection Model for Multimodal Fake News Based on Attention Mechanism and Multiscale Fusion

doi:10.16088/j.issn.1001-6600.2024122004

Abstract

Abstract: Fake news can cause serious harm if it is not handled promptly. Current multimodal fake news detection methods mainly rely on various attention mechanisms to fuse unimodal features; they neither account for the semantic gap that may exist between features of different modalities nor fully exploit the potential of multimodal pretrained models. This paper proposes a new multimodal fake news detection model that fuses features in multiple stages. The model uses a multimodal pretrained model to extract aligned features, applies an attention mechanism so that the features enhance one another, and concatenates the enhanced features to achieve early fusion. A multiscale fusion module then captures the interactions between features of different modalities and learns fusion weights to achieve late fusion. Experimental results show that the proposed model outperforms comparable models and confirm the effectiveness of the attention mechanism and the multiscale fusion module.

Key words: multimodality, fake news detection, attention mechanism, multiscale fusion, multistage fusion
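The staged pipeline outlined in the abstract (mutual enhancement by attention, concatenation for early fusion, multiscale processing, weighted late fusion) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the abstract does not specify the internals of any module, so every function name here is hypothetical, the "scales" are assumed to be simple average-pooling windows, and the fusion weights are fixed placeholder logits rather than learned parameters. The input token vectors stand in for aligned features that a multimodal pretrained model would supply.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Enhance each query vector with an attention-weighted sum of the
    context vectors (scaled dot-product attention plus a residual)."""
    dim = len(queries[0])
    enhanced = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        attended = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        enhanced.append([qi + ai for qi, ai in zip(q, attended)])
    return enhanced

def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def pool_at_scale(vector, window):
    """Average non-overlapping windows of the given size (one assumed 'scale')."""
    chunks = [vector[i:i + window] for i in range(0, len(vector), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

def multistage_fusion(text_tokens, image_tokens,
                      scales=(1, 2, 4), branch_logits=(0.0, 0.0, 0.0)):
    """Hypothetical end-to-end sketch of the multistage fusion idea."""
    # Stage 1: each modality is enhanced by attending to the other.
    text_enh = cross_attend(text_tokens, image_tokens)
    image_enh = cross_attend(image_tokens, text_tokens)

    # Stage 2 (early fusion): concatenate the pooled, enhanced features.
    early = mean_pool(text_enh) + mean_pool(image_enh)

    # Stage 3 (assumed multiscale module): view the fused vector at several
    # window sizes, then stretch each view back to the original length.
    branches = []
    for window in scales:
        pooled = pool_at_scale(early, window)
        upsampled = [v for v in pooled for _ in range(window)][:len(early)]
        branches.append(upsampled)

    # Stage 4 (late fusion): combine branches with softmax-normalized weights.
    # In a trained model these logits would be learned; here they are fixed.
    weights = softmax(list(branch_logits))
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(early))]

if __name__ == "__main__":
    text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    image = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
    print(multistage_fusion(text, image))
```

The fused vector would normally feed a classifier that outputs a real/fake prediction; that step is omitted because the abstract gives no details about it.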

CLC number: TP391.1

SHI Zihao, MENG Zuqiang, TAN Chaohong. A Detection Model for Multimodal Fake News Based on Attention Mechanism and Multiscale Fusion[J]. Journal of Guangxi Normal University (Natural Science Edition), 2026, 44(1): 68-79.

