Journal of Guangxi Normal University (Natural Science Edition), 2026, Vol. 44, Issue (1): 68-79. doi: 10.16088/j.issn.1001-6600.2024122004

• Intelligent Information Processing •

A Detection Model for Multimodal Fake News Based on Attention Mechanism and Multiscale Fusion

SHI Zihao1, MENG Zuqiang1*, TAN Chaohong2

  1. College of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China;
  2. Guangxi Key Laboratory of Digital Infrastructure (Guangxi Zhuang Autonomous Region Information Center), Nanning, Guangxi 530000, China
  • Received: 2024-12-20 Revised: 2025-04-26 Online: 2026-01-05 Published: 2026-01-26
  • Corresponding author: MENG Zuqiang (born 1974), male, from Hechi, Guangxi; professor and doctoral supervisor at Guangxi University. E-mail: mengzuqiang@163.com
  • Funding: National Natural Science Foundation of China (62266004); Open Fund of the Guangxi Key Laboratory of Digital Infrastructure (GXDINBC202401)

Abstract: Fake news can have serious consequences if it is not dealt with promptly. Existing multimodal fake news detection methods mainly use various attention mechanisms to fuse unimodal features; they neither account for the semantic gap that may exist between features of different modalities nor fully exploit the potential of multimodal pretrained models. This paper proposes a new multimodal fake news detection model that fuses features in multiple stages. The model uses a multimodal pretrained model to extract aligned features, then applies an attention mechanism so that the features enhance one another, and concatenates the enhanced features to achieve early fusion. A multiscale fusion module then captures the interaction information between features of different modalities, and fusion weights are learned to achieve late fusion. Experimental results show that the proposed model outperforms comparable models and verify the effectiveness of the attention mechanism and the multiscale fusion module.

Key words: multimodality, fake news detection, attention mechanism, multiscale fusion, multistage fusion

CLC Number: TP391.1
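The multistage fusion flow summarized in the abstract (mutual enhancement by attention, concatenation for early fusion, a multiscale module, then weighted late fusion) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the use of plain scaled dot-product attention without learned projections, the moving-average "scales", and the `scale_logits` parameter that stands in for fusion weights a real model would learn during training. The input token vectors stand in for aligned features that a multimodal pretrained model would provide.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_enhance(query_tokens, context_tokens):
    """Enhance each query token with an attention-weighted summary of the
    other modality's tokens, added back as a residual.
    Assumption: no learned Q/K/V projections, for brevity."""
    dim = len(query_tokens[0])
    enhanced = []
    for q in query_tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context_tokens]
        weights = softmax(scores)
        attended = [sum(w * k[i] for w, k in zip(weights, context_tokens))
                    for i in range(dim)]
        enhanced.append([qi + ai for qi, ai in zip(q, attended)])
    return enhanced


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def smooth(vec, window):
    """Hypothetical 'scale': moving average with an odd window, truncated
    at the edges. A window of 1 returns the values unchanged."""
    half = window // 2
    out = []
    for i in range(len(vec)):
        lo, hi = max(0, i - half), min(len(vec), i + half + 1)
        out.append(sum(vec[lo:hi]) / (hi - lo))
    return out


def multistage_fusion(text_tokens, image_tokens, scale_logits, windows=(1, 3, 5)):
    """Return the fused feature vector and the normalized fusion weights."""
    # Stage 1: the two modalities enhance each other through attention.
    text_enh = cross_enhance(text_tokens, image_tokens)
    image_enh = cross_enhance(image_tokens, text_tokens)
    # Stage 2: early fusion by concatenating the pooled, enhanced features
    # (list "+" is concatenation here, not element-wise addition).
    early = mean_pool(text_enh) + mean_pool(image_enh)
    # Stage 3: views of the fused vector at several (assumed) scales.
    views = [smooth(early, w) for w in windows]
    # Stage 4: late fusion as a weighted sum; the logits would be learned.
    weights = softmax(list(scale_logits))
    fused = [sum(w * v[i] for w, v in zip(weights, views))
             for i in range(len(early))]
    return fused, weights


# Toy stand-ins for aligned text and image token features (dimension 4).
text = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]
image = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
fused, weights = multistage_fusion(text, image, scale_logits=[0.0, 0.0, 0.0])
print(len(fused), weights)
```

With equal logits, the three scales contribute equally. In a trained model the logits would shift so that the more informative scales dominate, and the fused vector would feed a classifier that labels the news item as real or fake.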
