Journal of Guangxi Normal University (Natural Science Edition) ›› 2023, Vol. 41 ›› Issue (5): 26-36. doi: 10.16088/j.issn.1001-6600.2023020502

• Research Article •

Chinese Fake Review Detection Based on Attention Convolutional Neural Network

WU Zhengqing, CAO Hui*, LIU Baokai

  1. Key Laboratory of China’s Ethnic Languages and Information Technology of Ministry of Education (Northwest Minzu University), Lanzhou, Gansu 730030, China
  • Received: 2023-02-05 Revised: 2023-04-02 Published: 2023-10-09
  • Corresponding author: CAO Hui (born 1971), female, native of Lanzhou, Gansu; professor and doctoral supervisor at Northwest Minzu University. E-mail: 147625251@qq.com
  • Funding:
    National Natural Science Foundation of China (61633013)

Abstract: Existing fake review detection methods do not make full use of the textual features of fake reviews. To address this, this paper proposes a convolutional neural network model based on a multi-level attention mechanism. First, several sets of pre-trained word vectors are used to initialize the word embedding layer, and complex-valued position encoding is applied. Then, the feature maps produced by multiple convolution kernels are passed in turn through channel-level and kernel-level attention layers that embed user features, which assign weights according to feature importance. Finally, the resulting review text representation is classified with Softmax. Experimental results show that, compared with a range of mainstream neural network models, the proposed model improves accuracy and F1 by 4.74 and 3.86 percentage points, respectively.

Key words: fake review detection, attention mechanism, convolutional neural network, pre-trained word vector

CLC Number: TP391.1
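The two attention stages named in the abstract can be illustrated with a minimal sketch. The abstract gives no equations, so everything here is an assumption made for illustration: attention scores are plain dot products against a user feature vector, the convolution and complex-valued position encoding stages are omitted, and all names (`attend`, `classify`, `feature_maps`, `user_vec`, `class_weights`) and toy values are hypothetical. It should not be read as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def attend(vectors, query):
    """Weight each vector by softmax(vector . query); return (weights, weighted sum).

    Dot-product scoring against the query is an assumption, not the paper's formula.
    """
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled


def classify(feature_maps, user_vec, class_weights):
    """feature_maps[k][c] is a pooled feature vector for kernel k and embedding channel c."""
    # Channel-level attention: merge the embedding channels of each kernel.
    per_kernel = [attend(channels, user_vec)[1] for channels in feature_maps]
    # Kernel-level attention: merge the kernels into one review representation.
    kernel_weights, review_vec = attend(per_kernel, user_vec)
    # Softmax classifier over the fused representation.
    probs = softmax([dot(w, review_vec) for w in class_weights])
    return kernel_weights, probs


# Toy input (hypothetical values): 2 kernels x 2 embedding channels, 3-dim features.
feature_maps = [
    [[0.2, 0.1, 0.0], [0.9, 0.4, 0.3]],
    [[0.5, 0.5, 0.5], [0.1, 0.0, 0.2]],
]
user_vec = [1.0, 0.5, -0.5]
class_weights = [[0.3, -0.2, 0.1], [-0.1, 0.4, 0.2]]  # rows: genuine, fake

kernel_weights, probs = classify(feature_maps, user_vec, class_weights)
print("kernel attention weights:", kernel_weights)
print("class probabilities:", probs)
```

Using the user vector as the query in both stages is one simple way to let user features influence the weighting; a trained model would more likely learn separate projection parameters for each attention layer.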
