Chinese Fake Review Detection Based on Attention Convolutional Neural Network

doi:10.16088/j.issn.1001-6600.2023020502

Abstract

Abstract: Existing fake review detection methods do not make full use of the textual features of fake reviews. To address this, this paper proposes a convolutional neural network model based on a multi-level attention mechanism. First, several sets of pre-trained word vectors are used to initialize the word embedding layer, and complex-valued position encoding is applied. Then, the feature maps produced by multiple convolution kernels are passed in turn through channel-level and kernel-level attention layers that embed user features, and weights are assigned according to feature importance. Finally, the fitted representation of the review text is classified with Softmax. Experimental results show that, compared with a number of mainstream neural network models, the proposed model improves accuracy and F₁ by 4.74 and 3.86 percentage points, respectively.

Key words: fake review detection, attention mechanism, convolutional neural network, pre-trained word vector
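
The two less familiar steps in the abstract, complex-valued position encoding and attention weighting of feature maps, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the general form of complex-valued word embeddings, in which dimension j of a word at position pos is r_j · exp(i(ω_j · pos + θ_j)), and it assumes a plain dot-product attention in which a hypothetical user-feature vector serves as the query. The function names `complex_position_embedding` and `attention_pool` and all numeric values are illustrative only.

```python
import cmath
import math


def complex_position_embedding(amplitude, frequency, phase, pos):
    """Complex embedding of one word at position `pos` (assumed form).

    Dimension j is amplitude[j] * exp(i * (frequency[j] * pos + phase[j])),
    so the modulus carries the word content and the angle carries position.
    """
    return [r * cmath.exp(1j * (w * pos + t))
            for r, w, t in zip(amplitude, frequency, phase)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(feature_maps, query):
    """Weight feature maps by softmax(dot(map, query)) and sum them.

    `query` stands in for a user-feature vector; how the paper injects
    user features into its attention layers is an assumption here.
    """
    scores = [sum(f * q for f, q in zip(fm, query)) for fm in feature_maps]
    weights = softmax(scores)
    dim = len(feature_maps[0])
    pooled = [sum(w * fm[d] for w, fm in zip(weights, feature_maps))
              for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    emb = complex_position_embedding([1.0, 2.0], [0.5, 0.1], [0.0, 0.3], pos=3)
    print([abs(z) for z in emb])  # moduli equal the amplitudes

    pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
    print(weights, pooled)
```

In this form the position only rotates each complex dimension and leaves its modulus unchanged, and the attention weights always sum to one, so feature maps that align better with the query contribute more to the pooled representation.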

CLC number: TP391.1

WU Zhengqing, CAO Hui, LIU Baokai. Chinese Fake Review Detection Based on Attention Convolutional Neural Network[J]. Journal of Guangxi Normal University (Natural Science Edition), 2023, 41(5): 26-36.

