Journal of Guangxi Normal University (Natural Science Edition), 2023, Vol. 41, Issue (4): 96-108. doi: 10.16088/j.issn.1001-6600.2022103101

Microblog Opinion Summarization Method Based on Transformer and TextRank

SUN Xu1, SHEN Bin1, YAN Xin1,2*, ZHANG Jinpeng3,4, XU Guangyi5   

    1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
    2. Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology), Kunming Yunnan 650500, China;
    3. School of Computer Science and Engineering, Yunnan University, Kunming Yunnan 650091, China;
    4. School of Information, Yunnan University of Finance and Economics, Kunming Yunnan 650221, China;
    5. Yunnan Nantian Electronic Information Industry Co., Ltd., Kunming Yunnan 650040, China
  • Received: 2022-10-31 Revised: 2023-03-16 Online: 2023-07-25 Published: 2023-09-06

Abstract: Previous research has not considered the sentiment associations among microblog texts. This paper proposes a microblog opinion summarization method based on Transformer and TextRank. First, the word vectors of the texts are encoded by the Transformer encoder and quantized in its quantization space. Next, based on the quantization results, the microblog text set is divided into opinion categories by semanteme clustering, and the important categories are selected for summary extraction. The sentiment feature vector is then concatenated with the microblog text feature vector, and a TextRank algorithm with sentiment features is applied within each category to extract the highest-weighted microblog text as the summary text. Finally, the most representative summary texts from all categories are combined to form the final microblog opinion summary. Experimental results show that, after the sentiment polarity influence factor is added, the ROUGE values of the proposed method improve significantly over the baseline method: the maximum F-measure values of ROUGE-1, ROUGE-2 and ROUGE-SU4 reach 0.4937, 0.2555 and 0.2706, respectively. These results indicate that the proposed method is effective for extracting microblog opinion summaries.

Key words: sentiment feature, opinion summarization, semanteme clustering, summary extraction, Transformer, TextRank

CLC Number:  TP391.1
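
The per-category ranking step described in the abstract (concatenate sentiment and text feature vectors, run TextRank, keep the highest-weighted text) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the edge weighting, the damping factor, or how the sentiment polarity influence factor enters the graph, so cosine similarity over the concatenated vectors, a damping factor of 0.85, and the function names `cosine`, `textrank_with_sentiment` and `pick_summary` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def textrank_with_sentiment(text_vecs, sentiment_vecs, damping=0.85, iters=50):
    """Weighted TextRank scores for the texts of one opinion category.

    Assumption: edge weights are cosine similarities between the
    concatenated [text features | sentiment features] vectors.
    """
    # Concatenate each text feature vector with its sentiment feature vector.
    nodes = [list(t) + list(s) for t, s in zip(text_vecs, sentiment_vecs)]
    n = len(nodes)
    if n == 0:
        return []
    # Similarity graph: negative similarities clipped to zero, no self-loops.
    weights = [
        [max(cosine(nodes[i], nodes[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # Power iteration of the weighted PageRank update used by TextRank.
    for _ in range(iters):
        updated = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0.0
            )
            updated.append((1.0 - damping) / n + damping * rank)
        scores = updated
    return scores


def pick_summary(texts, text_vecs, sentiment_vecs):
    """Return the text with the highest TextRank score in the category."""
    scores = textrank_with_sentiment(text_vecs, sentiment_vecs)
    best = max(range(len(texts)), key=lambda i: scores[i])
    return texts[best]
```

In this sketch, a text that is close to many others in both content and sentiment accumulates the most weight, which matches the abstract's description of selecting the highest-weighted text per category; the final summary would then combine one such text from each selected category.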