Journal of Guangxi Normal University (Natural Science Edition), 2020, Vol. 38, Issue (2): 51-63. doi: 10.16088/j.issn.1001-6600.2020.02.006

An Automatic Summarization Model Based on Deep Learning for Chinese

LI Weiyong1, LIU Bin2, ZHANG Wei2, CHEN Yunfang2*   

  1. Institute of Computing Software, Nanjing Vocational College of Information Technology, Nanjing, Jiangsu 210023, China;
  2. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China
  • Received: 2019-10-08 Published: 2020-04-02

Abstract: Drawing on the pictographic nature and internal structure of Chinese characters, this paper proposes a new approach to automatic summarization that combines a stroke-based text vector technique with an automatic summarization model. The stroke-based text vector encodes the basic elements of Chinese characters and highlights the specific features of each word, which tightens the relationships between words. The text vector of each Chinese word is obtained with the Skip-Gram model and optimized through the Seq2Seq model. A Bi-LSTM is used to address the loss of information in long text sequences and to supplement information from the reverse direction. An attention mechanism in the encoder weighs the different effects of each part of the input sentence on the decoder, while Beam Search in the decoder optimizes the generated sequence. Experiments with the model trained on the LCSTS data set show that the automatic summarization model can improve the quality and readability of Chinese text summaries.
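
The Beam Search decoding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `beam_search`, the stand-in scorer `toy_model`, and its probability table are hypothetical names and values chosen only for illustration, and a real system would take next-token scores from the attention-based Seq2Seq decoder instead.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, cumulative log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return hypotheses sorted from highest to lowest total log-probability."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, log_p in next_log_probs(tokens).items():
                candidates.append((tokens + (tok,), score + log_p))
        # Keep only the beam_width best candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len without an end token
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a decoder: fixed next-token probabilities."""
    table = {
        (): {"A": 0.6, "B": 0.4},
        ("A",): {"x": 0.55, "y": 0.45},
        ("B",): {"z": 0.9, "w": 0.1},
    }
    # Any prefix not in the table ends the sequence with probability 1.
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy_tokens, greedy_score = beam_search(toy_model, beam_width=1, max_len=5)[0]
beam_tokens, beam_score = beam_search(toy_model, beam_width=2, max_len=5)[0]
```

In this toy setting a beam width of 1 (greedy decoding) commits to "A" at the first step and ends with a sequence of probability 0.33, while a beam width of 2 keeps "B" alive and finds the sequence "B z" with probability 0.36. This is the sense in which a wider beam can improve the output sequence.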

Key words: deep learning, generation summarization, stroke_embedding, Seq2Seq, attention mechanism

CLC Number: TP391