Journal of Guangxi Normal University(Natural Science Edition) ›› 2019, Vol. 37 ›› Issue (2): 60-74.doi: 10.16088/j.issn.1001-6600.2019.02.008

Research on Short Summary Generation of Multi-Document

ZHANG Suiyuan1,2,3, XUE Yuanhai1,2*, YU Xiaoming1,2, LIU Yue1,2, CHENG Xueqi1,2   

1.Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences, Beijing 100190, China;
    2.Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    3.University of Chinese Academy of Sciences, Beijing 100190, China
  • Received:2018-11-02 Online:2019-04-25 Published:2019-04-28

Abstract: Automatic summarization compresses long articles into a shorter text that captures the content of the original. Because multiple documents are highly redundant while the display space of electronic devices is limited, summary generation faces new challenges. In this paper, a coarse-grained sentence ranking method based on graph convolution features is proposed. First, the inter-sentence similarity matrix is treated as a topological graph, and graph convolution features are extracted from it. Then, these convolution features are fused with mainstream extractive multi-document summarization techniques to rank the sentences, and the top four sentences are selected as the summary. Lastly, a short summary generation model based on the Seq2seq framework is proposed: 1) a convolutional neural network (CNN) is adopted in the encoder; 2) an attention-based pointer mechanism is introduced, into which a topic vector is incorporated. Experimental results show that, compared with a recurrent neural network (RNN), the CNN-based encoder can be parallelized more easily and therefore improves efficiency significantly while keeping the effectiveness essentially unchanged. In addition, compared with the traditional extract-then-compress model, the proposed model achieves significant improvements in ROUGE scores and in readability (informativeness and fluency).
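
The coarse-grained ranking stage described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: it assumes sentences are already represented as numpy vectors, uses illustrative names (similarity_graph, graph_convolution, rank_sentences), and replaces the learned graph-convolution layer with untrained random projection weights. It only shows the shape of the pipeline: build a similarity graph over sentences, extract one layer of graph convolution features, fuse them with a simple centrality score standing in for an extractive multi-document summarization score, and keep the top four sentences.

import numpy as np

def similarity_graph(sent_vectors, threshold=0.1):
    # Cosine-similarity adjacency matrix over sentence vectors; weak edges are dropped.
    unit = sent_vectors / (np.linalg.norm(sent_vectors, axis=1, keepdims=True) + 1e-12)
    adj = unit @ unit.T
    adj[adj < threshold] = 0.0
    np.fill_diagonal(adj, 0.0)
    return adj

def graph_convolution(adj, features, out_dim=32, seed=0):
    # One GCN-style propagation step: add self-loops, symmetrically normalize the
    # adjacency matrix, aggregate neighbour features, project, and apply ReLU.
    # The projection weights are random here purely for illustration.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    weight = np.random.default_rng(seed).normal(scale=0.1, size=(features.shape[1], out_dim))
    return np.maximum(a_norm @ features @ weight, 0.0)

def rank_sentences(sent_vectors, top_k=4):
    # Fuse the convolution features with a degree-centrality score and return
    # the indices of the top-k sentences.
    adj = similarity_graph(sent_vectors)
    conv_score = np.linalg.norm(graph_convolution(adj, sent_vectors), axis=1)
    centrality = adj.sum(axis=1)
    score = 0.5 * conv_score / (conv_score.max() + 1e-12) \
          + 0.5 * centrality / (centrality.max() + 1e-12)
    return list(np.argsort(-score)[:top_k])

In the coarse-to-fine setting described above, the four selected sentences would then be passed to the abstractive stage (the Seq2seq model with a CNN encoder and an attention-based pointer incorporating a topic vector), which is omitted from this sketch.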

Key words: multi-document, short summary generation, Seq2seq

CLC Number: 

  • TP391
[1] China Internet Network Information Center. The 41st statistical report on Internet development in China[EB/OL]. (2018-03-05)[2018-11-02]. http://www.cnnic.net.cn/hlwfzyj/hlwxzbg/hlwtjbg/201803/t20180305_70249.htm.
[2] RADEV D R,JING H,TAM D. Centroid-based summarization of multiple documents[J]. Information Processing & Management,2004,40(6):919-938.
[3] MULLON C,SHIN Y,CURY P. NEATS:A network economics approach to trophic systems[J]. Ecological Modelling,2009,220(21):3033-3045.
[4] EVANS D K,KLAVANS J L,MCKEOWN K R. Columbia Newsblaster:multilingual news summarization on the web[C]//Demonstration Papers at HLT-NAACL. Association for Computational Linguistics. Stroudsburg,PA:ACL,2008:1-4.
[5] OUYANG Y,LI W,LI S,et al. Applying regression models to query-focused multi-document summarization[J]. Information Processing & Management,2011,47(2):227-237.
[6] MIHALCEA R,TARAU P. TextRank:Bringing order into texts[C]//Proceedings of 2004 Conference on Empirical Methods in Natural Language Processing. Stroudsburg,PA:ACL,2004:404-411.
[7] ERKAN G,RADEV D R. LexRank:graph-based lexical centrality as salience in text summarization[J]. Journal of Artificial Intelligence Research,2004,22:457-479.
[8] WAN X,XIAO J. Graph-based multi-modality learning for topic-focused multi-document summarization[C]//Proceedings of the International Joint Conference on Artificial Intelligence IJCAI 2009. California:AAAI,2009:1586-1591.
[9] WAN X,YANG J. Improved affinity graph based multi-document summarization[C]//Human Language Technology Conference of the NAACL,Companion Volume:Short Papers. Stroudsburg,PA:ACL,2006:181-184.
[10] WAN X,YANG J. Multi-document summarization using cluster-based link analysis[C]//International ACM SIGIR Conference on Research and Development in Information Retrieval. New York:ACM,2008:299-306.
[11] YAN R,YUAN Z,WAN X,et al. Hierarchical graph summarization:Leveraging hybrid information through visible and invisible linkage[M]//Advances in Knowledge Discovery and Data Mining. Berlin Heidelberg:Springer,2012:97-108.
[12] WAN X. TimedTextRank:adding the temporal dimension to multi-document summarization[C] //International ACM SIGIR Conference on Research and Development in Information Retrieval. New York:ACM,2007:867-868.
[13] SWAN R,ALLAN J. Automatic generation of overview timelines[C]//Proceedings of International ACM SIGIR Conference on Research & Development in Information Retrieval. New York:ACM,2000:49-56.
[14] HAI L C,LEE Y K. Query based event extraction along a timeline[C]//International ACM SIGIR Conference on Research and Development in Information Retrieval. New York:ACM,2004:425-432.
[15] WHITE M,KORELSKY T,CARDIE C,et al. Multidocument summarization via information extraction[C]//International Conference on Human Language Technology Research. Association for Computational Linguistics. Stroudsburg,PA:ACL,2001:1-7.
[16] LI L,WANG D,SHEN C,et al. Ontology-enriched multi-document summarization in disaster management[J]. Information Sciences,2013,224(2):118-129.
[17] DORR B,ZAJIC D,SCHWARTZ R. Hedge Trimmer:a parse-and-trim approach to headline generation[C]//Proceedings of the HLT-NAACL 03 Text Summarization Workshop. Association for Computational Linguistics. Stroudsburg,PA:ACL,2003:1-8.
[18] ZAJIC D,DORR B,SCHWARTZ R. Headline generation for written and broadcast news[EB/OL]. (2003-07-01)[2018-11-02]. https://www.researchgate.net/publication/228509374_Headline_generation_for_written_and_broadcast_news.
[19] ALFONSECA E,PIGHIN D,GARRIDO G. HEADY:News headline abstraction through event pattern clustering[C]//Proceedings of the Meeting of the Association for Computational Linguistics. Stroudsburg,PA:ACL,2013:1243-1253.
[20] COLMENARES C A,LITVAK M,MANTRACH A,et al. HEADS:Headline generation as sequence prediction using an abstract feature-rich space[C]//Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. Stroudsburg,PA:ACL,2015:133-142.
[21] BANKO M,MITTAL V O,WITBROCK M J. Headline generation based on statistical translation[C]//Proceedings of the Meeting of the Association for Computational Linguistics. Stroudsburg,PA:ACL,2000:318-325.
[22] SORICUT R,MARCU D. Abstractive headline generation using WIDL-expressions[J]. Information Processing & Management,2007,43(6):1536-1548.
[23] WOODSEND K,FENG Y,LAPATA M. Title generation with quasi-synchronous grammar[C]//Conference on Empirical Methods in Natural Language Processing. Stroudsburg,PA:ACL,2010:513-523.
[24] SUN R,ZHANG Y,ZHANG M,et al. Event-driven headline generation[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Stroudsburg,PA:ACL,2015:462-472.
[25] XU L,WANG Z,LIU Z,et al. Topic sensitive neural headline generation[EB/OL]. (2016-08-20)[2018-11-02]. https://arXiv.org/abs/1608.05777v1.
[26] TAN J,WAN X,XIAO J,et al. From neural sentence summarization to headline generation:A coarse-to-fine approach[C]//Twenty-Sixth International Joint Conference on Artificial Intelligence. California:AAAI,2017:4109-4115.
[27] GEHRING J,AULI M,GRANGIER D,et al. Convolutional sequence to sequence learning[EB/OL]. (2017-07-25)[2018-11-02]. https://arXiv.org/abs/1705.03122v3.
[28] MOODY C E. Mixing Dirichlet topic models and word embeddings to make lda2vec[EB/OL]. (2016-05-06)[2018-11-02]. http://cn.arXiv.org/abs/1605.02019v1.