Journal of Guangxi Normal University (Natural Science Edition), 2019, Vol. 37, Issue 2: 60-74. doi: 10.16088/j.issn.1001-6600.2019.02.008
ZHANG Suiyuan (1, 2, 3), XUE Yuanhai (1, 2, *), YU Xiaoming (1, 2), LIU Yue (1, 2), CHENG Xueqi (1, 2)
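Abstract: Automatic summarization compresses a long article into a much shorter text that captures its central content. Summarization faces two challenges: multiple documents are highly redundant, and electronic devices have limited display space. This paper proposes a coarse-grained sentence ranking method that incorporates graph convolutional features. First, the sentence-to-sentence similarity matrix is treated as a topological graph, and graph convolution over this graph yields graph convolutional features. Next, a ranking model combines these features with mainstream extractive multi-document summarization techniques to rank sentences by importance, and the top four sentences are selected as the summary. Finally, the paper proposes a short-summary generation model based on the Seq2seq framework: (1) the encoder uses a convolutional neural network (CNN); (2) an attention-based pointer mechanism is introduced, with a topic vector incorporated into it. Experimental results show that, in the setting studied here, a CNN-based encoder parallelizes better than a recurrent neural network (RNN). It therefore improves efficiency substantially while delivering essentially the same quality. In addition, compared with traditional extraction- and compression-based models, the proposed model achieves significant improvements both in ROUGE scores and in readability (informativeness and fluency).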
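The coarse-grained ranking step can be sketched in pure Python. The sketch rests on several assumptions that the abstract does not confirm:

- Edges are cosine similarities between sentence vectors, with negative values clipped to zero.
- The graph convolution is a single standard symmetric-normalized layer, ReLU(D^(-1/2) (A + I) D^(-1/2) H W).
- The paper's ranking model is replaced by a hypothetical linear scorer. It scores the concatenation of graph convolutional features and other (for example, extractive) features.

The function names `similarity_graph`, `gcn_layer` and `top_k_sentences` are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_graph(vectors: Sequence[Sequence[float]]) -> Matrix:
    """Sentence-by-sentence similarity matrix used as a weighted adjacency matrix.
    Negative similarities are clipped to 0 so node degrees stay non-negative."""
    n = len(vectors)
    return [[max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so each sentence keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loop
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in hidden]  # ReLU


def top_k_sentences(gcn_feats: Matrix, other_feats: Matrix,
                    scorer_weights: Sequence[float], k: int = 4) -> List[int]:
    """Score each sentence linearly over [GCN features ; other features]
    (hypothetical scorer) and return the indices of the k best sentences."""
    scores = [sum(w * x for w, x in zip(scorer_weights, g + o))
              for g, o in zip(gcn_feats, other_feats)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy bag-of-words vectors for five sentences (hypothetical data).
    sents = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 1]]
    adj = similarity_graph(sents)
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    h = gcn_layer(adj, [list(map(float, s)) for s in sents], identity)
    # Hypothetical extra feature: inverse sentence position.
    position = [[1.0 / (i + 1)] for i in range(len(sents))]
    print(top_k_sentences(h, position, [1.0, 1.0, 1.0, 0.5]))
```

Adding self-loops before normalization keeps every node degree at least 1. As a result, a sentence with no similar neighbours still retains its own features. In a trained system the layer weights and scorer weights would be learned rather than fixed by hand.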
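The attention-based pointer mechanism can also be sketched. The sketch assumes the common pointer-generator formulation, in which a gate p_gen mixes the decoder's vocabulary distribution with a copy distribution given by the attention weights over source tokens:

P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of a_i over positions i where x_i = w.

The abstract does not say how the authors fuse the topic vector. Purely as an assumption, this sketch appends it to the inputs of the gate. `pointer_distribution` and its parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pointer_distribution(
    source_tokens: Sequence[str],
    attention_scores: Sequence[float],
    vocab_probs: Dict[str, float],
    gate_inputs: Sequence[float],   # assumed: [context ; decoder state ; topic vector]
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Dict[str, float]:
    """Mix a generation distribution with a copy distribution over source tokens."""
    attention = softmax(attention_scores)
    # Generation gate; the topic vector influences it only through gate_inputs (assumption).
    p_gen = sigmoid(sum(w * x for w, x in zip(gate_weights, gate_inputs)) + gate_bias)
    final = {w: p_gen * p for w, p in vocab_probs.items()}
    # Copy mass: attention on every source position holding token w goes to w,
    # including tokens that are absent from the output vocabulary.
    for tok, a in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final


if __name__ == "__main__":
    # Toy example: uniform attention over a three-token source (hypothetical data).
    dist = pointer_distribution(
        source_tokens=["beijing", "summit", "beijing"],
        attention_scores=[0.0, 0.0, 0.0],
        vocab_probs={"the": 0.5, "summit": 0.5},
        gate_inputs=[0.2, -0.1, 0.7],
        gate_weights=[0.0, 0.0, 0.0],
    )
    print(dist)
```

Copied tokens receive probability mass directly from attention. A source word missing from the output vocabulary (here `beijing`) can therefore still appear in the generated summary.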