Journal of Guangxi Normal University (Natural Science Edition), 2015, Vol. 33, Issue (2): 36-41. doi: 10.16088/j.issn.1001-6600.2015.02.006


A Method for Entity-Oriented Timeline Summarization

SONG Jun1,2,3, HAN Xiao-yu1,2,3, HUANG Yu1,2, HUANG Ting-lei1,2, FU Kun1,2

  1. CAS Key Laboratory of Spatial Information Processing and Applied System Technology, Beijing 100190, China;
  2. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China;
  3. University of Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2015-03-19 Online: 2015-02-10 Published: 2018-09-20
  • Corresponding author: FU Kun (b. 1974), male, from Jingzhou, Hubei; research professor and doctoral supervisor at the Institute of Electronics, Chinese Academy of Sciences. E-mail: kunfuiecas@gmail.com
  • Funding:
    Supported by major projects of the National "863" Program (2014AA7013033, 2014AA7115061, 2014AA7115028)

Abstract: Existing multi-document summarization methods do not take entities into account and produce only generic summaries. To address this, this paper proposes an entity-oriented timeline summarization method for multiple documents. A probabilistic topic model first jointly models the evolution of document topics and the participation of entities, and an efficient Gibbs sampler is developed for this model. Each sentence is then scored with respect to an entity on the basis of the discovered topics, so the same sentence may receive different scores for different entities, and the highest-scoring sentences are selected as the summary. Extensive experiments on real-world datasets show that the method can generate personalized summaries of how an event develops for different entities, and that it also produces better generic summaries than existing methods, outperforming the baseline model under ROUGE evaluation.

Key words: multi-document summarization, probabilistic topic model, natural language processing

CLC Number: TP391.1
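
The abstract states that sentences are scored per entity from the discovered topics, but it does not give the scoring formula. The following is only a minimal sketch of how such entity-dependent scoring could work, assuming each entity has a topic distribution and each topic has a word distribution. The function names `score_sentence` and `select_summary`, the averaging rule, and the toy distributions are all illustrative assumptions, not the paper's model or its Gibbs sampler output.

```python
from typing import Dict, List, Sequence

# Toy inputs (hypothetical): two topics with word probabilities, and
# per-entity topic weights such as a fitted topic model might provide.
TOPIC_WORDS: List[Dict[str, float]] = [
    {"election": 0.5, "vote": 0.5},   # topic 0
    {"storm": 0.6, "flood": 0.4},     # topic 1
]
ENTITY_TOPICS: Dict[str, List[float]] = {
    "A": [0.9, 0.1],  # entity A mostly takes part in topic 0
    "B": [0.1, 0.9],  # entity B mostly takes part in topic 1
}
SENTENCES: List[str] = ["election vote", "storm flood"]


def score_sentence(tokens: Sequence[str],
                   entity_topics: Sequence[float],
                   topic_words: Sequence[Dict[str, float]]) -> float:
    """Assumed score: mean over tokens of sum_k P(k | entity) * P(token | k)."""
    if not tokens:
        return 0.0
    total = 0.0
    for tok in tokens:
        total += sum(weight * words.get(tok, 0.0)
                     for weight, words in zip(entity_topics, topic_words))
    return total / len(tokens)


def select_summary(sentences: Sequence[str],
                   entity_topics: Sequence[float],
                   topic_words: Sequence[Dict[str, float]],
                   n: int = 1) -> List[str]:
    """Return the n highest-scoring sentences for one entity."""
    ranked = sorted(
        sentences,
        key=lambda s: score_sentence(s.split(), entity_topics, topic_words),
        reverse=True,
    )
    return list(ranked[:n])


if __name__ == "__main__":
    for entity, weights in ENTITY_TOPICS.items():
        print(entity, select_summary(SENTENCES, weights, TOPIC_WORDS))
```

With these toy inputs the same sentence is ranked differently for the two entities, which is the behavior the abstract describes; a real system would replace the hand-written distributions with those estimated by the topic model.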