Journal of Guangxi Normal University (Natural Science Edition) ›› 2020, Vol. 38 ›› Issue (2): 64-71. doi: 10.16088/j.issn.1001-6600.2020.02.007

• CCIR2019 •

Research on Open Chinese Event Detection

YAN Hao, XU Hongbo*, SHEN Yinghan, CHENG Xueqi

  1. Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2019-10-10 Published: 2020-04-02
  • Corresponding author: XU Hongbo (born 1975), male, from Qixia, Shandong; associate researcher at the Institute of Computing Technology, Chinese Academy of Sciences. E-mail: hbxu@ict.ac.cn
  • Funding:
    National Key Research and Development Program of China (2016QY03D0504)

Abstract: In Chinese event detection, domains are independent of one another and labeled data cannot be shared across them, so a large amount of data must be annotated separately for each domain. Building on previous studies, this paper proposes an open Chinese event detection method based on transfer learning. The method rests on two association hypotheses about trigger words. First, trigger words of the same event type are strongly related to one another in semantic space. Second, trigger words of different event types are also related, but more weakly than trigger words of the same event type. Based on these hypotheses, relation features between candidate words and seed trigger words, together with contextual features of the candidate words, are constructed with the help of external dictionaries. A base model and a transfer model for event detection are then built with convolutional neural networks. As a result, event detection in a new domain requires only a very small amount of labeled data. On the ACE2005 Chinese event dataset, the method outperforms current mainstream methods on trigger word identification while using only 20% of the data.

Key words: event detection, transfer learning, trigger word, convolutional neural network

CLC number:

  • TP391
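
The relation features between candidate words and seed trigger words described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy dictionary, its semantic tags, the seed lists, and the function names are all hypothetical, and Jaccard overlap of dictionary tags is used here only as a simple stand-in for a dictionary-based relation. The contextual features and the convolutional neural network are omitted.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for an external dictionary: word -> semantic tags.
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "attack": {"conflict", "harm", "action"},
    "bomb": {"conflict", "harm", "weapon"},
    "strike": {"conflict", "harm", "action"},
    "marry": {"life", "family", "ceremony"},
    "wed": {"life", "family", "ceremony"},
    "table": {"object", "furniture"},
}

# Hypothetical seed trigger words for two event types.
SEED_TRIGGERS: Dict[str, List[str]] = {
    "Attack": ["attack", "bomb"],
    "Marry": ["marry"],
}


def tag_overlap(word_a: str, word_b: str) -> float:
    """Jaccard overlap of the dictionary tags of two words (0.0 if unknown)."""
    tags_a = TOY_DICTIONARY.get(word_a, set())
    tags_b = TOY_DICTIONARY.get(word_b, set())
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def relation_features(candidate: str) -> Dict[str, float]:
    """For each event type, the strongest overlap between the candidate
    word and any seed trigger word of that type."""
    return {
        event_type: max(tag_overlap(candidate, seed) for seed in seeds)
        for event_type, seeds in SEED_TRIGGERS.items()
    }


if __name__ == "__main__":
    for word in ("strike", "wed", "table"):
        print(word, relation_features(word))
```

In this toy setup, a candidate that shares dictionary tags with the seeds of one event type scores high for that type and low for the others, which mirrors the first hypothesis. A real system would feed such features, along with context features, into the detection model.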