Journal of Guangxi Normal University (Natural Science Edition), 2010, Vol. 28, Issue 3: 126-130.
FANG Lu (房璐)1, GE Yun-dong (葛运东)1, HONG Yu (洪宇)1, YAO Jian-ming (姚建民)1,2
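Abstract: This paper studies the construction of a web-based comparable corpus and its application to cross-language information retrieval (CLIR). First, news texts are collected from news websites and aligned with Lucene to build a comparable corpus. Second, translation knowledge is extracted from the aligned texts using contextual information. Finally, the extracted translation knowledge is evaluated in CLIR experiments on the TDT4 corpus. The experiments show that the extracted translation knowledge improves CLIR performance, reaching a mean average precision (MAP) of 0.2728, an improvement of 35.44% over the method based on a local dictionary.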
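For readers unfamiliar with the MAP figure reported in the abstract, the following is a minimal sketch of how mean average precision is conventionally computed from ranked retrieval results. It is not taken from the paper: the function names, the `runs`/`qrels` data layout, and the toy document IDs are all hypothetical, and the authors' actual evaluation tooling is not described in the abstract.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against a set of relevant doc IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant document appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(qid, []), rel)
                for qid, rel in qrels.items())
    return total / len(qrels)


if __name__ == "__main__":
    # Hypothetical toy data: two queries with made-up document IDs.
    runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
    print(round(mean_average_precision(runs, qrels), 4))
```

In a CLIR setting, `runs` would hold the documents retrieved for each translated query, so a better translation resource shows up directly as a higher MAP.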