Journal of Guangxi Normal University(Natural Science Edition) ›› 2010, Vol. 28 ›› Issue (3): 126-130.


Acquisition of Comparable Corpora and Its Application in CLIR

FANG Lu1, GE Yun-dong1, HONG Yu1, YAO Jian-ming1,2   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China;
  2. Office of Science and Technology of Suzhou, Suzhou, Jiangsu 215006, China
  • Received: 2010-06-05 Online: 2010-09-20 Published: 2023-02-06

Abstract: This paper studies the acquisition of comparable corpora and its application in cross-language information retrieval (CLIR). First, news articles are downloaded from news sites and aligned with Lucene to obtain comparable corpora. Then, translation knowledge is extracted from the aligned articles. Finally, the translation knowledge is applied to TDT4 to test the performance of a CLIR system. The experiments show that the translation knowledge improves CLIR performance, achieving a MAP of 0.2728, 35.44% higher than the dictionary-based method.

Key words: comparable corpora, translation knowledge extraction, context vector, cross-language information retrieval, query translation
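
The MAP (mean average precision) figure reported in the abstract is a standard ranked-retrieval measure. The following is a minimal sketch of how it is commonly computed, not the authors' evaluation code; the function names, document IDs, and toy rankings are hypothetical and chosen only for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: mean of precision@k at each relevant hit."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP: the mean of average precision over all queries.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical): two queries with made-up rankings and judgments.
toy_runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
map_score = mean_average_precision(toy_runs)
```

With this measure, a dictionary-based run and a run using corpus-derived translation knowledge can be compared by scoring both against the same relevance judgments.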

CLC Number: TP391