Construction of Comparable Corpora and Its Application in Cross-Language Information Retrieval

Journal of Guangxi Normal University (Natural Science Edition), 2010, Vol. 28, Issue (3): 126-130.


Acquisition of Comparable and Its Application in CLIR

FANG Lu¹, GE Yun-dong¹, HONG Yu¹, YAO Jian-ming¹,²

1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China;
2. Office of Science and Technology of Suzhou, Suzhou, Jiangsu 215006, China

Received: 2010-06-05 Online: 2010-09-20 Published: 2023-02-06

Abstract

This paper studies the acquisition of comparable corpora and their application in cross-language information retrieval (CLIR). First, news articles are downloaded from news sites and aligned using Lucene to build a comparable corpus. Translation knowledge is then extracted from the aligned articles. Finally, the translation knowledge is applied to the TDT4 corpus to test the performance of a CLIR system. The experiments show that the translation knowledge improves CLIR performance, achieving a MAP of 0.2728, which is 35.44% higher than the dictionary-based method.

Key words: comparable corpora, translation knowledge extraction, context vector, cross-language information retrieval, query translation

CLC Number: TP391

FANG Lu, GE Yun-dong, HONG Yu, YAO Jian-ming. Acquisition of Comparable and Its Application in CLIR[J]. Journal of Guangxi Normal University (Natural Science Edition), 2010, 28(3): 126-130.

