Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue 6: 69-81. doi: 10.16088/j.issn.1001-6600.2022022201

Research on Cross-Language Information Retrieval Method Based on Multi-task Learning

DAI Jiayang, ZHOU Dong*   

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
  • Received: 2022-02-22; Revised: 2022-03-07; Online: 2022-11-25; Published: 2023-01-17

Abstract: Cross-language information retrieval is an important task in the field of information retrieval. Existing cross-lingual neural retrieval methods usually rely on single-task learning, and capturing features from a single task limits the performance of neural retrieval models. This paper therefore proposes a cross-language retrieval method based on multi-task learning that uses text classification as an auxiliary task. A shared text feature extraction layer captures feature information from both tasks simultaneously, so that it learns the feature patterns of different tasks; the resulting feature vectors are then fed into the neural retrieval model and the text classification model to complete the two tasks, respectively. In addition, the external corpus introduced by the text classification task also acts as data augmentation to some extent, further enriching the feature information. Experiments on four language pairs from the CLEF 2000-2003 dataset show that the method significantly improves text feature extraction and thereby the performance of the neural retrieval model, increasing its mean average precision (MAP) by 0.012-0.188 and speeding up model convergence by an average of 24.3%.

Key words: information retrieval, multi-task learning, cross-language information retrieval, neural retrieval model, external corpus

CLC Number: TP391.3
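
The multi-task setup described in the abstract (one shared text feature extraction layer feeding both a retrieval model and a text classification model) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the mean-pooling encoder, the cosine-similarity retrieval head, the pairwise hinge ranking loss, the loss weight of 0.5, the layer sizes, and all function names are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; none of these values come from the paper.
EMB_DIM, HID_DIM, NUM_CLASSES = 8, 4, 3


def rand_matrix(rows, cols):
    """Random matrix as a list of rows (stand-in for learned weights or embeddings)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# One shared layer used by both tasks, plus a head for the classification task.
W_shared = rand_matrix(HID_DIM, EMB_DIM)
W_cls = rand_matrix(NUM_CLASSES, HID_DIM)


def shared_features(token_vectors):
    """Shared feature layer (assumed form): mean-pool tokens, then tanh projection."""
    n = len(token_vectors)
    pooled = [sum(vec[i] for vec in token_vectors) / n for i in range(EMB_DIM)]
    return [math.tanh(sum(w * x for w, x in zip(row, pooled))) for row in W_shared]


def retrieval_score(query_feat, doc_feat):
    """Retrieval head (assumed form): cosine similarity of query and document features."""
    dot = sum(q * d for q, d in zip(query_feat, doc_feat))
    q_norm = math.sqrt(sum(q * q for q in query_feat))
    d_norm = math.sqrt(sum(d * d for d in doc_feat))
    return dot / (q_norm * d_norm) if q_norm > 0 and d_norm > 0 else 0.0


def class_probs(feat):
    """Classification head (assumed form): linear layer followed by a stable softmax."""
    logits = [sum(w * x for w, x in zip(row, feat)) for row in W_cls]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total_exp = sum(exps)
    return [e / total_exp for e in exps]


def joint_loss(query, pos_doc, neg_doc, cls_text, cls_label, weight=0.5, margin=1.0):
    """Hinge ranking loss plus weighted cross-entropy, both computed on shared features."""
    q, p, n = (shared_features(t) for t in (query, pos_doc, neg_doc))
    rank_loss = max(0.0, margin - retrieval_score(q, p) + retrieval_score(q, n))
    cls_loss = -math.log(class_probs(shared_features(cls_text))[cls_label])
    return rank_loss + weight * cls_loss, rank_loss, cls_loss


def rand_text(num_tokens):
    """Hypothetical stand-in for a text already mapped to token embeddings."""
    return rand_matrix(num_tokens, EMB_DIM)


total, rank_loss, cls_loss = joint_loss(
    rand_text(3), rand_text(5), rand_text(5), rand_text(4), cls_label=1
)
print(f"ranking loss={rank_loss:.4f}  classification loss={cls_loss:.4f}  joint={total:.4f}")
```

Because both loss terms depend on `W_shared`, minimizing the joint loss would update the shared layer with signal from both tasks, which is the general idea behind sharing a feature extractor in multi-task learning. The classification examples can come from a corpus other than the retrieval collection, mirroring the external corpus mentioned in the abstract.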