Journal of Guangxi Normal University (Natural Science Edition) ›› 2021, Vol. 39 ›› Issue (2): 1-12. doi: 10.16088/j.issn.1001-6600.2020082603
• CCIR2020
杨州1,2, 范意兴3, 朱小飞1*, 郭嘉丰3, 王越2
YANG Zhou1,2, FAN Yixing3, ZHU Xiaofei1*, GUO Jiafeng3, WANG Yue2
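Abstract: Information retrieval models are widely used in search engines and throughout industry. In information retrieval tasks, the matching signals that a model chooses to emphasize lead to large differences in evaluation metrics. Most current models are built on some or all of the following information: exact matching signals, similarity matching signals, signal discrimination, query term weights, proximity, text structure information, and different distribution assumptions. This paper explains what each of these modeling factors means and cites related experiments to show that each factor contributes positively to modeling. Based on these experiments and analyses, it closes with a discussion of the future development and trends of information retrieval models.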