Journal of Guangxi Normal University (Natural Science Edition) ›› 2021, Vol. 39 ›› Issue (2): 13-20. DOI: 10.16088/j.issn.1001-6600.2020082602

• CCIR2020 •

Dynamic Learning Method of Neural Machine Translation Based on Sample Difficulty

WANG Su1,2, FAN Yixing1,2, GUO Jiafeng1,2*, ZHANG Ruqing1,2, CHENG Xueqi1,2

  1. Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-08-26  Revised: 2020-09-22  Online: 2021-03-25  Published: 2021-04-15
  • Corresponding author: GUO Jiafeng (1980—), male, from Jiangyin, Jiangsu; researcher and doctoral supervisor, Chinese Academy of Sciences. E-mail: guojiafeng@ict.ac.cn
  • Funding: Beijing Academy of Artificial Intelligence (BAAI2019ZD0306); National Natural Science Foundation of China (61722211, 61872338, 61902381); Youth Innovation Promotion Association of the Chinese Academy of Sciences (20144310); National Key Research and Development Program of China (2016QY02D0405); Lenovo-CAS Joint Laboratory Young Scientist Project; Chongqing Research Program of Basic Science and Frontier Technology (cstc2017jcjyBX0059); Taishan Scholars Program (ts201511082)

Abstract: In recent years, the neural machine translation model has become the mainstream model in the field of machine translation, and how to learn translation knowledge quickly and accurately from a large amount of training data is a question worth exploring. Training samples differ in difficulty: some are simple and easy for the model to learn, while others are more difficult. The difficulty of the samples has a great influence on the convergence of the model, but the traditional neural machine translation model does not take this difference into account during training. Therefore, this paper explores the influence of sample difficulty on the training process of the neural machine translation model and, based on the idea of “curriculum learning”, proposes a dynamic learning method for neural machine translation driven by sample difficulty. The difficulty of the training samples is quantified from two aspects: the translation quality the model achieves on each sample and the sentence length of the sample. Two learning strategies, from easy to difficult and from difficult to easy, are then designed to train the model, and their translation results are compared. The experimental results show that both dynamic learning strategies can improve the translation quality of the neural machine translation model.

Key words: neural machine translation, curriculum learning, sample difficulty, dynamic learning
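
As a concrete illustration of the length-based curriculum described in the abstract, the following minimal Python sketch (an assumption-based illustration, not the authors' implementation; all function names are hypothetical) orders parallel training pairs from easy to difficult by source sentence length. The paper's second difficulty measure, the model's own translation quality on each sample, would additionally require a trained model and is only noted in a comment.

    # Illustrative sketch: length-based curriculum ordering of a parallel corpus,
    # assuming that shorter source sentences are "easier", as in the abstract.
    # The alternative difficulty measure (the model's translation quality per
    # sample, e.g. per-sentence loss or BLEU) needs a trained model and is omitted.

    def sentence_difficulty(src_sentence: str) -> int:
        """Quantify difficulty by source sentence length (token count)."""
        return len(src_sentence.split())

    def curriculum_order(parallel_pairs, easy_to_difficult=True):
        """Sort (source, target) pairs by difficulty for curriculum training."""
        return sorted(parallel_pairs,
                      key=lambda pair: sentence_difficulty(pair[0]),
                      reverse=not easy_to_difficult)

    if __name__ == "__main__":
        corpus = [
            ("this is a very long and therefore harder training sentence", "..."),
            ("short sentence", "..."),
            ("a sentence of medium length for training", "..."),
        ]
        # From easy to difficult: shortest source sentences are presented first.
        for src, _ in curriculum_order(corpus, easy_to_difficult=True):
            print(len(src.split()), src)

Setting easy_to_difficult=False in this sketch yields the reverse, from-difficult-to-easy schedule that the paper compares against.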

CLC number: TP391