Journal of Guangxi Normal University (Natural Science Edition) ›› 2021, Vol. 39 ›› Issue (2): 13-20. DOI: 10.16088/j.issn.1001-6600.2020082602

• CCIR2020 •

Dynamic Learning Method of Neural Machine Translation Based on Sample Difficulty

WANG Su1,2, FAN Yixing1,2, GUO Jiafeng1,2*, ZHANG Ruqing1,2, CHENG Xueqi1,2

  1. Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-08-26  Revised: 2020-09-22  Online: 2021-03-25  Published: 2021-04-15
  • Corresponding author: GUO Jiafeng (1980—), male, from Jiangyin, Jiangsu; researcher and doctoral supervisor, Chinese Academy of Sciences. E-mail: guojiafeng@ict.ac.cn
  • Funding: Beijing Academy of Artificial Intelligence (BAAI2019ZD0306); National Natural Science Foundation of China (61722211, 61872338, 61902381); Youth Innovation Promotion Association of the Chinese Academy of Sciences (20144310); National Key Research and Development Program of China (2016QY02D0405); Lenovo-CAS Joint Laboratory Young Scientist Project; Chongqing Research Program of Basic Science and Frontier Technology (cstc2017jcjyBX0059); Taishan Scholars Program (ts201511082)

Abstract: In recent years, the neural machine translation model has become the mainstream model in the field of machine translation, and how to learn translation knowledge quickly and accurately from a large amount of training data is a question worth exploring. Training samples differ in difficulty: some are simple and easy for the model to learn, while others are more difficult. The difficulty of the samples has a great influence on the convergence of the model, but the traditional neural machine translation model does not take this difference into account during training. Therefore, this paper explores the influence of sample difficulty on the training process of the neural machine translation model and, based on the idea of “curriculum learning”, proposes a dynamic learning method for neural machine translation driven by sample difficulty. The difficulty of the training samples is quantified from two aspects: the translation quality the model achieves on each sample and the sentence length of the sample. Two learning strategies, from easy to difficult and from difficult to easy, are then designed to train the model, and their translation results are compared. The experimental results show that both dynamic learning strategies can improve the translation quality of the neural machine translation model.

Key words: neural machine translation, curriculum learning, sample difficulty, dynamic learning
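
As a concrete illustration of the length-based curriculum described in the abstract, the following minimal Python sketch (an assumption-based illustration, not the authors' implementation; all function names are hypothetical) orders parallel training pairs from easy to difficult by source sentence length. The paper's second difficulty measure, the model's own translation quality on each sample, would additionally require a trained model and is only noted in a comment.

    # Illustrative sketch: length-based curriculum ordering of a parallel corpus,
    # assuming that shorter source sentences are "easier", as in the abstract.
    # The alternative difficulty measure (the model's translation quality per
    # sample, e.g. per-sentence loss or BLEU) needs a trained model and is omitted.

    def sentence_difficulty(src_sentence: str) -> int:
        """Quantify difficulty by source sentence length (token count)."""
        return len(src_sentence.split())

    def curriculum_order(parallel_pairs, easy_to_difficult=True):
        """Sort (source, target) pairs by difficulty for curriculum training."""
        return sorted(parallel_pairs,
                      key=lambda pair: sentence_difficulty(pair[0]),
                      reverse=not easy_to_difficult)

    if __name__ == "__main__":
        corpus = [
            ("this is a very long and therefore harder training sentence", "..."),
            ("short sentence", "..."),
            ("a sentence of medium length for training", "..."),
        ]
        # From easy to difficult: shortest source sentences are presented first.
        for src, _ in curriculum_order(corpus, easy_to_difficult=True):
            print(len(src.split()), src)

Setting easy_to_difficult=False in this sketch yields the reverse, from-difficult-to-easy schedule that the paper compares against.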

CLC number: TP391