广西师范大学学报(自然科学版) [Journal of Guangxi Normal University (Natural Science Edition)], 2022, Vol. 40, Issue (5): 59-71. DOI: 10.16088/j.issn.1001-6600.2022030802
郝雅茹1, 董力1, 许可2*, 李先贤3
HAO Yaru1, DONG Li1, XU Ke2*, LI Xianxian3
Abstract: Large pre-trained language models based on deep neural networks have achieved great success on many natural language processing tasks, such as text classification, reading comprehension, and machine translation, and are now widely deployed in industry. However, the interpretability of these models is generally poor: it is difficult to understand why particular model architectures and pre-training schemes are so effective, and the internal mechanisms by which the models reach their decisions cannot be explained, which introduces uncertainty and a lack of controllability into the general application of AI models. Designing sound methods to explain these models is therefore essential; such methods not only help analyze model behavior but also guide researchers in improving the models. This paper reviews recent research on the interpretability of large pre-trained language models, surveys the related literature, and analyzes the shortcomings of existing methods and possible directions for future work.
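As a concrete illustration of what "explaining a model's decision" can mean in practice, the sketch below computes a simple gradient-times-input token attribution for a Transformer text classifier. This is a minimal, hedged example under assumptions of its own (the Hugging Face `transformers` API and a placeholder checkpoint name; a fine-tuned classification head is assumed), not a method proposed or endorsed by the paper.

```python
# Minimal sketch of gradient x input token attribution for a Transformer
# classifier. Illustrative only: the checkpoint name is a placeholder and a
# fine-tuned classification head is assumed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # assumption: replace with a fine-tuned classifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "The movie was surprisingly good."
enc = tokenizer(text, return_tensors="pt")

# Embed the tokens explicitly so gradients can be taken w.r.t. the embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()

# Backpropagate the predicted-class logit, then score each token by the
# dot product of its embedding with the gradient (gradient x input).
logits[0, pred].backward()
token_scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)

for tok, score in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]),
                      token_scores.tolist()):
    print(f"{tok:>12s} {score:+.4f}")
```

Attribution scores of this kind are one common family of post-hoc explanations; attention-based and probing-based analyses discussed in the survey answer related but distinct questions.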