Journal of Guangxi Normal University (Natural Science Edition) ›› 2022, Vol. 40 ›› Issue (5): 59-71. DOI: 10.16088/j.issn.1001-6600.2022030802
HAO Yaru, DONG Li, XU Ke*, LI Xianxian
[1] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems 30 (NIPS 2017). Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[2] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186. DOI: 10.18653/v1/N19-1423.
[3] LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26)[2022-03-08]. https://arxiv.org/abs/1907.11692. DOI: 10.48550/arXiv.1907.11692.
[4] YANG Z L, DAI Z H, YANG Y M, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 5753-5763.
[5] DONG L, YANG N, WANG W H, et al. Unified language model pre-training for natural language understanding and generation[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 13063-13075.
[6] CLARK K, LUONG M T, LE Q V, et al. ELECTRA: pre-training text encoders as discriminators rather than generators[C]// International Conference on Learning Representations 2020. Addis Ababa: ICLR, 2020: 1-18.
[7] WANG N Y (王乃钰), YE Y X (叶育鑫), LIU L (刘露), et al. Research progress of language models based on deep learning[J]. Journal of Software, 2021, 32(4): 1082-1115. DOI: 10.13328/j.cnki.jos.006169. (in Chinese)
[8] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. (2018-06-09)[2022-03-08]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[9] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 2227-2237. DOI: 10.18653/v1/N18-1202.
[10] BELINKOV Y, GLASS J. Analysis methods in neural language processing: a survey[J]. Transactions of the Association for Computational Linguistics, 2019, 7: 49-72. DOI: 10.1162/tacl_a_00254.
[11] ROGERS A, KOVALEVA O, RUMSHISKY A. A primer in BERTology: what we know about how BERT works[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 842-866. DOI: 10.1162/tacl_a_00349.
[12] MADSEN A, REDDY S, CHANDAR S. Post-hoc interpretability for neural NLP: a survey[EB/OL]. (2021-08-13)[2022-03-08]. https://arxiv.org/abs/2108.04840v2. DOI: 10.48550/arXiv.2108.04840.
[13] SAJJAD H, DURRANI N, DALVI F. Neuron-level interpretation of deep NLP models: a survey[EB/OL]. (2021-08-30)[2022-03-08]. https://arxiv.org/abs/2108.13138. DOI: 10.48550/arXiv.2108.13138.
[14] HOU Z N (侯中妮), JIN X L (靳小龙), CHEN J Y (陈剑赟), et al. Survey on interpretable reasoning over knowledge graphs[J/OL]. Journal of Software [2022-03-08]. http://www.jos.org.cn/jos/article/abstract/6522. DOI: 10.13328/j.cnki.jos.006522. (in Chinese)
[15] ABNAR S, ZUIDEMA W. Quantifying attention flow in transformers[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 4190-4197. DOI: 10.18653/v1/2020.acl-main.385.
[16] VOITA E, TALBOT D, MOISEEV F, et al. Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 5797-5808. DOI: 10.18653/v1/P19-1580.
[17] MICHEL P, LEVY O, NEUBIG G. Are sixteen heads really better than one?[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 14014-14024.
[18] CHEFER H, GUR S, WOLF L. Transformer interpretability beyond attention visualization[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society, 2021: 782-791. DOI: 10.1109/CVPR46437.2021.00084.
[19] XIONG R B, YANG Y C, HE D, et al. On layer normalization in the transformer architecture[J]. Proceedings of Machine Learning Research, 2020, 119: 10524-10533.
[20] XU J J, SUN X, ZHANG Z Y, et al. Understanding and improving layer normalization[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 4381-4391.
[21] NGUYEN T Q, SALAZAR J. Transformers without tears: improving the normalization of self-attention[C]// Proceedings of the 16th International Conference on Spoken Language Translation. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1-9.
[22] ZHANG B, SENNRICH R. Root mean square layer normalization[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 12381-12392.
[23] LIU F L, REN X C, ZHANG Z Y, et al. Rethinking skip connection with layer normalization in transformers and ResNets[EB/OL]. (2021-05-15)[2022-03-08]. https://arxiv.org/abs/2105.07205v1. DOI: 10.48550/arXiv.2105.07205.
[24] SAUNSHI N U, MALLADI S, ARORA S. A mathematical exploration of why language models help solve downstream tasks[C]// International Conference on Learning Representations 2021. Vienna: ICLR, 2021: 1-35.
[25] WEI C, XIE S M, MA T Y. Why do pretrained language models help in downstream tasks? An analysis of head and prompt tuning[C]// Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Red Hook, NY: Curran Associates Inc., 2021: 16158-16170.
[26] TAMKIN A, SINGH T, GIOVANARDI D. Investigating transferability in pretrained language models[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: Association for Computational Linguistics, 2020: 1393-1401. DOI: 10.18653/v1/2020.findings-emnlp.125.
[27] HAO Y R, DONG L, WEI F R, et al. Visualizing and understanding the effectiveness of BERT[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: Association for Computational Linguistics, 2019: 4143-4152. DOI: 10.18653/v1/D19-1424.
[28] WETTIG A, GAO T Y, ZHONG Z X, et al. Should you mask 15% in masked language modeling?[EB/OL]. (2022-02-16)[2022-03-08]. https://arxiv.org/abs/2202.08005. DOI: 10.48550/arXiv.2202.08005.
[29] CUI Y M, CHE W X, LIU T, et al. Pre-training with whole word masking for Chinese BERT[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3504-3514. DOI: 10.1109/TASLP.2021.3124365.
[30] JOSHI M, CHEN D Q, LIU Y H, et al. SpanBERT: improving pre-training by representing and predicting spans[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 64-77. DOI: 10.1162/tacl_a_00300.
[31] SUN Y, WANG S H, LI Y K, et al. ERNIE: enhanced representation through knowledge integration[EB/OL]. (2019-04-19)[2022-03-08]. https://arxiv.org/abs/1904.09223v1. DOI: 10.48550/arXiv.1904.09223.
[32] DODGE J, ILHARCO G, SCHWARTZ R, et al. Fine-tuning pretrained language models: weight initializations, data orders, and early stopping[EB/OL]. (2020-02-15)[2022-03-08]. https://arxiv.org/abs/2002.06305. DOI: 10.48550/arXiv.2002.06305.
[33] MOSBACH M, ANDRIUSHCHENKO M, KLAKOW D. On the stability of fine-tuning BERT: misconceptions, explanations, and strong baselines[EB/OL]. (2021-03-25)[2022-03-08]. https://arxiv.org/abs/2006.04884. DOI: 10.48550/arXiv.2006.04884.
[34] SHACHAF G, BRUTZKUS A, GLOBERSON A. A theoretical analysis of fine-tuning with linear teachers[C]// Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Red Hook, NY: Curran Associates Inc., 2021: 15382-15394.
[35] KONG L P, de MASSON D’AUTUME C, YU L, et al. A mutual information maximization perspective of language representation learning[C]// International Conference on Learning Representations 2020. Addis Ababa: ICLR, 2020: 1-12.
[36] JIANG H M, HE P C, CHEN W Z, et al. SMART: robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 2177-2190. DOI: 10.18653/v1/2020.acl-main.197.
[37] BU Z Q, XU S Y, CHEN K. A dynamical view on optimization algorithms of overparameterized neural networks[C]// Proceedings of the 24th International Conference on Artificial Intelligence and Statistics: PMLR Volume 130. Virtual: PMLR, 2021: 3187-3195.
[38] SERRANO S, SMITH N A. Is attention interpretable?[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 2931-2951. DOI: 10.18653/v1/P19-1282.
[39] JAIN S, WALLACE B. Attention is not explanation[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 3543-3556. DOI: 10.18653/v1/N19-1357.
[40] MEISTER C, LAZOV S, AUGENSTEIN I, et al. Is sparse attention more interpretable?[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2021: 122-129. DOI: 10.18653/v1/2021.acl-short.17.
[41] PRUTHI D, GUPTA M, DHINGRA B, et al. Learning to deceive with attention-based explanations[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 4782-4793. DOI: 10.18653/v1/2020.acl-main.432.
[42] MAREČEK D, ROSA R. From balustrades to Pierre Vinken: looking for syntax in transformer self-attentions[C]// Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA: Association for Computational Linguistics, 2019: 263-275. DOI: 10.18653/v1/W19-4827.
[43] WIEGREFFE S, PINTER Y. Attention is not not explanation[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: Association for Computational Linguistics, 2019: 11-20. DOI: 10.18653/v1/D19-1002.
[44] BRUNNER G, LIU Y, PASCUAL D, et al. On identifiability in transformers[C]// International Conference on Learning Representations 2020. Addis Ababa: ICLR, 2020: 1-35.
[45] CLARK K, KHANDELWAL U, LEVY O, et al. What does BERT look at? An analysis of BERT’s attention[C]// Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA: Association for Computational Linguistics, 2019: 276-286. DOI: 10.18653/v1/W19-4828.
[46] KOVALEVA O, ROMANOV A, ROGERS A, et al. Revealing the dark secrets of BERT[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: Association for Computational Linguistics, 2019: 4365-4374. DOI: 10.18653/v1/D19-1445.
[47] HAO Y R, DONG L, WEI F R, et al. Self-attention attribution: interpreting information interactions inside transformer[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(14): 12963-12971.
[48] MELIS D A, JAAKKOLA T. Towards robust interpretability with self-explaining neural networks[C]// Advances in Neural Information Processing Systems 31 (NeurIPS 2018). Red Hook, NY: Curran Associates Inc., 2018: 7775-7784.
[49] ZHAO W, SINGH R, JOSHI T, et al. Self-interpretable convolutional neural networks for text classification[EB/OL]. (2021-07-09)[2022-03-08]. https://arxiv.org/abs/2105.08589v2. DOI: 10.48550/arXiv.2105.08589.
[50] WANG Y P, WANG X Q. Self-interpretable model with transformation equivariant interpretation[C]// Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Red Hook, NY: Curran Associates Inc., 2021: 2359-2372.
[51] REIF E, YUAN A, WATTENBERG M, et al. Visualizing and measuring the geometry of BERT[C]// Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Red Hook, NY: Curran Associates Inc., 2019: 8594-8603.
[52] JAWAHAR G, SAGOT B, SEDDAH D. What does BERT learn about the structure of language?[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 3651-3657. DOI: 10.18653/v1/P19-1356.
[53] ROSA R, MAREČEK D. Inducing syntactic trees from BERT representations[EB/OL]. (2019-06-27)[2022-03-08]. https://arxiv.org/abs/1906.11511. DOI: 10.48550/arXiv.1906.11511.
[54] HEWITT J, MANNING C. A structural probe for finding syntax in word representations[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019: 4129-4138. DOI: 10.18653/v1/N19-1419.
[55] NIVEN T, KAO H Y. Probing neural network comprehension of natural language arguments[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 4658-4664. DOI: 10.18653/v1/P19-1459.
[56] SUNDARARAJAN M, TALY A, YAN Q Q. Axiomatic attribution for deep networks[C]// Proceedings of the 34th International Conference on Machine Learning: PMLR Volume 70. Sydney: PMLR, 2017: 3319-3328.
[57] ARKHANGELSKAIA E, DUTTA S. Whatcha lookin’ at? DeepLIFTing BERT’s attention in question answering[EB/OL]. (2019-10-14)[2022-03-08]. https://arxiv.org/abs/1910.06431. DOI: 10.48550/arXiv.1910.06431.
[58] MONTAVON G, BINDER A, LAPUSCHKIN S, et al. Layer-wise relevance propagation: an overview[M]// SAMEK W, MONTAVON G, VEDALDI A, et al. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Cham: Springer, 2019: 193-209. DOI: 10.1007/978-3-030-28954-6_10.
[59] SUNDARARAJAN M, NAJMI A. The many Shapley values for model explanation[C]// Proceedings of the 37th International Conference on Machine Learning: PMLR Volume 119. Virtual: PMLR, 2020: 9269-9278.
[60] JIN X S, WEI Z Y, DU J Y, et al. Towards hierarchical importance attribution: explaining compositional semantics for neural sequence models[C]// International Conference on Learning Representations 2020. Addis Ababa: ICLR, 2020: 1-15.
[61] SZEGEDY C, ZAREMBA W, SUTSKEVER I, et al. Intriguing properties of neural networks[C]// International Conference on Learning Representations 2014. Banff: ICLR, 2014: 1-10.
[62] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[C]// International Conference on Learning Representations 2015. San Diego: ICLR, 2015: 1-11.
[63] ALZANTOT M, SHARMA Y, ELGOHARY A, et al. Generating natural language adversarial examples[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2890-2896. DOI: 10.18653/v1/D18-1316.
[64] EBRAHIMI J, RAO A Y, LOWD D, et al. HotFlip: white-box adversarial examples for text classification[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 31-36. DOI: 10.18653/v1/P18-2006.
[65] MUDRAKARTA P K, TALY A, SUNDARARAJAN M, et al. Did the model understand the question?[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 1896-1906. DOI: 10.18653/v1/P18-1176.
[66] WALLACE E, FENG S, KANDPAL N, et al. Universal adversarial triggers for attacking and analyzing NLP[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: Association for Computational Linguistics, 2019: 2153-2162. DOI: 10.18653/v1/D19-1221.
[67] KARPATHY A, JOHNSON J, LI F F. Visualizing and understanding recurrent networks[EB/OL]. (2015-11-17)[2022-03-08]. https://arxiv.org/abs/1506.02078. DOI: 10.48550/arXiv.1506.02078.
[68] LI J W, CHEN X L, HOVY E, et al. Visualizing and understanding neural models in NLP[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2016: 681-691. DOI: 10.18653/v1/N16-1082.
[69] VALIPOUR M, LEE E S A, JAMACARO J R, et al. Unsupervised transfer learning via BERT neuron selection[EB/OL]. (2019-12-10)[2022-03-08]. https://arxiv.org/abs/1912.05308. DOI: 10.48550/arXiv.1912.05308.
[1] LI Zhengguang, CHEN Heng, LIN Hongfei. Identification of Adverse Drug Reaction on Social Media Using Bi-directional Language Model[J]. Journal of Guangxi Normal University (Natural Science Edition), 2022, 40(3): 40-48.
[2] ZHOU Shengkai, FU Lizhen, SONG Wen’ai. Semantic Similarity Computing Model for Short Text Based on Deep Learning[J]. Journal of Guangxi Normal University (Natural Science Edition), 2022, 40(3): 49-56.
[3] LIN Peiqun, HE Huohua, LIN Xukun. Multi-scale Prediction of Expressways' Arrival Volume of Large and Medium-sized Trucks Based on System Relevance[J]. Journal of Guangxi Normal University (Natural Science Edition), 2022, 40(2): 15-26.
[4] CHEN Wenkang, LU Shenglian, LIU Binghao, LI Guo, LIU Xiaoyu, CHEN Ming. Real-time Citrus Recognition under Orchard Environment by Improved YOLOv4[J]. Journal of Guangxi Normal University (Natural Science Edition), 2021, 39(5): 134-146.
[5] LUO Lan, ZHOU Nan, SI Jie. New Delay Partition Method for Robust Stability of Uncertain Cellular Neural Networks with Time-Varying Delays[J]. Journal of Guangxi Normal University (Natural Science Edition), 2019, 37(4): 45-52.
[6] WU Wenya, CHEN Yufeng, XU Jin’an, ZHANG Yujie. High-level Semantic Attention-based Convolutional Neural Networks for Chinese Relation Extraction[J]. Journal of Guangxi Normal University (Natural Science Edition), 2019, 37(1): 32-41.
[7] LI Peng. Load Localization of Optical Fiber Smart Structures Based on Probabilistic Neural Networks[J]. Journal of Guangxi Normal University (Natural Science Edition), 2011, 29(2): 223-226.
[8] ZHAO Hui-wei, LI Wen-hua, FENG Chun-hua, LUO Xiao-shu. Periodic Oscillation Analysis for a Recurrent Neural Networks Model with Time Delays[J]. Journal of Guangxi Normal University (Natural Science Edition), 2011, 29(1): 29-34.
[9] CHENG Xian-yi, PAN Yan, ZHU Qian, SUN Ping. Automatic Generating Algorithm of Event-oriented Multi-document Summarization[J]. Journal of Guangxi Normal University (Natural Science Edition), 2011, 29(1): 147-150.