Humor Recognition of Sitcom Based on Multi-granularity of Segmentation Enhancement and Semantic Enhancement

doi:10.16088/j.issn.1001-6600.2021091505

Abstract

Humor computing has gradually become an important research topic in natural language understanding. Humorous expression in Chinese is highly varied, which makes it challenging for a computer to analyze. The sitcom is a special form of humorous expression that is rich in humor. To address the problem of Chinese humor computing, this paper makes two contributions. First, building on the graph attention network, it proposes DISA-SE-GAT, a humor recognition algorithm based on word segmentation disambiguation and semantic enhancement. Second, it constructs a humorous sitcom dataset based on iPartment (《爱情公寓》). Experimental results on the I Love My Family (《我爱我家》) humor dataset and the iPartment humor dataset show that the proposed multi-granularity disambiguation and semantic enhancement model, DISA-SE-GAT, performs well at recognizing humorous expression in text.

Key words: humor computing, sentiment analysis, multi-granularity, semantic enhancement

CLC number: TP391.1

SUN Yansong, YANG Liang, LIN Hongfei. Humor Recognition of Sitcom Based on Multi-granularity of Segmentation Enhancement and Semantic Enhancement[J]. Journal of Guangxi Normal University (Natural Science Edition), 2022, 40(3): 57-65.
