Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue (3): 31-39. doi: 10.16088/j.issn.1001-6600.2021091504


Construction of Chinese Multimodal Knowledge Base

CHAO Rui, ZHANG Kunli*, WANG Jiajia, HU Bin, ZHANG Weicong, HAN Yingjie, ZAN Hongying   

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
  • Received: 2021-09-15 Revised: 2021-12-28 Online: 2022-05-25 Published: 2022-05-27

Abstract: Multimodal fusion aims to integrate information from multiple modalities to obtain a consistent, shared model output, and it is a basic problem in the multimodal field. Fusing multimodal information yields more comprehensive features and improves model robustness, so multimodal fusion has become one of the core research topics in the field. Based on ImageNet, HowNet and CCD, this paper constructs a new multimodal knowledge base through manual annotation. The annotation covers 21,455 noun concepts in ImageNet and maps the corresponding concepts in HowNet and CCD to ImageNet. The dataset can be applied to natural language processing and computer vision tasks, where picture information and concept information can improve task performance. In image classification, adding HowNet and ImageNet concepts allows more features to be integrated to assist classification. In semantic understanding, semantic information can be better understood by adding image information through the mapping.
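The mapping described in the abstract can be pictured as a lookup table keyed on ImageNet synsets. The following is only a minimal sketch of such a structure, not the authors' actual data format: the class names, field names, image path, and the sample entry are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptMapping:
    """One hypothetical entry linking an ImageNet synset to HowNet and CCD concepts."""
    imagenet_wnid: str        # ImageNet synset ID in WordNet-offset form
    hownet_concept: str       # illustrative label for the mapped HowNet concept
    ccd_concept: str          # illustrative label for the mapped CCD concept
    image_paths: tuple = ()   # images attached to the synset (hypothetical paths)


class MultimodalKnowledgeBase:
    """Toy index supporting lookups in both directions (image class <-> word)."""

    def __init__(self):
        self._by_wnid = {}
        self._by_word = {}

    def add(self, entry):
        self._by_wnid[entry.imagenet_wnid] = entry
        # A set avoids indexing the entry twice when both labels are identical.
        for word in {entry.hownet_concept, entry.ccd_concept}:
            self._by_word.setdefault(word, []).append(entry)

    def concepts_for_image_class(self, wnid):
        """Vision side: fetch concept information for an image class, or None."""
        return self._by_wnid.get(wnid)

    def images_for_word(self, word):
        """Language side: fetch image paths that can ground a word."""
        paths = []
        for entry in self._by_word.get(word, []):
            paths.extend(entry.image_paths)
        return paths


# Illustrative entry only; the concept labels and path are placeholders.
kb = MultimodalKnowledgeBase()
kb.add(ConceptMapping("n02084071", "狗", "狗", ("images/n02084071/sample_001.jpg",)))
```

In this sketch, an image classifier would call `concepts_for_image_class` to attach concept information to a predicted class, while a language model would call `images_for_word` to retrieve pictures for a word; both uses correspond to the two application directions named in the abstract.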

Key words: multimodal information, multimodal fusion, ImageNet, HowNet, CCD

CLC Number: TP391.1