Journal of Guangxi Normal University (Natural Science Edition), 2019, Vol. 37, Issue (3): 71-78. doi: 10.16088/j.issn.1001-6600.2019.03.008


Topic Discovery in Microblog Based on BTM and Weighting K-Means

CHEN Feng, MENG Zuqiang*

1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China
Online: 2019-07-12; Published: 2019-07-12

Abstract: Microblog data consist of short texts with low word frequencies and limited semantic expression. To adapt to these features, improve the accuracy of topic discovery, and help users obtain useful information, this paper proposes a topic discovery method based on BTM and weighting K-Means. First, to deal with data sparsity, a text model is built on the biterm topic model (BTM) to obtain topic words. Second, to overcome defects inherent in the traditional K-Means algorithm, a weighting K-Means algorithm is proposed to obtain microblog topics. Finally, experiments are conducted to validate the method. The experimental results show that combining BTM with weighting K-Means addresses the high dimensionality and sparsity of microblog data and improves the accuracy and effectiveness of topic discovery.
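The abstract names the two stages without giving their internals, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. Stage one follows the biterm topic model as introduced by Yan et al. (2013): every unordered word pair in a post is a biterm, each biterm gets a topic via collapsed Gibbs sampling, and a post's topic vector p(z|d) is the average of p(z|b) over its biterms. Stage two clusters those vectors with a K-Means whose squared Euclidean distance is weighted per topic dimension. The weighting scheme, the hyperparameter defaults (`alpha`, `beta`, `iters`), the single-word fallback, and all function names (`extract_biterms`, `btm_gibbs`, `doc_topics`, `weighted_kmeans`) are hypothetical; the paper's own weighting is not described here.

```python
import random
from itertools import combinations


def extract_biterms(doc):
    """Every unordered word pair in one short post (the whole post is the context)."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]


def btm_gibbs(docs, num_topics, alpha=1.0, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for BTM; returns theta (p(z)), phi (p(w|z)), word index."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    M, K = len(vocab), num_topics
    biterms = [(index[a], index[b]) for d in docs for a, b in extract_biterms(d)]
    n_z = [0] * K                        # number of biterms assigned to topic z
    n_wz = [[0] * K for _ in range(M)]   # number of times word w is assigned to topic z

    def move(z, wi, wj, delta):          # add (+1) or remove (-1) one biterm's counts
        n_z[z] += delta
        n_wz[wi][z] += delta
        n_wz[wj][z] += delta

    assign = [rng.randrange(K) for _ in biterms]   # random initial topics
    for (wi, wj), z in zip(biterms, assign):
        move(z, wi, wj, +1)

    for _ in range(iters):
        for b, (wi, wj) in enumerate(biterms):
            move(assign[b], wi, wj, -1)  # exclude biterm b from the counts
            # p(z | rest) ~ (n_z + alpha)(n_wi|z + beta)(n_wj|z + beta) / (2 n_z + M beta)^2
            weights = [
                (n_z[k] + alpha) * (n_wz[wi][k] + beta) * (n_wz[wj][k] + beta)
                / (2 * n_z[k] + M * beta) ** 2
                for k in range(K)
            ]
            assign[b] = rng.choices(range(K), weights=weights)[0]
            move(assign[b], wi, wj, +1)

    theta = [(n_z[k] + alpha) / (len(biterms) + K * alpha) for k in range(K)]
    phi = [[(n_wz[w][k] + beta) / (2 * n_z[k] + M * beta) for w in range(M)]
           for k in range(K)]
    return theta, phi, index


def doc_topics(doc, theta, phi, index):
    """p(z|d) = sum_b p(z|b) p(b|d), with p(b|d) uniform over the post's biterms."""
    K = len(theta)
    bts = [(index[a], index[b]) for a, b in extract_biterms(doc)]
    if not bts:                          # single-word post: fall back to the prior p(z)
        return list(theta)
    pz = [0.0] * K
    for wi, wj in bts:
        post = [theta[k] * phi[k][wi] * phi[k][wj] for k in range(K)]
        total = sum(post)
        for k in range(K):
            pz[k] += post[k] / total / len(bts)
    return pz


def weighted_kmeans(points, k, feature_weights, iters=20, seed=0):
    """K-Means with a per-feature weighted squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def dist(p, c):
        return sum(w * (x - y) ** 2 for w, x, y in zip(feature_weights, p, c))

    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    posts = [["apple", "phone", "screen"], ["apple", "phone", "battery"],
             ["rain", "storm", "flood"], ["storm", "flood", "warning"]]
    theta, phi, index = btm_gibbs(posts, num_topics=2, iters=50)
    vectors = [doc_topics(p, theta, phi, index) for p in posts]
    labels, _ = weighted_kmeans(vectors, k=2, feature_weights=[1.0, 1.0])
    print(labels)
```

Sampling topics per biterm pools word co-occurrence across the whole corpus instead of inside one sparse post, which is the usual motivation for BTM on short texts. With positive weights, the arithmetic mean of a cluster still minimizes the weighted squared distance, so the centroid update is unchanged; only the assignment step sees the weights.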

Key words: biterm topic model (BTM), weighting K-Means, microblogging data, topic discovery

CLC Number: TP391