广西师范大学学报(自然科学版) ›› 2015, Vol. 33 ›› Issue (4): 55-62. doi: 10.16088/j.issn.1001-6600.2015.04.010


基于LPP和l2,1的KNN填充算法

苏毅娟1, 孙可2,3, 邓振云2,3, 尹科军2,3   

  1.广西师范学院计算机与信息工程学院,广西南宁530023;
    2.广西师范大学计算机科学与信息工程学院,广西桂林541004;
    3.广西师范大学广西多源信息挖掘与安全重点实验室,广西桂林541004
  • 收稿日期:2015-03-16 出版日期:2015-12-25 发布日期:2018-09-21
  • 通讯作者: 苏毅娟(1976—),女,广西灵川人,广西师范学院副教授。E-mail: syj76@163.com
  • 基金资助:
    国家自然科学基金资助项目(61170131,61263035,61363009);国家863计划资助项目(2012AA011005);国家973计划资助项目(2013CB329404);广西自然科学基金资助项目(2012GXNSFGA060004,2015GXNSFAA139306);广西八桂创新团队和广西百人计划资助项目

KNN Imputation Algorithm Based on LPP and l2,1

SU Yi-juan1, SUN Ke2,3, DENG Zhen-yun2,3, YIN Ke-jun2,3

  1. College of Computer and Information Engineering, Guangxi Teachers Education University, Nanning Guangxi 530023, China;
    2. College of Computer Science and Information Technology, Guangxi Normal University, Guilin Guangxi 541004, China;
    3. Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin Guangxi 541004, China
  • Received:2015-03-16 Online:2015-12-25 Published:2018-09-21

摘要: 传统的KNN缺失值填充算法既没有利用样本属性间的相关性,也没有考虑保持样本数据本身的结构以及去除噪声样本。本文提出利用训练样本重构测试样本、进而进行最近邻缺失值填充的方法:重构过程充分利用样本间的相关性,并借助LPP(保局投影)在重构中保持数据结构不变,同时引入l2,1范数去除噪声样本。在UCI数据集上的仿真实验结果表明,该方法比传统的KNN填充算法以及基于属性信息熵的Entropy-KNN算法具有更高的预测准确度。

关键词: 缺失值填充, K最近邻, 保局投影, 重构

Abstract: The traditional KNN missing data imputation algorithm neither utilizes the correlation between sample attributes nor considers preserving the structure of the sample data and removing noisy samples. In this paper, a method that reconstructs the test sample from the training samples is proposed for nearest-neighbor missing data imputation. The reconstruction makes full use of the correlation between samples, uses LPP (locality preserving projection) to keep the data structure unchanged during reconstruction, and introduces the l2,1 norm to remove noisy samples. Simulation experiments on UCI data sets show that the proposed method achieves higher prediction accuracy than the traditional KNN imputation algorithm and the Entropy-KNN algorithm based on attribute information entropy.

Key words: missing data imputation, KNN, LPP, reconstruction
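
The reconstruction idea described in the abstract can be sketched in a few lines of code. The snippet below is only an illustrative simplification under stated assumptions, not the authors' exact formulation: it reconstructs a test sample from the training samples on its observed attributes, with a plain ridge penalty standing in for the paper's LPP and l2,1 terms, and then fills each missing attribute with the corresponding weighted combination of training samples. All names (reconstruct_impute, lam, missing_idx) are hypothetical.

import numpy as np

def reconstruct_impute(X_train, x_test, missing_idx, lam=1e-2):
    """Impute x_test[missing_idx] by reconstructing x_test from the rows of X_train."""
    n, d = X_train.shape
    missing = set(missing_idx)
    obs = np.array([j for j in range(d) if j not in missing])
    A = X_train[:, obs].T          # (|obs| x n): columns are training samples restricted to observed attributes
    b = x_test[obs]                # observed part of the test sample
    # Ridge-regularized least squares for the reconstruction weights w:
    #   min_w ||A w - b||^2 + lam * ||w||^2
    # (a simple stand-in for the paper's LPP + l2,1 regularized objective)
    w = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
    x_filled = x_test.copy()
    # Fill each missing attribute with the same weighted combination of training samples
    x_filled[missing_idx] = X_train[:, missing_idx].T @ w
    return x_filled, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(20, 5))
    x_test = X_train[0] + 0.05 * rng.normal(size=5)   # noisy copy of a training sample
    x_missing = x_test.copy()
    x_missing[2] = np.nan                             # pretend attribute 2 is missing
    filled, _ = reconstruct_impute(X_train, x_missing, missing_idx=[2])
    print("true:", x_test[2], "imputed:", filled[2])

Because the weights are learned from all observed attributes jointly, correlated training samples naturally receive larger weights, which is the property the abstract contrasts with plain distance-based KNN imputation.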

中图分类号: TP181
[1] ZHANG Shi-chao, JIN Zhi, ZHU Xiao-feng. Missing data imputation by utilizing information within incomplete instances[J]. Journal of Systems and Software, 2011, 84(3): 452-459.
[2] ZHU Xiao-feng, ZHANG Shi-chao, JIN Zhi, et al. Missing value estimation for mixed-attribute data sets[J]. IEEE Trans Knowl Data Eng, 2011, 23(1): 110-121.
[3] ZHANG Shi-chao, ZHANG Cheng-qi. Propagating temporal relations of intervals by matrix[J]. Applied Artificial Intelligence, 2002, 16(1): 1-27.
[4] SILVA-RAMIREZ E L, PINO-MEJIAS R, LOPEZ-COELLO M, et al. Missing value imputation on missing completely at random data using multilayer perceptrons[J]. Neural Networks, 2011, 24(1): 121-129.
[5] BU Fan-yu, CHEN Zhi-kui, ZHANG Qing-chen, et al. Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud[EB/OL]. (2015-05-06) [2015-06-22]. http://link.springer.com/article/10.1007/s11227-015-1433-9.
[6] RAHMAN M G, ISLAM M Z. FIMUS: a framework for imputing missing values using co-appearance, correlation and similarity analysis[J]. Knowl Based Syst, 2014, 56: 311-327.
[7] ZHU Xiao-feng, HUANG Zi, SHEN Heng-tao, et al. Dimensionality reduction by mixed kernel canonical correlation analysis[J]. Pattern Recognition, 2012, 45(8): 3003-3016.
[8] ZHU Xiao-feng, HUANG Zi, CHENG Hong, et al. Sparse hashing for fast multimedia search[J]. ACM Trans Inf Syst, 2013, 31(2): 9.
[9] ZHU Xiao-feng, HUANG Zi, YANG Yang, et al. Self-taught dimensionality reduction on the high-dimensional small-sized data[J]. Pattern Recognition, 2013, 46(1): 215-229.
[10] HE Xiao-fei, NIYOGI P. Locality preserving projections[C]//THRUN S, SAUL L K, SCHOLKOPF B. Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004: 153-160.
[11] QIN Zhen-xing, WANG A T, ZHANG Cheng-qi, et al. Cost-sensitive classification with k-nearest neighbors[C]//Knowledge Science, Engineering and Management: LNCS Volume 8041. Berlin: Springer, 2013: 112-131.
[12] HAN Jia-wei, KAMBER M, PEI Jian. 数据挖掘:概念与技术[M]. 范明, 孟小峰, 译. 北京: 机械工业出版社, 2012: 275-276.
[13] HASTIE T, TIBSHIRANI R, FRIEDMAN J. 统计学习基础:数据挖掘、推理与预测[M]. 范明, 柴玉梅, 昝红英, 译. 北京: 电子工业出版社, 2004: 28-32.
[14] 王超学, 潘正茂, 马春森, 等. 改进型加权KNN算法的不平衡数据集分类[J]. 计算机工程, 2012, 38(20): 160-163, 168.
[15] ZHANG Shi-chao, ZHU Man-long. Weighting imputation methods and their evaluation under shell-neighbor machine[C]//Proceedings of the 9th IEEE International Conference on Cognitive Informatics. Piscataway, NJ: IEEE Press, 2010: 874-879.
[16] ZHU Xiao-feng, HUANG Zi, CUI Jiang-tao, et al. Video-to-shot tag propagation by graph sparse group lasso[J]. IEEE Transactions on Multimedia, 2013, 15(3): 633-646.
[17] 童先群, 周忠眉. 基于属性值信息熵的KNN改进算法[J]. 计算机工程与应用, 2010, 46(3): 115-117.