GS4: Generating synthetic samples for semi-supervised nearest neighbor classification

Panagiotis Moutafis, Ioannis A. Kakadiaris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

In this paper, we propose a method to improve nearest neighbor classification accuracy under a semi-supervised setting. We call our approach GS4 (i.e., Generating Synthetic Samples Semi-Supervised). Existing self-training approaches classify unlabeled samples by exploiting local information. These samples are then incorporated into the training set of labeled data. However, errors are propagated and misclassifications at an early stage severely degrade the classification accuracy. To address this problem, the proposed method exploits the unlabeled data by using weights proportional to the classification confidence to generate synthetic samples. Specifically, our scheme is inspired by the Synthetic Minority Over-Sampling Technique. That is, each unlabeled sample is used to generate as many labeled samples as the number of classes represented by its k-nearest neighbors. In particular, the distance of each synthetic sample from its k-nearest neighbors of the same class is proportional to the classification confidence. As a result, the robustness to misclassification errors is increased and better accuracy is achieved. Experimental results using publicly available datasets demonstrate that statistically significant improvements are obtained when the proposed approach is employed.
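The core idea described above can be sketched roughly as follows. This is an illustrative reconstruction based only on the abstract, not the authors' implementation: for each unlabeled point, one synthetic labeled sample is generated per class found among its k nearest labeled neighbors, and the synthetic sample's distance from the same-class neighbors is made proportional to the classification confidence (the class's share of the k neighbors). The function name `gs4_synthesize` and the use of the same-class centroid as the interpolation anchor are assumptions.

```python
import numpy as np

def gs4_synthesize(X_lab, y_lab, x_unlab, k=5):
    """Sketch of GS4-style synthetic sample generation (assumed logic,
    not the published algorithm): for each class represented among the
    k nearest labeled neighbors of an unlabeled point, emit one
    synthetic labeled sample whose distance from the same-class
    neighbors grows with the classification confidence."""
    # Distances from the unlabeled point to every labeled sample
    dists = np.linalg.norm(X_lab - x_unlab, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest labeled samples

    synthetic, labels = [], []
    for cls in np.unique(y_lab[nn]):
        same = nn[y_lab[nn] == cls]
        conf = len(same) / k                 # classification confidence for this class
        centroid = X_lab[same].mean(axis=0)  # center of the same-class neighbors (assumed anchor)
        # Higher confidence -> synthetic sample placed farther from the
        # same-class neighbors, i.e. closer to the unlabeled point itself.
        synthetic.append(conf * x_unlab + (1.0 - conf) * centroid)
        labels.append(cls)
    return np.array(synthetic), np.array(labels)
```

With `k = 5` and a neighborhood containing two classes, the call returns two synthetic samples, one per class; a unanimous neighborhood (confidence 1.0) places the single synthetic sample at the unlabeled point itself.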

Original language: English (US)
Title of host publication: Trends and Applications in Knowledge Discovery and Data Mining - PAKDD 2014 International Workshops
Subtitle of host publication: DANTH, BDM, MobiSocial, BigEC, CloudSD, MSMV-MBI, SDA, DMDA-Health, ALSIP, SocNet, DMBIH, BigPMA, Revised Selected Papers
Editors: Wen-Chih Peng, Haixun Wang, Zhi-Hua Zhou, Tu Bao Ho, Vincent S. Tseng, Arbee L.P. Chen, James Bailey
Publisher: Springer-Verlag
Pages: 393-403
Number of pages: 11
ISBN (Electronic): 9783319131856
DOIs
State: Published - 2014
Event: International Workshops on Data Mining and Decision Analytics for Public Health, Biologically Inspired Data Mining Techniques, Mobile Data Management, Mining, and Computing on Social Networks, Big Data Science and Engineering on E-Commerce, Cloud Service Discovery, MSMV-MBI, Scalable Data Analytics, Data Mining and Decision Analytics for Public Health and Wellness, Algorithms for Large-Scale Information Processing in Knowledge Discovery, Data Mining in Social Networks, Data Mining in Biomedical Informatics and Healthcare, Pattern Mining and Application of Big Data, in conjunction with the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2014 - Tainan, Taiwan, Province of China
Duration: May 13, 2014 - May 16, 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8643
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Workshops on Data Mining and Decision Analytics for Public Health, Biologically Inspired Data Mining Techniques, Mobile Data Management, Mining, and Computing on Social Networks, Big Data Science and Engineering on E-Commerce, Cloud Service Discovery, MSMV-MBI, Scalable Data Analytics, Data Mining and Decision Analytics for Public Health and Wellness, Algorithms for Large-Scale Information Processing in Knowledge Discovery, Data Mining in Social Networks, Data Mining in Biomedical Informatics and Healthcare, Pattern Mining and Application of Big Data, in conjunction with the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2014
Country/Territory: Taiwan, Province of China
City: Tainan
Period: 5/13/14 - 5/16/14

Keywords

  • Classification
  • K-nearest neighbor
  • Semi-supervised learning
  • Synthetic samples

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
