COMPARISON OF SMOTE AND ADASYN ON IMBALANCED DATA FOR CLASSIFYING POOR HOUSEHOLDS IN TEMANGGUNG REGENCY USING THE K-NEAREST NEIGHBOR ALGORITHM

*Dinda Virrliana Ramadhanti  -  Departemen Statistika, Fakultas Sains dan Matematika, Undip, Indonesia
Rukun Santoso  -  Departemen Statistika, Fakultas Sains dan Matematika, Universitas Diponegoro, Indonesia
Tatik Widiharih  -  Departemen Statistika, Fakultas Sains dan Matematika, Universitas Diponegoro, Indonesia
Open Access Copyright 2023 Jurnal Gaussian under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract

Poverty is a global problem that affects many countries in different ways. It is characterized by the inability of a person or household to meet the basic needs of life. Socio-economic problems such as poverty can be addressed with machine learning, and classification is one such approach. Classifying households by poverty criteria is expected to help the government prepare programs that reach the right targets. K-Nearest Neighbor (KNN) is an easy-to-use classification algorithm that assigns an observation to a class based on its nearest neighbors. A common problem in classification is imbalanced data, which causes the classifier to focus on the majority class. SMOTE and ADASYN are used here to address this imbalance. This study found that applying SMOTE and ADASYN to the imbalanced data improves classification performance, especially the G-mean, a performance measure widely used for imbalanced data. SMOTE raised the G-mean to 58.5%, while ADASYN raised it to 57.3%. It is therefore concluded that SMOTE-KNN is the best classification model in this study for household poverty classification.
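To illustrate why the G-mean is preferred over accuracy on imbalanced data, and how SMOTE-style oversampling creates new minority observations, here is a minimal sketch in plain Python. It is not the authors' code: the function names `g_mean` and `smote_point`, the toy labels, and the coordinates are all hypothetical. It also assumes the usual definition of the G-mean as the square root of sensitivity times specificity, which the abstract itself does not spell out.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class with no observations.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def smote_point(x, neighbor, rng=random):
    """One SMOTE-style synthetic point on the segment between a minority
    observation x and one of its minority-class neighbors."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


# Hypothetical toy labels: 1 = poor household (minority), 0 = non-poor.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class looks accurate
# but never detects a poor household, so its G-mean is zero.
always_majority = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print("accuracy:", accuracy)
print("G-mean:", g_mean(y_true, always_majority))

# A synthetic minority point between two hypothetical minority households.
print(smote_point([1.0, 2.0], [3.0, 6.0], random.Random(0)))
```

ADASYN uses the same interpolation step but, as described by He et al. (2008), generates more synthetic points around minority observations that are harder to learn. That weighting is not shown in this sketch.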

Keywords: Household Poverty; K-Nearest Neighbor; Imbalanced data; SMOTE; ADASYN

