
HANDLING IMBALANCED CLASS DATA CLASSIFICATION WITH RANDOM OVERSAMPLING IN NAIVE BAYES (Case Study: IUD Family Planning Participant Status in Kendal Regency)

*Reza Dwi Fitriani - Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia
Hasbi Yasin - Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia
Tarno Tarno - Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro, Indonesia
Open Access Copyright 2021 Jurnal Gaussian under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract

The Family Planning Program (Keluarga Berencana, KB), launched by the Government of Indonesia to control population growth, does not always produce the desired results. In 2017, the IUD failed for 7 of the 1,102 new IUD users in Kendal Regency, giving a failure-to-success ratio of 0.64% : 99.36%. Such a strongly imbalanced class distribution makes failures difficult to predict. One way of handling imbalanced data is oversampling, for example Random Oversampling (ROS). Naive Bayes is used for classification because it is a simple and efficient learning model. The data in this study comprise 14 independent variables and 1 dependent variable. The results show that the G-mean of Naive Bayes alone is less than 60%, whereas the G-mean of ROS-Naive Bayes is 96.6%. It can be concluded that, in this study, the ROS-Naive Bayes method is better than the Naive Bayes method for detecting the success status of IUD family planning in Kendal Regency.
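The two ideas the abstract relies on, Random Oversampling and the G-mean, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the placeholder feature rows, and the choice of "fail" as the positive class are assumptions made for illustration; only the class counts (7 failures among 1,102 users) come from the abstract. The G-mean here uses the usual definition, the square root of sensitivity times specificity.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sampled with replacement
            y_out.append(label)
    return X_out, y_out


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Class counts from the abstract; the feature rows are placeholders (assumption).
y = ["fail"] * 7 + ["success"] * 1095
X = [[i] for i in range(len(y))]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 1095 rows

# A trivial model that always predicts the majority class.
always_success = ["success"] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_success)) / len(y)
print(round(accuracy, 4), g_mean(y, always_success, positive="fail"))
```

A model that always predicts "success" scores about 99.36% accuracy on these counts but a G-mean of 0, which is why G-mean is the more informative measure for data this imbalanced. In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.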

 

Keywords: Naive Bayes, Random Oversampling, G-mean

Note: This article has supplementary file(s).



