
K-NEAREST NEIGHBOR WITH ADAPTIVE BOOSTING AND SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA CLASSIFICATION

*Ria Sulistyo Yuliani  -  Departemen Statistika, FSM, Universitas Diponegoro, Indonesia
Agus Rusgiyono  -  Departemen Statistika, FSM, Universitas Diponegoro, Indonesia
Rukun Santoso  -  Departemen Statistika, FSM, Universitas Diponegoro, Indonesia
Open Access Copyright 2023 Jurnal Gaussian under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract
Breast cancer is a non-skin cancer arising in breast tissue, including the glandular ducts, cells, and supporting tissue, but excluding the skin of the breast. Left untreated, it can be fatal, so early detection is important for patient safety. Successful detection depends on a correct diagnosis, and statistical classification methods can help assess diagnostic accuracy. K-Nearest Neighbor is a classification algorithm that assigns a class based on the nearest neighbors and is easy to implement. Classification faces several problems, one of which is imbalanced data, which can bias a classifier toward the majority class. The Synthetic Minority Oversampling Technique (SMOTE) can address this imbalance, and ensemble methods such as Adaptive Boosting can improve classification performance on imbalanced data. This study applies K-Nearest Neighbor combined with Adaptive Boosting and SMOTE to classify imbalanced data. The results show that SMOTE handles the imbalance problem and that K-Nearest Neighbor with Adaptive Boosting achieves an accuracy of 80%, a sensitivity of 83.33%, a specificity of 66.67%, and a G-Mean of 74.54%. It can therefore be concluded that K-Nearest Neighbor combined with Adaptive Boosting and SMOTE can be applied to imbalanced data classification.
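The G-Mean reported above is conventionally defined as the geometric mean of sensitivity and specificity. The short sketch below checks that the reported figures are consistent with that definition. It assumes the standard formula; the function name `g_mean` is illustrative and does not come from the article.

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (both given as fractions in [0, 1])."""
    return math.sqrt(sensitivity * specificity)


# Values reported in the abstract, converted from percentages to fractions.
reported_sensitivity = 0.8333
reported_specificity = 0.6667

value = g_mean(reported_sensitivity, reported_specificity)
print(f"G-Mean: {value * 100:.2f}%")
```

Because the G-Mean falls when either class is classified poorly, it is a common choice for imbalanced data, where overall accuracy alone can hide weak performance on the minority class.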


Note: This article has supplementary file(s).

Keywords: Breast Cancer, K-Nearest Neighbor, Imbalanced Data, Synthetic Minority Oversampling Technique, Adaptive Boosting


