K-NEAREST NEIGHBOR WITH ADAPTIVE BOOSTING AND SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA CLASSIFICATION

Ria Sulistyo Yuliani; Agus Rusgiyono; Rukun Santoso

doi:10.14710/j.gauss.12.2.231-241

*Ria Sulistyo Yuliani - Departemen Statistika, FSM, Universitas Diponegoro, Indonesia

Agus Rusgiyono - Departemen Statistika, FSM, Universitas Diponegoro, Indonesia

Rukun Santoso - Departemen Statistika, FSM, Universitas Diponegoro, Indonesia

BibTeX citation data:

@article{J.Gauss36494,
    author = {Ria Yuliani and Agus Rusgiyono and Rukun Santoso},
    title = {K-NEAREST NEIGHBOR DENGAN ADAPTIVE BOOSTING DAN SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE UNTUK KLASIFIKASI DATA TIDAK SEIMBANG},
    journal = {Jurnal Gaussian},
    volume = {12},
    number = {2},
    year = {2023},
    keywords = {Breast Cancer, K-Nearest Neighbor, Imbalanced Data, Synthetic Minority Oversampling Technique, Adaptive Boosting},
    abstract = {Breast cancer is non-skin cancer that is caused by several factors, including glandular ducts, cells, and breast support tissue, except for the skin of the breast. Breast cancer if not treated immediately will be fatal for the sufferer, so early detection of breast cancer is important for the patient's safety. The success of breast cancer detection depends on the right diagnosis. Measurement of the accuracy of a breast cancer diagnosis can be assisted by statistical methods, namely classification. K-Nearest Neighbor is a classification algorithm based on the nearest neighbor that is easy to implement. In the classification process, there are several problems including when faced with imbalanced data. Imbalanced data can cause classification algorithms to tend to focus on the majority class. Data imbalance can be overcome by using Synthetic Minority Oversampling Technique (SMOTE). Ensemble methods can be applied to improve the performance of imbalanced data classification, one of which is Adaptive Boosting. This study applies K-Nearest Neighbor combined with Adaptive Boosting and SMOTE for handling imbalanced data classification. The results of this study are, SMOTE can handle the problem of imbalanced data and the application of K-Nearest Neighbor with Adaptive Boosting can produce an accuracy of 80%, a sensitivity of 83,33%, a specificity of 66,67%, and a G-Mean value of 74,54%. So it can be concluded that K-Nearest Neighbor combined with Adaptive Boosting and SMOTE can be applied for handling imbalanced data classification.   },
    issn = {2339-2541},
    pages = {231--241},
    doi = {10.14710/j.gauss.12.2.231-241},
    url = {https://ejournal3.undip.ac.id/index.php/gaussian/article/view/36494}
}

Abstract

Breast cancer is a non-skin cancer that can arise in several parts of the breast, including the glandular ducts, the cells, and the supporting tissue, but not in the skin of the breast. If not treated promptly, breast cancer can be fatal, so early detection is important for patient safety. Successful detection depends on an accurate diagnosis, and statistical classification methods can help achieve one. K-Nearest Neighbor (KNN) is an easy-to-implement classification algorithm that assigns an observation to a class based on its nearest neighbors. One problem that classification can face is imbalanced data, which tends to bias classification algorithms toward the majority class. Data imbalance can be addressed with the Synthetic Minority Oversampling Technique (SMOTE), and ensemble methods such as Adaptive Boosting (AdaBoost) can further improve classification performance on imbalanced data. This study combines KNN with AdaBoost and SMOTE to classify imbalanced data. The results show that SMOTE handles the data imbalance and that KNN with AdaBoost achieves an accuracy of 80%, a sensitivity of 83.33%, a specificity of 66.67%, and a G-mean of 74.54%. It can therefore be concluded that KNN combined with AdaBoost and SMOTE can be applied to imbalanced data classification.
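The abstract names three components without showing how they fit together. The Python sketch below is an illustration under stated assumptions, not the authors' code: it implements SMOTE by interpolating between a minority point and one of its k nearest minority neighbors, a plain Euclidean-distance KNN, and discrete two-class AdaBoost by resampling, plus the accuracy, sensitivity, specificity and G-mean reported above. Boosting by resampling is assumed because KNN cannot use sample weights directly (scikit-learn's `AdaBoostClassifier`, for example, rejects `KNeighborsClassifier` as a base estimator for that reason). All function names, the values of k, the number of boosting rounds, the choice of class 1 as the positive class, and the toy data are hypothetical.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic minority samples, each on the segment
    between a random minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain KNN with Euclidean distance; majority vote over labels 0/1 (use odd k)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def adaboost_knn_fit(X, y, n_rounds=10, k=3, rng=None):
    """Discrete AdaBoost by resampling (hypothetical variant): KNN takes no sample
    weights, so each round fits KNN on a bootstrap drawn with probabilities w."""
    rng = np.random.default_rng(rng)
    n = len(y)
    w = np.full(n, 1.0 / n)
    models = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)
        Xb, yb = X[idx], y[idx]
        miss = knn_predict(Xb, yb, X, k) != y
        err = w[miss].sum()                    # weighted training error
        if err >= 0.5:                         # no better than chance: stop
            break
        err = max(err, 1e-10)                  # guard against log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this round
        models.append((Xb, yb, alpha))
        w = w * np.exp(np.where(miss, alpha, -alpha))  # boost misclassified points
        w = w / w.sum()
    return models


def adaboost_knn_predict(models, X, k=3):
    """Alpha-weighted vote of the boosted KNN models, labels mapped to -1/+1."""
    score = np.zeros(len(X))
    for Xb, yb, alpha in models:
        score += alpha * (2 * knn_predict(Xb, yb, X, k) - 1)
    return (score > 0).astype(int)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and G-mean, treating class 1 as positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {"accuracy": (tp + tn) / len(y_true), "sensitivity": sens,
            "specificity": spec, "g_mean": np.sqrt(sens * spec)}


if __name__ == "__main__":
    # Hypothetical toy data, NOT the breast-cancer data used in the study.
    rng = np.random.default_rng(0)
    X_maj = rng.normal(0.0, 1.0, size=(40, 2))
    X_min = rng.normal(2.0, 1.0, size=(8, 2))
    X_syn = smote(X_min, n_new=len(X_maj) - len(X_min), k=5, rng=1)
    X = np.vstack([X_maj, X_min, X_syn])
    y = np.array([0] * len(X_maj) + [1] * (len(X_min) + len(X_syn)))
    models = adaboost_knn_fit(X, y, n_rounds=10, k=3, rng=2)
    print(binary_metrics(y, adaboost_knn_predict(models, X)))
```

In a real evaluation, SMOTE would be applied to the training split only and the metrics computed on a held-out test set; the demo scores the training data just to show the call sequence. As a check on the reported figures, G-mean = √(sensitivity × specificity) = √(0.8333 × 0.6667) ≈ 0.7454, which matches the 74.54% in the abstract.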

Note: This article has one supplementary file (untitled; type: Research Instrument; 548 KB).

Keywords: Breast Cancer, K-Nearest Neighbor, Imbalanced Data, Synthetic Minority Oversampling Technique, Adaptive Boosting

Article Info

Section: Articles

Language: EN

In Vol 12, No 2 (2023): Jurnal Gaussian

Authors submitting a manuscript do so on the understanding that, if it is accepted for publication, copyright of the article shall be assigned to Jurnal Gaussian and the Department of Statistics, Universitas Diponegoro, as the publisher of the journal. Copyright encompasses the rights to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations.

Jurnal Gaussian, the Department of Statistics, Universitas Diponegoro, and the Editors make every effort to ensure that no incorrect or misleading data, opinions, or statements are published in the journal. However, the contents of the articles and advertisements published in Jurnal Gaussian are the sole and exclusive responsibility of their respective authors and advertisers.

The Copyright Transfer Form ("Copyright Transfer Form Jurnal Gaussian") can be downloaded from the journal website. The form must bear an original signature and be sent to the Editorial Office by post, as a scanned document, or by fax:

Dr. Rukun Santoso (Editor-in-Chief)
Editorial Office of Jurnal Gaussian
Department of Statistics, Universitas Diponegoro
Jl. Prof. Soedarto, Kampus Undip Tembalang, Semarang, Central Java, Indonesia 50275
Tel./Fax: +62-24-7474754
Email: jurnalgaussian@gmail.com