CLASSIFICATION OF ARABICA COFFEE QUALITY USING THE RANDOM FOREST AND K-NEAREST NEIGHBOR METHODS ON AN IMBALANCED DATASET

*Hagi Afdal Fatan  -  Department of Statistics, Faculty of Science and Mathematics, Undip, Indonesia
Tatik Widiharih  -  Department of Statistics, Faculty of Science and Mathematics, Undip, Indonesia
Sudarno Sudarno  -  Department of Statistics, Faculty of Science and Mathematics, Indonesia
Open Access Copyright 2025 Jurnal Gaussian under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract

Coffee is a leading plantation export commodity with high economic value. Quality is the most important factor affecting its selling price, so quality assessment is key to setting market prices and determining the export potential of coffee-producing countries. Coffee is graded as specialty, premium, or regular based on bean defects and taste test scores. Predicting coffee quality is therefore needed to identify which coffee is of the best quality. This study compares the Random Forest and K-Nearest Neighbor (KNN) methods to determine which algorithm predicts coffee quality more effectively. Random Forest builds multiple decision trees and determines the predicted class by majority voting. KNN classifies an observation based on its distance to other observations. The coffee dataset is sourced from the Coffee Quality Institute (CQI) Database. The data are imbalanced, which results in low recall for the minority class, so the SMOTE oversampling algorithm is used to improve classification performance. The advantage of oversampling over undersampling is that it does not discard information in the data. The results show that the Random Forest method after SMOTE produced the best classification performance, with accuracy and recall values of 80.26% and 80.59%, respectively.
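The three techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-feature points, the two class labels used, the function names, and the parameters `k`, `n_new`, and `seed` are all assumptions made for illustration, and the SMOTE function is a simplified version that only interpolates between a minority point and one of its nearest minority neighbors.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolation (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the point itself.
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def majority_vote(predictions):
    """Combine per-tree class predictions, as a Random Forest does at prediction time."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: 6 majority-class points and 3 minority-class points.
majority_x = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8), (1.3, 1.0)]
minority_x = [(3.0, 3.0), (3.2, 2.9), (2.9, 3.3)]

# Oversample the minority class until both classes have 6 points.
new_points = smote(minority_x, n_new=3)
train_x = majority_x + minority_x + new_points
train_y = ["regular"] * 6 + ["specialty"] * 6

knn_label = knn_predict(train_x, train_y, (3.1, 3.0), k=3)
forest_label = majority_vote(["specialty", "regular", "specialty"])
```

Because each synthetic point lies on a segment between two existing minority points, oversampling adds observations without discarding any original ones, which is the advantage over undersampling that the abstract describes.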

Note: This article has supplementary file(s).

Keywords: Classification, Arabica Coffee, SMOTE, K-Nearest Neighbor, Random Forest

