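KLASIFIKASI KUALITAS KOPI ARABIKA DENGAN METODE RANDOM FOREST DAN K-NEAREST NEIGHBOR PADA IMBALANCED DATASET (Classification of Arabica Coffee Quality Using the Random Forest and K-Nearest Neighbor Methods on an Imbalanced Dataset)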

Hagi Afdal Fatan; Tatik Widiharih; Sudarno Sudarno

doi:10.14710/j.gauss.14.1.107-117

*Hagi Afdal Fatan - Departemen Statistika, Fakultas Sains dan Matematika, Undip, Indonesia

Tatik Widiharih - Departemen Statistika, Fakultas Sains dan Matematika, Undip, Indonesia

Sudarno Sudarno - Departemen Statistika, Fakultas Sains dan Matematika, Undip, Indonesia

BibTeX Citation Data:

@article{J.Gauss40168,
    author = {Hagi Afdal Fatan and Tatik Widiharih and Sudarno Sudarno},
    title = {KLASIFIKASI KUALITAS KOPI ARABIKA DENGAN METODE RANDOM FOREST DAN K-NEAREST NEIGHBOR PADA IMBALANCED DATASET},
    journal = {Jurnal Gaussian},
    volume = {14},
    number = {1},
    year = {2025},
    keywords = {Classification, Arabica Coffee, SMOTE, K-Nearest Neighbor, Random Forest},
    abstract = {Coffee is a leading plantation commodity in the export sector with high economic value. Coffee quality is the most important factor affecting the selling price, so quality assessment is key to setting market prices and determining the export potential of coffee-producing countries. Coffee quality is divided into specialty, premium, and regular grades based on bean defects and taste test scores. Predicting coffee quality is needed to identify which coffee has the best quality. This study compares the Random Forest and K-Nearest Neighbor (KNN) methods to determine which algorithm is more effective at predicting coffee quality. Random Forest builds multiple decision trees and determines the predicted class by majority voting. KNN classifies an observation based on its distance to other observations. The coffee dataset is sourced from the Coffee Quality Institute (CQI) Database. The data are imbalanced, which results in a low recall value for the minority class, so the SMOTE oversampling algorithm is used to improve classification performance. The advantage of oversampling over undersampling is that no information in the data is lost. The results show that the Random Forest method after SMOTE produced the best classification performance, with accuracy and recall values of 80.26% and 80.59%, respectively.},
    issn = {2339-2541},
    pages = {107--117},
    doi = {10.14710/j.gauss.14.1.107-117},
    url = {https://ejournal3.undip.ac.id/index.php/gaussian/article/view/40168}
}

Abstract

Coffee is a leading plantation commodity in the export sector with high economic value. Coffee quality is the most important factor affecting the selling price, so quality assessment is key to setting market prices and determining the export potential of coffee-producing countries. Coffee quality is divided into specialty, premium, and regular grades based on bean defects and taste test scores. Predicting coffee quality is needed to identify which coffee has the best quality. This study compares the Random Forest and K-Nearest Neighbor (KNN) methods to determine which algorithm is more effective at predicting coffee quality. Random Forest builds multiple decision trees and determines the predicted class by majority voting. KNN classifies an observation based on its distance to other observations. The coffee dataset is sourced from the Coffee Quality Institute (CQI) Database. The data are imbalanced, which results in a low recall value for the minority class, so the SMOTE oversampling algorithm is used to improve classification performance. The advantage of oversampling over undersampling is that no information in the data is lost. The results show that the Random Forest method after SMOTE produced the best classification performance, with accuracy and recall values of 80.26% and 80.59%, respectively.
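
The oversampling and distance-based voting ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smote`, `knn_predict`, `k_nearest`), the parameter values, and the two-feature toy data are all assumptions made for illustration, and none of the numbers come from the CQI database. The sketch uses only the Python standard library and shows (1) SMOTE-style interpolation between a minority sample and one of its nearest minority neighbors, and (2) KNN classification by majority vote.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` by Euclidean distance."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (SMOTE-style sketch).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def knn_predict(x, train_points, train_labels, k=3):
    """Classify x by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pl: math.dist(x, pl[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (two features per sample), not from the CQI database.
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8), (1.3, 1.1)]
minority = [(3.0, 3.0), (3.2, 2.9), (2.9, 3.3)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
X = majority + minority + new_points
y = ["regular"] * len(majority) + ["specialty"] * (len(minority) + len(new_points))

print(Counter(y))                      # class counts after oversampling
print(knn_predict((3.1, 3.1), X, y))   # predicted class for a new sample
```

Because the synthetic points are interpolated from existing minority samples, no majority-class observations are discarded, which is the advantage over undersampling noted in the abstract. In practice, oversampling would be applied to the training split only, so that synthetic samples do not leak into the evaluation data.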

Note: This article has supplementary file(s).

Keywords: Classification, Arabica Coffee, SMOTE, K-Nearest Neighbor, Random Forest

Article Info

Section: Articles

Language : ID

In Vol 14, No 1 (2025): Jurnal Gaussian
