COMPARISON OF SMOTE RANDOM FOREST AND SMOTE XGBOOST METHODS FOR CLASSIFYING HEPATITIS C DISEASE STAGE ON IMBALANCED CLASS DATA

*Muhamad Syukron  -  Departemen Statistika, Fakultas Sains dan Matematika, Universitas Diponegoro, Indonesia
Rukun Santoso  -  Departemen Statistika, Fakultas Sains dan Matematika, Universitas Diponegoro, Indonesia
Tatik Widiharih  -  Departemen Statistika, Fakultas Sains dan Matematika, Universitas Diponegoro, Indonesia
Open Access Copyright 2020 Jurnal Gaussian under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract

Hepatitis causes around 1.4 million deaths every year, which makes it the contagious disease with the most deaths after tuberculosis. Liver biopsy is still the best method for diagnosing the stage of hepatitis C, but it is invasive, painful, and expensive, and it can cause complications. Non-invasive methods therefore need to be developed, and machine learning is one such method. Random forest and XGBoost are widely used classification methods because they have many advantages over classical classification methods. The SMOTE algorithm can be used to improve prediction accuracy on imbalanced data. The data in this study have 24 independent variables covering patient personal data, hepatitis C symptoms, and laboratory test results. The dependent variable is binary: the stage of hepatitis C disease (fibrosis or cirrhosis). The results show that random forest and XGBoost without SMOTE had an accuracy of around 74% but a recall of less than 2%. SMOTE random forest and SMOTE XGBoost both have accuracy and recall above 75%. SMOTE random forest is more accurate at predicting the fibrosis class, while SMOTE XGBoost is better on the cirrhosis class. The variables that are more influential in determining hepatitis C stage are those from laboratory tests.
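The abstract's two key ideas, high accuracy alongside near-zero recall and SMOTE oversampling, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class counts, the toy feature vectors, and the function names `accuracy_and_recall` and `smote` are hypothetical and do not come from the study. The `smote` function is a simplified version that picks base samples at random rather than looping over every minority sample, and it is not the authors' implementation.

```python
import random


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall of the positive class)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic points, each placed at a
    random position on the segment between a minority sample and one of
    its k nearest minority neighbours. Needs at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical labels: 740 majority-class (0) and 260 minority-class (1) patients.
y_true = [0] * 740 + [1] * 260
# A classifier that detects only 4 minority cases and calls the rest majority.
y_pred = [0] * 740 + [1] * 4 + [0] * 256
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.3f}, recall={rec:.3f}")  # accuracy=0.744, recall=0.015

# Hypothetical two-feature minority samples, oversampled with 6 synthetic points.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6, k=3)
print(len(new_points))  # 6
```

With these made-up counts, a model that almost always predicts the majority class still scores about 74% accuracy while missing nearly every minority case. This is the pattern the abstract reports for the models trained without SMOTE. Because each synthetic point lies between two real minority samples, oversampling adds minority examples without copying existing ones exactly.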

 

Keywords: Fibrosis, Cirrhosis, Random Forest, SMOTE, XGBoost


