Application of Random Forest for Breast Cancer Diagnosis Classification Based on the WBCD Dataset
Abstract
Breast cancer is one of the most critical global health challenges; Indonesia recorded 66,271 new cases in 2022 according to GLOBOCAN data published by the International Agency for Research on Cancer (IARC/WHO). Early and accurate detection is essential to improving patient survival, yet conventional diagnosis remains time-consuming and dependent on expert availability. This study applies the Random Forest algorithm to classify breast cancer diagnoses using the Wisconsin Breast Cancer Diagnostic (WBCD) dataset from the UCI Machine Learning Repository. The dataset consists of 569 samples with 30 numerical features extracted from fine-needle aspirate (FNA) cell images, each labeled as benign or malignant. Preprocessing involved removing non-predictive columns, encoding the categorical labels as binary values, handling outliers with IQR clipping, and standardizing the features with StandardScaler. The data were split into 80% training and 20% testing sets using stratified sampling, and the Random Forest classifier was configured with 100 decision trees and class_weight="balanced" to account for class imbalance. Model performance was evaluated using accuracy, precision, recall, and F1-score, together with confusion matrix analysis and 5-fold stratified cross-validation. The model achieved 97.37% accuracy on the test set with zero false positives, meaning no benign case was classified as malignant; the remaining errors were therefore false negatives, that is, malignant cases classified as benign. Cross-validation yielded a mean accuracy of 96.31%, close to the test-set accuracy, which suggests that the model generalizes well and does not substantially overfit. Feature importance analysis identified area_worst, concave points_worst, and perimeter_worst as the most dominant features, consistent with the clinical morphological characteristics of malignant cells. These findings indicate the strong potential of Random Forest as a reliable and interpretable tool for supporting breast cancer diagnosis.
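The workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it loads the copy of the dataset bundled with scikit-learn (whose feature names read "worst area" rather than area_worst), fixes an arbitrary random seed, uses the conventional 1.5 × IQR fences, and derives the clipping bounds from the training split only. None of these details are reported in the abstract, so the resulting scores will likely differ somewhat from the published ones.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42  # assumption: the abstract does not report a seed

# scikit-learn's bundled copy of the data: 569 samples, 30 numeric features.
data = load_breast_cancer()
X = data.data
# scikit-learn encodes malignant as 0; flip so that malignant = 1 (positive class).
y = 1 - data.target

# Stratified 80/20 split, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE
)

# IQR clipping. Assumptions: 1.5 * IQR fences, bounds computed on the
# training split only and then reused for the test split.
q1, q3 = np.percentile(X_train, [25, 75], axis=0)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
X_train_c = np.clip(X_train, lower, upper)
X_test_c = np.clip(X_test, lower, upper)

# Standardize using statistics from the training split.
scaler = StandardScaler().fit(X_train_c)
X_train_s = scaler.transform(X_train_c)
X_test_s = scaler.transform(X_test_c)

# Random Forest with 100 trees and balanced class weights.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=RANDOM_STATE
)
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

# Confusion matrix: rows are true labels, columns are predictions,
# both in the order [benign, malignant]; cm[0, 1] counts false positives.
cm = confusion_matrix(y_test, y_pred)
test_accuracy = (y_pred == y_test).mean()
print(cm)
print(classification_report(y_test, y_pred, target_names=["benign", "malignant"]))

# 5-fold stratified cross-validation. For brevity the IQR clipping step is
# omitted here and only scaling is refit inside each fold (a simplification).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)
cv_model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=RANDOM_STATE
    ),
)
cv_scores = cross_val_score(cv_model, X, y, cv=cv, scoring="accuracy")
print(f"CV mean accuracy: {cv_scores.mean():.4f}")

# Three features with the largest impurity-based importance.
top3 = np.argsort(model.feature_importances_)[::-1][:3]
for i in top3:
    print(data.feature_names[i], round(float(model.feature_importances_[i]), 4))
```

With 569 samples, a 20% test split holds 114 samples, so the reported 97.37% accuracy would correspond to 111 correct predictions (111 / 114 ≈ 0.9737), assuming the split was rounded the way scikit-learn does it.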
Copyright (c) 2026 Jurnal Sistem Informasi Dan Informatika

This work is licensed under a Creative Commons Attribution 4.0 International License.











