Optimization of the Random Forest Algorithm Using Principal Component Analysis for Malware Detection

  • Fauzi Adi Rafrastara Universitas Dian Nuswantoro
  • Ricardus Anggi Pramunendar Universitas Dian Nuswantoro
  • Dwi Puji Prabowo Universitas Dian Nuswantoro
  • Etika Kartikadarma Universitas Dian Nuswantoro
  • Usman Sudibyo Universitas Dian Nuswantoro
Keywords: random forest, principal component analysis, feature reduction, malware detection

Abstract

Malware is software designed to harm devices. As malware evolves and diversifies, traditional signature-based detection has become less effective against advanced variants such as polymorphic, metamorphic, and oligomorphic malware. Machine learning-based detection has emerged as a promising alternative. In this study, we evaluated several machine learning algorithms for malware detection and then applied Principal Component Analysis (PCA) to the best-performing one to reduce the number of features and improve its performance. Random Forest outperformed AdaBoost, Neural Network, Support Vector Machine, and k-Nearest Neighbor, reaching 98.3% for both accuracy and recall. With PCA, Random Forest improved further to 98.7% for both accuracy and recall, while the number of features fell from 1084 to 32.
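
The workflow summarized in the abstract (reduce the feature space with PCA, then classify with Random Forest) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the malware dataset, and the standardization step, the number of trees, the train/test split, and the helper names `build_pipeline` and `run_demo` are all assumptions made for the example. Only the feature counts (1084 reduced to 32) come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_FEATURES = 1084   # original feature count reported in the abstract
N_COMPONENTS = 32   # reduced feature count reported in the abstract


def build_pipeline(n_components=N_COMPONENTS, random_state=0):
    """Scale, project onto principal components, then classify.

    Scaling before PCA is an assumption of this sketch; the abstract
    does not state how the features were preprocessed.
    """
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=random_state),
        RandomForestClassifier(n_estimators=100, random_state=random_state),
    )


def run_demo():
    # Synthetic stand-in data (assumption): NOT the dataset used in the study.
    X, y = make_classification(
        n_samples=600,
        n_features=N_FEATURES,
        n_informative=40,
        n_redundant=20,
        random_state=0,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )

    model = build_pipeline()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # model[:-1] is the scaler + PCA part of the pipeline (classifier excluded).
    reduced = model[:-1].transform(X_test)

    return {
        "n_reduced_features": reduced.shape[1],
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    }


if __name__ == "__main__":
    print(run_demo())
```

Fitting PCA inside the pipeline means the projection is learned from the training split only, which avoids leaking test data into the reduced features. Scores on the synthetic data say nothing about the figures reported in the abstract.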

Published
2023-07-03
How to Cite
Rafrastaraa, F. A., Pramunendar, R. A., Prabowo, D. P., Kartikadarma, E., & Sudibyo, U. (2023). Optimasi Algoritma Random Forest menggunakan Principal Component Analysis untuk Deteksi Malware. Jurnal Teknologi Dan Sistem Informasi Bisnis, 5(3), 217-223. https://doi.org/10.47233/jteksis.v5i3.854