Comparison of Feature Selection Methods for Anomaly Detection on the CIC-IDS-2018 Dataset

  • Anne Devia Binus University
  • Benfano Soewito Universitas Bina Nusantara

Abstract

As internet usage continues to grow, so does the risk of cyber attacks that originate from suspicious network activity. This study designs a machine learning model for anomaly detection and compares three families of feature selection methods: filter, wrapper, and hybrid. To evaluate each method, the selected features are classified with Random Forest, XGBoost, and a multilayer perceptron (MLP). The methods and classifiers are compared on accuracy and processing time. The results show that feature selection reduces computational load without compromising accuracy. The filter method based on Information Gain, combined with the XGBoost classifier, gives the best performance in both accuracy and execution time.
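As a rough illustration of the filter approach named in the abstract, the sketch below ranks features by Information Gain and keeps the top k. It is not the authors' implementation: the function names, the equal-width binning, the choice of k, and the synthetic data are all assumptions made for this example, and the classification and timing steps are left out.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """Information gain of one numeric feature after equal-width binning (an assumed discretization)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features
    groups = {}
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)  # clip the maximum into the last bin
        groups.setdefault(b, []).append(y)
    # Entropy of the labels that remains once the binned feature value is known.
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(rows, labels, k):
    """Rank feature columns by information gain and return the k best column indices."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


# Synthetic stand-in for flow records (NOT CIC-IDS-2018 data):
# feature 0 separates the two classes, features 1 and 2 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]  # 0 = benign, 1 = attack
rows = [[y * 10 + rng.random(), rng.random(), rng.random()] for y in labels]

selected, gains = select_top_k(rows, labels, k=1)
print("selected feature indices:", selected)
print("information gain per feature:", [round(g, 3) for g in gains])
```

Because a filter method scores each feature independently of any classifier, the ranking is computed once and can be reused across Random Forest, XGBoost, and MLP, which is one reason filter methods tend to be cheaper than wrapper methods.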

Published
2023-11-01
How to Cite
Devia, A., & Soewito, B. (2023). Comparison of Feature Selection Methods for Anomaly Detection on the CIC-IDS-2018 Dataset. Jurnal Teknologi Dan Sistem Informasi Bisnis, 5(4), 572-578. https://doi.org/10.47233/jteksis.v5i4.1069
Section
Articles