Comparison of Feature Selection Methods for Anomaly Detection on the CIC-IDS-2018 Dataset
Abstract
As internet usage grows, so does the risk of cyber attacks arising from suspicious network activity. This study designs a machine learning model for anomaly detection and compares three families of feature selection methods: filter, wrapper, and hybrid. To evaluate each method, the selected features are classified with Random Forest, XGBoost, and a multilayer perceptron (MLP). The methods and classifiers are assessed on accuracy and processing time. The results show that feature selection can reduce computational load without compromising accuracy. The filter method based on Information Gain, paired with the XGBoost classifier, performs best, combining high accuracy with short execution time.
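As a rough illustration of what a filter method based on Information Gain does, the sketch below scores each feature by how much it reduces the entropy of the class label and keeps the top-ranked ones. It is not the study's pipeline: the toy flow records, the feature names (`syn_flag`, `port_bucket`, `constant`), and the helper functions are all hypothetical, and the features are assumed to be already discretized, whereas real CIC-IDS-2018 features are mostly continuous and would need binning or a mutual-information estimator first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a single discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy of the labels within each feature value, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: six flows, three discrete features (assumption, not from the paper).
labels = ["benign", "benign", "attack", "attack", "benign", "attack"]
columns = {
    "syn_flag": [0, 0, 1, 1, 0, 1],                                 # separates the classes exactly
    "port_bucket": ["low", "high", "low", "high", "low", "high"],   # weakly informative
    "constant": [1, 1, 1, 1, 1, 1],                                 # carries no information
}

top = rank_features(columns, labels, k=2)
print(top)
```

Because the score is computed per feature and independently of any classifier, ranking is cheap compared with wrapper methods, which retrain a model for each candidate subset. That difference is one plausible reason a filter method can shorten processing time, though the abstract does not break the timings down.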
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows others to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially, with an acknowledgment of the work's authorship and initial publication in this journal.