Improved Network Intrusion Detection System Using Hybridized Feature Selection Methods

Date

2024-12

Publisher

Lead City University, Ibadan

Abstract

Machine Learning (ML) and Feature Selection have been applied in the development of Intrusion Detection Systems (IDS). The literature review shows that developing an effective IDS requires a large amount of data with many features. Some of these features are not important to the operation of the IDS and slow down the detection of threats. Therefore, in this thesis, an IDS that detects threats using a reduced set of features and still produces detection results was developed. Machine learning was incorporated into the training of the model using three machine learning algorithms: hybrid decision trees, Naive Bayes (NB) and Random Forest (RF). The work was organised into three stages: dataset loading and preprocessing, the improved Intrusion Detection System, and testing and evaluation of the developed system. During these stages, processing such as one-hot encoding of the categorical features brought the total number of columns to 143, and the SelectKBest technique then reduced these to the 15 best columns. The correlation matrix computed on the final sub-dataset shows that features with NaN values have zero correlation with the other related features in each sub-dataset. Features with near-zero variance, more than 25% missing values, or high correlation with another numerical variable have minimal discriminatory power and were therefore removed from both sub-datasets. On the reduced columns, the accuracy score of the logistic regression model was approximately 0.8377, that of the K-nearest neighbours model approximately 0.7538, that of the DecisionTreeClassifier model approximately 0.8127, and that of the LinearSVC model approximately 0.8101. The IDS developed with the Feature Selection technique significantly improved the performance of the Network Intrusion Detection System by increasing learning accuracy, reducing learning time, and simplifying the learning results.
Keywords: Machine Learning, hybrid decision tree, one-hot encoding of categorical features
Word Count: 295 words
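The feature-removal rules described in the abstract (more than 25% missing values, near-zero variance, high pairwise correlation), applied after one-hot encoding, can be sketched in plain Python as below. This is an illustrative sketch, not the author's code: the thesis appears to have used scikit-learn components (SelectKBest, DecisionTreeClassifier, LinearSVC), whereas this version avoids third-party libraries. Only the 25% missing-value cutoff comes from the abstract; the variance threshold (1e-3), the correlation threshold (0.95), the helper names `one_hot_encode`, `pearson` and `filter_features`, and the toy records are hypothetical. The SelectKBest step is not reproduced here.

```python
from statistics import pvariance


def one_hot_encode(rows, categorical):
    """Replace each categorical column with one 0/1 indicator column per observed value."""
    # Distinct non-missing values per categorical column, sorted for a stable column order.
    levels = {
        col: sorted({str(r[col]) for r in rows if r[col] is not None})
        for col in categorical
    }
    encoded = []
    for r in rows:
        # Copy the non-categorical columns unchanged.
        out = {k: v for k, v in r.items() if k not in categorical}
        for col in categorical:
            for level in levels[col]:
                hit = r[col] is not None and str(r[col]) == level
                out[f"{col}_{level}"] = 1.0 if hit else 0.0
        encoded.append(out)
    return encoded


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


# Hypothetical thresholds: only max_missing=0.25 is taken from the abstract.
def filter_features(rows, max_missing=0.25, min_variance=1e-3, max_corr=0.95):
    """Return the names of columns that survive the three removal rules."""
    candidates = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [float(v) for v in values if v is not None]
        # Rule 1: drop columns with more than max_missing missing values.
        if 1 - len(present) / len(values) > max_missing:
            continue
        # Rule 2: drop constant or near-constant columns.
        if len(present) < 2 or pvariance(present) < min_variance:
            continue
        candidates.append(col)

    kept = []
    for col in candidates:
        redundant = False
        for other in kept:
            # Rule 3: drop the later column of a highly correlated pair,
            # comparing only rows where both values are present.
            pairs = [(float(r[col]), float(r[other])) for r in rows
                     if r[col] is not None and r[other] is not None]
            if len(pairs) >= 2:
                xs, ys = zip(*pairs)
                if abs(pearson(xs, ys)) > max_corr:
                    redundant = True
                    break
        if not redundant:
            kept.append(col)
    return kept


# Hypothetical toy records (not from the thesis dataset).
records = [
    {"duration": 0.0, "src_bytes": 100.0, "dst_bytes": 200.0, "constant": 1.0, "sparse": None, "protocol": "tcp"},
    {"duration": 2.0, "src_bytes": 300.0, "dst_bytes": 600.0, "constant": 1.0, "sparse": None, "protocol": "udp"},
    {"duration": 5.0, "src_bytes": 250.0, "dst_bytes": 500.0, "constant": 1.0, "sparse": 3.0, "protocol": "tcp"},
    {"duration": 1.0, "src_bytes": 120.0, "dst_bytes": 240.0, "constant": 1.0, "sparse": None, "protocol": "icmp"},
]
encoded = one_hot_encode(records, ["protocol"])
print(filter_features(encoded))
# ['duration', 'src_bytes', 'protocol_icmp', 'protocol_tcp', 'protocol_udp']
```

In this toy run, `constant` fails the variance rule, `sparse` is 75% missing, and `dst_bytes` (exactly twice `src_bytes`) is dropped as redundant. Keeping the earlier column of a correlated pair is one common convention; the abstract does not say which member of a pair was removed.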

Citation

Kate Turabia