Oluwaseye Abayomi ADEYEMI
2024-06-11
2024-06-11
2023-12
M.Sc
https://repository.lcu.edu.ng/handle/123456789/492

As attackers try to break into networks, take over control systems, and steal information by exploiting expanding data communication paths and protocols, cyber intrusions are on the rise. Most typical online attack methods are thoroughly researched and documented. Countries, corporations, individuals, and critical infrastructure that depend on information technology for daily operations have suffered financial losses, loss of personal information, and economic harm as a result of web-based intrusions. However, foreseeing an attack before it happens can help prevent it. This research proposes a predictive model for web-based attacks and compares the performance of random forest with and without feature selection, with the aim of securing the availability, integrity, and confidentiality of networks, computer systems, and their data. The CIC-Bell-IDS2017 dataset, which includes typical and contemporary intrusion attacks, served as the raw data source for the proposed model. Jupyter Notebook, a Python-based programming environment launched from Anaconda Navigator, was used to create the predictive models. Performance evaluation and comparative analysis were conducted, and the results demonstrate that, once big data analytics (feature scaling and feature selection) were applied to the dataset, the models' prediction accuracies improved, creating a potential intrusion detection system. Both cases yielded high accuracy and short model development times: precision of 97% and 98% for the two sets, and model development times of 35 seconds for the raw set and 15 seconds for the reduced set. Development time is an important factor when deploying machine learning models in a real-time setting. According to the comparison, Random Forest is more computationally expensive than the Correlation Feature Selection-based classifier, but has higher predictive accuracy.
Both methods work well, and each has advantages and disadvantages. The use of big data analytics (PySpark) was found to help machine learning models perform better, resulting in a better intrusion detection system.

Keywords: Web Based Attacks, Random Forest, Correlation Feature Selection
Word Count: 300
en
Web Based Attacks
Random Forest
Correlation Feature Selection
Comparative Performance Evaluation of Random Forest on Web-based Attacks
Thesis
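
The abstract names Correlation Feature Selection without describing how it was carried out, so the following is only a minimal sketch of one common formulation, not the thesis's implementation. It assumes the standard CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), with Pearson correlation and a greedy forward search. The feature names and toy values are hypothetical and do not come from the dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(columns, labels, subset):
    """CFS merit: rewards correlation with the label, penalizes redundancy between features."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def select_features(columns, labels, tol=1e-12):
    """Greedy forward search: add the feature that raises merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = set(columns)
    while remaining:
        scored = [(cfs_merit(columns, labels, selected + [f]), f) for f in sorted(remaining)]
        merit, feature = max(scored, key=lambda s: s[0])
        if merit <= best + tol:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected


# Hypothetical toy data: 0 = benign, 1 = attack.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "pkt_len": [1, 2, 1, 2, 8, 9, 8, 9],         # informative
    "pkt_len_x2": [2, 4, 2, 4, 16, 18, 16, 18],  # redundant rescaled copy of pkt_len
    "noise": [5, 1, 5, 1, 5, 1, 5, 1],           # uncorrelated with the label
}

selected = select_features(columns, labels)
print(selected)  # ['pkt_len']
```

In this toy case the rescaled copy adds no merit and the noise column lowers it, so only one feature is kept. A reduced feature set of this kind is what would then be passed to the random forest, which is in line with the shorter model development time reported for the reduced set.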