Evaluation of Machine Learning-Based Algorithm to Predicting Loan Default in Nigeria

Kingsley Oghenekaro EFEKODO
Kate Turabia
M.Sc. thesis, 2024-12 (available 2025-07-01)
https://repository.lcu.edu.ng/handle/123456789/1041

In the financial sector, accurately predicting loan defaults is critical. Traditional creditworthiness assessment methods, while thorough, often fail to capture the dynamic and complex interactions within financial data, and traditional credit scoring systems frequently cannot handle high-dimensional, non-linear data effectively. The resulting inaccurate default predictions can cause significant financial losses, which motivates more advanced approaches such as machine learning (ML). This study applies ML techniques to improve the accuracy of loan default prediction, with the goal of outperforming traditional statistical models. Five algorithms (Logistic Regression, Decision Trees, Gradient Boosting Classifier, Random Forest, and Gaussian Naive Bayes) were applied to a dataset of borrower characteristics and loan details. The dataset was open-source and consisted of three tables (demographic data, performance data, and previous-loans data), each provided in separate train and test versions. The sample submission has two outcomes: good (1) or bad (0). The data were split into a training set (70%) and a test set (30%), and each model was trained and validated to assess its robustness and reliability. The Gradient Boosting Classifier was the most effective model, with an accuracy of 78.8%. It outperformed the other models by capturing complex patterns in the dataset, reducing both false positives and false negatives. The study concludes that machine learning models, particularly the Gradient Boosting Classifier, offer superior predictive power for loan default risk assessment.
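The workflow summarized above (joining the three tables, a stratified 70/30 split, and fitting the five classifiers) might be sketched with pandas and scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names (`make_synthetic_tables`, `build_features`, `compare_models`), the column names (`customer_id`, `age`, `loan_amount`, `prev_amount`, `days_late`, `good_bad_flag`), the synthetic data generator, the per-customer aggregation of previous loans, and the median imputation are all hypothetical choices made for the example. Because the data are synthetic, the accuracies it prints will not reproduce the 78.8% reported in the study.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_tables(n_customers=1000, seed=0):
    """Hypothetical stand-ins for the demographic, performance and previous-loans tables."""
    rng = np.random.default_rng(seed)
    ids = np.arange(n_customers)
    demographics = pd.DataFrame({"customer_id": ids,
                                 "age": rng.integers(21, 65, n_customers).astype(float)})
    # Blank out ~10% of ages to mimic incomplete demographic records.
    demographics.loc[rng.random(n_customers) < 0.1, "age"] = np.nan
    # Each customer has 0-5 previous loans, one row per loan.
    n_prev = rng.integers(0, 6, n_customers)
    prev_loans = pd.DataFrame({"customer_id": np.repeat(ids, n_prev),
                               "prev_amount": rng.uniform(10_000, 60_000, n_prev.sum()),
                               "days_late": rng.poisson(2, n_prev.sum())})
    loan_amount = rng.uniform(10_000, 60_000, n_customers)
    # Invented rule: more loan history helps, larger loans hurt, plus noise.
    score = 1.0 + 0.5 * n_prev - loan_amount / 30_000 + rng.normal(0, 1, n_customers)
    performance = pd.DataFrame({"customer_id": ids, "loan_amount": loan_amount,
                                "good_bad_flag": (score > 0).astype(int)})  # 1 = good, 0 = bad
    return demographics, performance, prev_loans


def build_features(demographics, performance, prev_loans):
    """Aggregate previous loans per customer, then left-join everything onto the performance table."""
    prev_agg = (prev_loans.groupby("customer_id")
                .agg(n_prev_loans=("prev_amount", "count"),
                     mean_prev_amount=("prev_amount", "mean"),
                     mean_days_late=("days_late", "mean"))
                .reset_index())
    df = (performance.merge(demographics, on="customer_id", how="left")
          .merge(prev_agg, on="customer_id", how="left"))
    # Customers with no history get a count of 0; their mean columns stay NaN for imputation.
    df["n_prev_loans"] = df["n_prev_loans"].fillna(0)
    return df.drop(columns=["customer_id", "good_bad_flag"]), df["good_bad_flag"]


def make_models():
    # The five model families named in the study, with mostly default hyperparameters.
    return {"Logistic Regression": LogisticRegression(max_iter=1000),
            "Decision Tree": DecisionTreeClassifier(random_state=0),
            "Gradient Boosting": GradientBoostingClassifier(random_state=0),
            "Random Forest": RandomForestClassifier(random_state=0),
            "Gaussian Naive Bayes": GaussianNB()}


def compare_models(X, y, test_size=0.30, seed=0):
    """Stratified 70/30 split; fit each model; report accuracy, FP and FN on the test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              stratify=y, random_state=seed)
    results = {}
    for name, model in make_models().items():
        # Imputer and scaler are fitted on the training split only (no test leakage).
        pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        # With labels [0, 1], "positive" means good (1): FP = bad loan predicted good.
        tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
        results[name] = {"accuracy": accuracy_score(y_te, pred),
                         "fp": int(fp), "fn": int(fn)}
    return results, len(X_te)


if __name__ == "__main__":
    X, y = build_features(*make_synthetic_tables())
    results, n_test = compare_models(X, y)
    for name, r in sorted(results.items(), key=lambda kv: -kv[1]["accuracy"]):
        print(f"{name:22s} acc={r['accuracy']:.3f} FP={r['fp']} FN={r['fn']}")
```

With good = 1 as the positive label, a false positive here is a bad loan predicted as good, which is typically the costlier error for a lender. Reporting FP and FN alongside accuracy keeps that trade-off visible, particularly when the classes are imbalanced; stratifying the split preserves the good/bad ratio in both the training and test sets.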
Financial institutions should consider integrating these models into their credit evaluation processes to improve decision-making accuracy and minimize risk. Future research should also explore more diverse data sources, including non-traditional variables that could affect credit risk assessment, and apply deep learning techniques to further improve prediction accuracy.

Keywords: Accuracy, Classifier, Defaults, Financial, Machine Learning Models, Predicting, Cross-Validation, Data Imputation, Customer Segmentation, Nigerian Lending Market, Class Imbalance

Word Count: 300