A metaheuristic feature selection model using bat optimization for malicious URL attack detection

Sci Rep. 2026 May 6. doi: 10.1038/s41598-026-51981-2. Online ahead of print.

ABSTRACT

Malicious URLs are a persistent cybersecurity threat: attackers continually create phishing, malware, spam, and defacement links that imitate legitimate web pages and evade static security measures. Although machine learning (ML) and deep learning (DL) models have shown very promising results in URL classification, their effectiveness is often limited by high-dimensional feature spaces that contain redundant and irrelevant features, which increase computational cost and can reduce generalization. To address this, this study presents a wrapper-based Bat Algorithm (BA) feature selection model that identifies small, discriminative feature subsets for malicious URL detection. As a bio-inspired metaheuristic, BA offers a good balance between exploration and exploitation in high-dimensional optimization problems, which makes it well suited to feature subset selection. The proposed BA model is evaluated with ensemble ML models (XGBoost, AdaBoost, Gradient Boosting, CatBoost, and LightGBM) and DL architectures (CNN, RNN, LSTM, and CNN-LSTM) on two datasets: the multi-class ISCX-URL-2016 dataset and the more recent URL Phishing (2026) dataset. Experimental results show that BA achieves substantial dimensionality reduction. On ISCX-URL-2016 it reduces the original feature space by 51.90% for Defacement, 67.09% for Malware, 49.37% for Phishing, and 59.49% for Spam; on URL Phishing (2026) it reduces the feature space by 45.91% for Phishing. Despite this reduction, BA yields consistent classification improvements on both datasets. BA-enhanced LightGBM achieved the best overall results of all tested models, with accuracies of 99.92% on ISCX-URL-2016 and 98.17% on URL Phishing (2026), together with high ROC-AUC values and good computational efficiency. Statistical analysis further indicates that the observed improvements are significant.
Overall, the proposed BA-based feature selection model is an efficient, scalable, and reliable solution for intelligent malicious URL detection, with good potential for deployment in real-world cybersecurity systems.
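The abstract does not specify the authors' BA update rules, transfer function, fitness weights, or classifier settings, so the following is only a minimal sketch of what a wrapper-based binary Bat Algorithm for feature selection can look like, under stated assumptions. Each bat is a 0/1 mask over the features; frequency, velocity, loudness and pulse rate follow the usual BA scheme, and a sigmoid transfer function turns velocities into bit probabilities. Assumptions (not from the paper): the fitness is 0.99 * validation accuracy + 0.01 * (1 - |S|/d), a tiny nearest-centroid classifier stands in for LightGBM and the other models, the data are synthetic, the local walk simply flips one bit of the best mask, and every function name here (`make_toy_data`, `centroid_accuracy`, `make_fitness`, `binary_bat_select`) is hypothetical.

```python
import math
import random


def make_toy_data(n, d, informative, seed=0):
    """Hypothetical synthetic data: only the first `informative` features carry
    class signal (mean shift of 2.0 for class 1); the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < informative else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return X, y


def centroid_accuracy(mask, X_tr, y_tr, X_va, y_va):
    """Stand-in wrapper classifier (nearest centroid) so the sketch needs no ML
    library; the paper wraps ensemble ML and DL models instead."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, lab in zip(X_va, y_va):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cen)) for c, cen in centroids.items()}
        correct += int(min(dist, key=dist.get) == lab)
    return correct / len(y_va)


def make_fitness(X_tr, y_tr, X_va, y_va, weight=0.99):
    """Assumed wrapper fitness: reward accuracy, lightly reward small subsets."""
    d = len(X_tr[0])

    def fitness(mask):
        if sum(mask) == 0:
            return 0.0  # an empty subset cannot classify anything
        acc = centroid_accuracy(mask, X_tr, y_tr, X_va, y_va)
        return weight * acc + (1 - weight) * (1 - sum(mask) / d)

    return fitness


def binary_bat_select(fitness, d, n_bats=10, n_iter=30, f_min=0.0, f_max=2.0,
                      alpha=0.9, gamma=0.9, v_max=6.0, seed=0):
    """Simplified binary BA: returns (best_mask, best_fitness), maximizing fitness."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    loud = [1.0] * n_bats                        # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]   # initial pulse rates r_i^0
    rate = r0[:]
    fit = [fitness(p) for p in pos]
    b = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            # Velocity attracts each bit toward the global best; clamped to [-v_max, v_max].
            vel[i] = [max(-v_max, min(v_max, v + (xb - x) * freq))
                      for v, x, xb in zip(vel[i], pos[i], best)]
            # Sigmoid transfer function: bit j is 1 with probability sigmoid(v_j).
            cand = [1 if rng.random() < sigmoid(v) else 0 for v in vel[i]]
            if rng.random() > rate[i]:
                # Local walk around the best mask: flip one random bit.
                cand = best[:]
                k = rng.randrange(d)
                cand[k] = 1 - cand[k]
            cand_fit = fitness(cand)
            if rng.random() < loud[i] and cand_fit >= fit[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha                              # A_i <- alpha * A_i
                rate[i] = r0[i] * (1 - math.exp(-gamma * t))  # r_i <- r_i^0 (1 - e^(-gamma t))
            if cand_fit > best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


X, y = make_toy_data(n=200, d=12, informative=3, seed=1)
fitness = make_fitness(X[:140], y[:140], X[140:], y[140:])
mask, score = binary_bat_select(fitness, d=12, seed=2)
print("selected features:", [j for j, bit in enumerate(mask) if bit])
print(f"fitness={score:.3f}, feature reduction={1 - sum(mask) / len(mask):.2%}")
```

The printed feature reduction, 1 - |S|/d, is the same kind of percentage the abstract reports per attack class. In a real pipeline the nearest-centroid stand-in would be replaced by the target model (for example LightGBM) trained on the selected columns, which makes each fitness call far more expensive; that cost is the usual trade-off of wrapper methods compared with filter methods.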

PMID:42092159 | DOI:10.1038/s41598-026-51981-2

By Nevin Manimala