Email Spam Filtering Using Artificial Intelligence Techniques
DOI:
https://doi.org/10.24237/djes.2026.19102
Keywords:
Phishing detection, Machine learning, Naive Bayes, Email classification, URL classification
Abstract
Email phishing and spam pose considerable cybersecurity risks and call for detection methods that are trustworthy, effective, and practical to deploy. This work proposes an artificial intelligence (AI) based methodology for detecting email spam and phishing. The system performs binary classification in two stages. In the first stage, it classifies the email content as malicious or non-malicious. In the second stage, it scans the URLs embedded in the message, any of which may be a phishing hook. This modular design reduces the complexity of the feature space and allows the email analysis and the URL analysis to be optimized separately. The system is trained on 18,650 email samples and 549,346 URL samples from publicly accessible datasets, split 70% for training and 30% for testing. Preprocessing consisted of removing duplicates and null values, normalizing text, balancing classes, stemming, and extracting features with TF-IDF for emails and CountVectorizer for URLs. Four lightweight machine learning (ML) algorithms were evaluated: Naive Bayes, Decision Tree, Random Forest, and K-Nearest Neighbors. Naive Bayes achieved the highest baseline accuracy, 96% for email classification and 97% for URL classification. Random Forest, on the other hand, was more resilient to adversarial attacks and generalized better. The selected model was integrated with Gmail for real-time inbox detection, where it reached an accuracy of 85% under real-world conditions. The results demonstrate that combining lightweight machine learning, a modular design, and clean preprocessing makes it possible to build effective, scalable detectors for both phishing and spam email.
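The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny training sets, the class and function names, the label strings, and the rule that a message is flagged when either stage raises an alarm are all assumptions made for illustration. A small hand-written multinomial Naive Bayes over raw token counts stands in for the TF-IDF and CountVectorizer features named above, and URLs are stripped from the body before the first stage so that each stage sees only its own features.

```python
import math
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (emails and URLs alike)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


class TinyNaiveBayes:
    """Multinomial Naive Bayes over raw token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.token_counts = {label: Counter() for label in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [tok for tok in tokenize(doc) if tok in self.vocab]
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)
            score += sum(math.log((counts[tok] + 1) / denom) for tok in tokens)
            scores[label] = score
        return max(scores, key=scores.get)


def classify_message(body, email_model, url_model):
    """Stage 1 labels the text; stage 2 labels each embedded URL."""
    urls = URL_PATTERN.findall(body)
    text_only = URL_PATTERN.sub(" ", body)
    email_label = email_model.predict(text_only)
    url_labels = {url: url_model.predict(url) for url in urls}
    # Assumption: flag the message if either stage reports a threat.
    flagged = email_label == "malicious" or "phishing" in url_labels.values()
    return {"email": email_label, "urls": url_labels, "flagged": flagged}


# Hypothetical toy data; the paper uses 18,650 emails and 549,346 URLs.
email_model = TinyNaiveBayes().fit(
    [
        "verify your account now to claim your prize",
        "urgent: your password expires, click here to win cash",
        "meeting notes attached for tomorrow's project review",
        "lunch on friday? let me know what time works",
    ],
    ["malicious", "malicious", "non-malicious", "non-malicious"],
)
url_model = TinyNaiveBayes().fit(
    [
        "http://secure-login.example-bank.verify-account.test/update",
        "http://free-prize.test/claim/login.php",
        "https://www.example.org/docs/index.html",
        "https://news.example.com/articles/today",
    ],
    ["phishing", "phishing", "legitimate", "legitimate"],
)

result = classify_message(
    "Notes for the project review: http://free-prize.test/claim/login.php",
    email_model,
    url_model,
)
print(result)
```

In this toy run the message text looks harmless but the embedded link does not, so only the second stage raises the alarm. That is the case the modular design is meant to catch: each stage can be retrained or swapped (for example, Naive Bayes for Random Forest) without touching the other.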
License
Copyright (c) 2026 B.Dhanalakshmi, Rajeshwari R R, Sanju A N, Chintureena Thingom, Vipul Devendra Punjabi, Vijayakumar B, Shailendra Madansing Pardeshi, P.Venkatesan

This work is licensed under a Creative Commons Attribution 4.0 International License.









