Generalized Explainable AI Framework for Phishing Detection on Heterogeneous Textual Data

Lea Mansour; Nour Hilal; Nadine Abbas; Seifedine Kadry

doi:10.59543/comdem.v3i.18329

Authors

Lea Mansour Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon https://orcid.org/0009-0002-6057-2396
Nour Hilal Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon https://orcid.org/0009-0003-8473-5126
Nadine Abbas Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon https://orcid.org/0000-0003-3028-326X
Seifedine Kadry Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon https://orcid.org/0000-0002-1939-4842

Keywords:

phishing detection; machine learning; deep learning; explainable AI; binary classification; heterogeneous text

Abstract

Phishing remains one of the most pervasive cybersecurity threats, exploiting both human
and technical vulnerabilities and targeting users through deceptive emails,
URLs, and SMS messages. Artificial Intelligence (AI) and Machine Learning (ML)
techniques have been widely used to improve phishing detection accuracy. However,
most existing studies focus on a single data type, which limits their applicability,
and a generalized framework that integrates heterogeneous data sources in the
phishing context is still lacking. In this study, we propose a
generalized phishing detection framework that leverages classical machine learning
(Random Forest and Logistic Regression) and deep learning (a Convolutional
Neural Network) to identify phishing attempts across heterogeneous textual data,
such as emails, URLs, and SMS messages. Moreover, we integrate interpretability
into model decisions using Explainable AI, specifically SHapley Additive exPlanations
(SHAP), to enhance transparency and trustworthiness. The framework is
evaluated on both predictive performance and inference efficiency. Experimental
results show that Random Forest achieves the highest accuracy (93%)
and F1-score (85%), highlighting the effectiveness of this classifier on tabular data for
the binary classification task at hand, while local and global SHAP explanations
reveal semantically relevant features influencing model decisions, with words
such as “admin” and “login” identified as strong phishing indicators. These results
demonstrate the promise of our unified, interpretable approach for advancing
adaptive and trustworthy generalized phishing detection systems.
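The abstract does not state how emails, URLs, and SMS messages are turned into the tabular input consumed by Random Forest and Logistic Regression. The sketch below is a minimal illustration under our own assumption of a shared binary bag-of-words vocabulary; it is not necessarily the authors' preprocessing. The tokenizer splits on any non-alphanumeric character, so URL components such as `admin` and `login` become ordinary tokens alongside words from emails and SMS. The function names (`tokenize`, `build_vocabulary`, `to_binary_rows`) and the sample texts are hypothetical.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical tokenizer (an assumption, not the paper's method): lowercase the
# text and keep runs of letters/digits, so a URL such as
# "http://example.com/admin/login.php" yields
# ["http", "example", "com", "admin", "login", "php"].
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(samples: Sequence[str]) -> Dict[str, int]:
    """Assign one column index to every token seen across all sources."""
    vocab: Dict[str, int] = {}
    for text in samples:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_binary_rows(samples: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Binary bag-of-words: one row per sample, one 0/1 column per vocabulary token."""
    rows = []
    for text in samples:
        row = [0] * len(vocab)
        for tok in tokenize(text):
            idx = vocab.get(tok)
            if idx is not None:  # tokens not seen when building the vocabulary are ignored
                row[idx] = 1
        rows.append(row)
    return rows


# Made-up inputs from three sources; the source type is deliberately dropped
# so that a single classifier sees all of them in the same feature space.
samples = [
    "Dear user, please verify your login details",  # email
    "http://example.com/admin/login.php",             # URL
    "Your parcel is waiting, reply YES",              # SMS
]
vocab = build_vocabulary(samples)
X = to_binary_rows(samples, vocab)
```

The rows in `X` can be passed to any tabular classifier. In scikit-learn, `CountVectorizer(binary=True)` produces a comparable representation, although its default token pattern differs slightly from the one assumed here.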

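SHAP attributes a model's output to its input features using Shapley values. For a handful of features, these values can be computed exactly by enumerating every coalition of the remaining features. The sketch below shows the local explanation, where one sample's log-odds are split across the features, and a global summary computed as the mean absolute Shapley value across samples. It rests on several assumptions: a hypothetical log-odds scorer over binary indicators `admin`, `login`, and `hello` with invented weights (not the authors' trained model), and a baseline in which an absent feature takes the value 0. The `shap` library (for example, `shap.TreeExplainer` for tree ensembles such as Random Forest) computes such attributions efficiently for realistic feature counts.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)

    def value(coalition: Sequence[str]) -> float:
        z = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(z)

    phi: Features = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0..n-1, drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(list(s) + [f]) - value(s))
        phi[f] = total
    return phi


def global_importance(model: Callable[[Features], float],
                      samples: Sequence[Features], baseline: Features) -> Features:
    """Global summary: mean absolute Shapley value of each feature across samples."""
    totals = {f: 0.0 for f in baseline}
    for x in samples:
        for f, v in shapley_values(model, x, baseline).items():
            totals[f] += abs(v)
    return {f: t / len(samples) for f, t in totals.items()}


def toy_logit(z: Features) -> float:
    # Hypothetical log-odds of "phishing"; weights are invented for illustration,
    # including an admin*login interaction term.
    return (-2.0 + 1.5 * z["admin"] + 1.2 * z["login"]
            + 0.8 * z["admin"] * z["login"] - 0.5 * z["hello"])


baseline = {"admin": 0.0, "login": 0.0, "hello": 0.0}
x = {"admin": 1.0, "login": 1.0, "hello": 0.0}
phi = shapley_values(toy_logit, x, baseline)
```

For this sample, the interaction weight 0.8 is shared equally between `admin` and `login`, so `phi` is about 1.9 and 1.6 for those two features and 0 for `hello`. The values sum to `toy_logit(x) - toy_logit(baseline)` (the efficiency property). Exact enumeration grows exponentially with the number of features, which is why practical SHAP explainers rely on model-specific algorithms or sampling.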