Generalized Explainable AI Framework for Phishing Detection on Heterogeneous Textual Data
DOI:
https://doi.org/10.59543/comdem.v3i.18329
Keywords:
phishing detection; machine learning; deep learning; explainable AI; binary classification; heterogeneous text
Abstract
Phishing remains one of the most pervasive cybersecurity threats, exploiting human
and technical vulnerabilities and targeting users through deceptive emails,
URLs, and SMS messages. Artificial Intelligence (AI) and Machine Learning (ML)
techniques have been widely used to improve phishing detection accuracy. However,
most existing studies focus on a single data type, which limits their
applicability, and the field lacks a generalized framework that integrates
heterogeneous data sources in the phishing context. In this study, we propose a
generalized phishing detection framework that leverages classical machine learning
(Random Forest and Logistic Regression) and deep learning (Convolutional
Neural Network) to identify phishing attempts across heterogeneous textual data,
such as emails, URLs, and SMS messages. Moreover, we integrate interpretability
into model decisions using Explainable AI, particularly SHapley Additive exPlanations
(SHAP), to enhance transparency and trustworthiness. The framework is
evaluated based on both predictive performance and inference efficiency. Experimental
results show that Random Forest achieves the highest accuracy (93%)
and F1-score (85%), highlighting the efficiency of the classifier on tabular data for
the binary classification task at hand, while SHAP local and global explanations
reveal semantically relevant features influencing model decisions, where words
such as “admin” and “login” are identified as strong phishing indicators. These results
demonstrate the promise of our unified, interpretable approach in advancing
adaptive and trustworthy generalized phishing detection systems.
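
To make the idea of a local explanation concrete, the following is a minimal sketch of exact Shapley attribution for a single message. It is not the authors' pipeline and does not use the SHAP library: the token weights, the bias, the logistic scoring function, and the example message are all hypothetical values chosen for illustration, and the baseline is assumed to be "token absent".

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical weights for a toy logistic model over token-presence features.
# These values are illustrative assumptions, not results from the paper.
WEIGHTS = {"admin": 1.8, "login": 1.5, "meeting": -1.2, "invoice": 0.4}
BIAS = -1.0


def predict(present):
    """Return the phishing probability given the set of tokens treated as present."""
    z = BIAS + sum(WEIGHTS[t] for t in present)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(tokens):
    """Compute the exact Shapley value of each known token in the message.

    Tokens outside the coalition are treated as absent (the assumed baseline).
    """
    tokens = sorted(set(tokens) & set(WEIGHTS))
    n = len(tokens)
    values = {}
    for t in tokens:
        others = [u for u in tokens if u != t]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude token t.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of t when added to coalition s.
                total += weight * (predict(s | {t}) - predict(s))
        values[t] = total
    return values


message = "please login to the admin portal before the meeting".split()
phi = shapley_values(message)
for token, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{token:>8}: {value:+.3f}")
```

In this toy setting the attributions sum to the difference between the model output for the full message and the output for the empty baseline, which is the additivity property that SHAP explanations rely on. Exact enumeration is exponential in the number of features, so practical tools approximate it or exploit model structure.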
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
COMDEM is published Open Access under a Creative Commons CC-BY 4.0 license. Authors retain full copyright, with the first publication right granted to the journal.








