Hallucination Detection in Large Language Models via Multi-Granular Uncertainty Quantification
DOI: https://doi.org/10.59543/comdem.v3i.17665

Keywords: hallucination detection, uncertainty quantification, large language models, temporal entropy dynamics, calibration, XGBoost

Abstract
Hallucination, in which large language models (LLMs) produce plausible but factually incorrect output, is a major challenge in high-stakes applications such as medicine, law, and education. Current detection methods involve a trade-off between accuracy and efficiency: multi-generation methods (e.g., semantic entropy) are effective but impose a 5-10x latency increase, while single-pass methods are faster but attain only 63-68% AUROC. To balance these trade-offs, we propose a framework that aggregates 12 uncertainty features across token-level, sequence-level, temporal, and distributional granularities from a single autoregressive generation. The framework operates in Full Mode (12 features, for open-source models with attention access) or API Mode (10 features, for any model exposing token log-probabilities). The most novel component is F9, temporal entropy dynamics, which measures how the entropy of generated segments changes across four quarters of the generation process. On Llama-3-8B, the framework attains 89.27% AUROC on HaluEval, surpassing semantic entropy by 2.15 percentage points while reducing latency by 8.2x. Across four open-source model families and five benchmarks, Full Mode consistently improves over semantic entropy by 1.71 to 2.47 pp. On GPT-3.5-Turbo, API Mode achieves 88.63% AUROC, falling below semantic entropy (90.81%). These results demonstrate that a suitably chosen combination of single-pass uncertainty features can approach the discrimination offered by more computationally intensive multi-generation methods.
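The F9 feature described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the entropy estimator, the exact quartering scheme, and the choice of summary statistics (per-quarter means and a first-to-last slope) are assumptions, since the abstract specifies only that entropy is tracked across four quarters of the generation.

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of one token's next-token distribution,
    estimated from its top-k log-probabilities (hypothetical helper)."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

def temporal_entropy_dynamics(entropies):
    """Illustrative F9 sketch: split per-token entropies into four
    quarters of the generation, then summarize as the mean entropy of
    each quarter plus the change from the first to the last quarter."""
    n = len(entropies)
    assert n >= 4, "need at least one token per quarter"
    q = n // 4
    quarters = [entropies[i * q:(i + 1) * q] for i in range(3)]
    quarters.append(entropies[3 * q:])  # last quarter absorbs remainder
    means = [sum(chunk) / len(chunk) for chunk in quarters]
    slope = means[-1] - means[0]  # rising entropy late in generation
    return means, slope

# Entropy climbing over the generation yields a positive slope,
# the kind of temporal signal a downstream classifier could exploit.
means, slope = temporal_entropy_dynamics([1.0, 1.0, 2.0, 2.0])
```

In API Mode, the per-token `logprobs` inputs could come from any provider that returns token log-probabilities; the four summary values would then feed the framework's classifier alongside the other uncertainty features.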
License
Copyright (c) 2026 Abdullah Önden

This work is licensed under a Creative Commons Attribution 4.0 International License.
COMDEM is published Open Access under a Creative Commons CC-BY 4.0 license. Authors retain full copyright, with the first publication right granted to the journal.