Hallucination Detection in Large Language Models via Multi-Granular Uncertainty Quantification
DOI: https://doi.org/10.59543/comdem.v3i.17665

Keywords: hallucination detection, uncertainty quantification, large language models, temporal entropy dynamics, calibration, XGBoost

Abstract
Hallucination, in which large language models (LLMs) produce plausible but factually incorrect output, is a major challenge in high-stakes applications such as medicine, law, and education. Current detection methods involve a trade-off between accuracy and efficiency: multi-generation methods (e.g., semantic entropy) are effective but impose a 5-10x latency increase, while single-pass methods are faster but attain only 63-68% AUROC. To balance these trade-offs, we propose a framework that aggregates 12 uncertainty features across token-level, sequence-level, temporal, and distributional granularities from a single autoregressive generation. The framework operates in Full Mode (12 features, for open-source models with attention access) or API Mode (10 features, for any model exposing token log-probabilities). The most novel component is F9, temporal entropy dynamics, which measures how the entropy of generated segments changes across four quarters of the generation process. On Llama-3-8B, the framework attains 89.27% AUROC on HaluEval, surpassing semantic entropy by 2.15 percentage points while reducing latency by 8.2x. Across four open-source model families and five benchmarks, Full Mode consistently improves over semantic entropy by 1.71 to 2.47 pp. On GPT-3.5-Turbo, API Mode achieves 88.63% AUROC, falling below semantic entropy (90.81%). These results demonstrate that a suitably chosen combination of single-pass uncertainty features can approach the discrimination offered by more computationally intensive multi-generation methods.
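The F9 feature described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the entropy estimator, the exact quartering scheme, and the choice of summary statistics (per-quarter means and a first-to-last slope) are assumptions, since the abstract specifies only that entropy is tracked across four quarters of the generation.

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of one token's next-token distribution,
    estimated from its top-k log-probabilities (hypothetical helper)."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

def temporal_entropy_dynamics(entropies):
    """Illustrative F9 sketch: split per-token entropies into four
    quarters of the generation, then summarize as the mean entropy of
    each quarter plus the change from the first to the last quarter."""
    n = len(entropies)
    assert n >= 4, "need at least one token per quarter"
    q = n // 4
    quarters = [entropies[i * q:(i + 1) * q] for i in range(3)]
    quarters.append(entropies[3 * q:])  # last quarter absorbs remainder
    means = [sum(chunk) / len(chunk) for chunk in quarters]
    slope = means[-1] - means[0]  # rising entropy late in generation
    return means, slope

# Entropy climbing over the generation yields a positive slope,
# the kind of temporal signal a downstream classifier could exploit.
means, slope = temporal_entropy_dynamics([1.0, 1.0, 2.0, 2.0])
```

In API Mode, the per-token `logprobs` inputs could come from any provider that returns token log-probabilities; the four summary values would then feed the framework's classifier alongside the other uncertainty features.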
License
Copyright (c) 2026 Abdullah Önden

This work is licensed under a Creative Commons Attribution 4.0 International License.
COMDEM is published Open Access under a Creative Commons CC-BY 4.0 license. Authors retain full copyright, with the first publication right granted to the journal.