I’ve been exploring why LLMs "break" during inference. Most current hallucination detection methods look at the final text (semantic analysis) or use another LLM to double-check (self-consistency). These are effective but extremely slow and expensive.
SIB-ENGINE is my attempt to solve this at the geometric layer. By monitoring the "Anchor Drift" (how hidden states deviate from the prompt’s latent trajectory), I found that hallucinations often manifest as a structural instability before the token is even sampled.
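To make the idea concrete, here is a minimal sketch of what an anchor-drift signal could look like. SIB-ENGINE's actual metric isn't shown in this post, so the function name, the mean-pooled prompt anchor, and the cosine-distance choice are all my assumptions, not the real implementation:

```python
import numpy as np

def anchor_drift(prompt_hidden: np.ndarray, step_hidden: np.ndarray) -> float:
    """Cosine distance between the current decode-step hidden state and a
    prompt 'anchor' (here: the mean of the prompt-token hidden states).
    Illustrative only -- the real SIB-ENGINE metric may differ."""
    anchor = prompt_hidden.mean(axis=0)
    cos = float(step_hidden @ anchor /
                (np.linalg.norm(step_hidden) * np.linalg.norm(anchor)))
    return 1.0 - cos  # ~0: on-trajectory; grows toward 2 as the state drifts

# Toy check: a state aligned with the anchor shows near-zero drift,
# an orthogonal one shows drift ~1.
prompt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(round(anchor_drift(prompt, np.array([1.0, 0.0])), 3))  # 0.0
print(round(anchor_drift(prompt, np.array([0.0, 1.0])), 3))  # 1.0
```

In practice you would compute this per decode step against a threshold, which is what makes the overhead so small: it's one dot product per token, no second model.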
The Numbers:
Recall: 53.89% (It catches about half, but it's consistent)
Precision: 88.52% (Low false-alarm rate is my priority)
Overhead: <1% (Running on an RTX 3050 with 4GB VRAM)
AUC: 0.8995
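If you want to re-derive precision and recall yourself from raw_logs.csv, they reduce to simple counts over (flagged, hallucinated) pairs. I'm not assuming anything about the CSV's column names here, just showing the arithmetic:

```python
def precision_recall(flagged, hallucinated):
    """Precision/recall from parallel boolean sequences:
    flagged[i] = detector fired on sample i,
    hallucinated[i] = ground-truth label for sample i."""
    tp = sum(f and h for f, h in zip(flagged, hallucinated))
    fp = sum(f and not h for f, h in zip(flagged, hallucinated))
    fn = sum(h and not f for f, h in zip(flagged, hallucinated))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Tiny example: 3 true hallucinations, detector fires on 2 of them
# plus one false alarm -> precision 2/3, recall 2/3.
p, r = precision_recall([1, 1, 0, 1], [1, 1, 1, 0])
print(p, r)
```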
I've released a Lite version (1-axis) on GitHub so you can see the fundamental logic and run it on your own machine. I’ve also included the raw_logs.csv from my N=1000 test run on Gemma-2B for full transparency.
I’m particularly curious if anyone here has experimented with similar geometric approaches or has thoughts on how this might scale to 70B+ models where the latent space is significantly denser. Happy to dive into the technical details!
The geometric approach is interesting precisely because it's model-agnostic at the content level — you're detecting structural collapse in latent space before it surfaces as text, which means you don't need to know what a hallucination looks like semantically.
The 54% recall is the honest number to focus on. At 88% precision you're catching real problems when you flag them, but you're missing roughly half of all hallucinations entirely. For a suppression layer in a regulated context that's a meaningful gap — a compliance team can't tell a regulator "we caught most of them."
The complementary approach worth considering: deterministic post-generation checks on the output layer. Geometric drift catches structural collapse during generation. Rule-based output validation catches semantic violations after generation — banned claims, unattributed statistics, absolute guarantees. Neither approach alone is sufficient. Together they cover different failure modes.
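A rule-based output check of the kind described above can be a handful of regexes. The rule names and patterns below are purely illustrative examples of the three categories mentioned, not a standard or exhaustive rule set:

```python
import re

# Illustrative output-layer rules -- example patterns only.
RULES = {
    # absolute guarantees
    "absolute_guarantee": re.compile(
        r"\b(guaranteed|always works|never fails)\b", re.I),
    # a percentage with no attribution in the same sentence
    "unattributed_stat": re.compile(
        r"\b\d{1,3}(?:\.\d+)?%(?![^.]*\b(?:source|according to)\b)", re.I),
}

def validate_output(text: str) -> list[str]:
    """Return the names of all rules the generated text violates."""
    return [name for name, pat in RULES.items() if pat.search(text)]

print(validate_output("This cure is guaranteed to help 97% of patients."))
# -> ['absolute_guarantee', 'unattributed_stat']
```

The appeal is that these checks are deterministic and auditable: a compliance team can read the rule list, which is exactly what the geometric layer can't offer on its own.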
Good work publishing the raw_logs.csv. Reproducibility at this layer is rare and matters.
Thanks for the precise critique.
You're right: a 54% recall is the "danger zone." In a regulated or production environment, missing half of the structural collapses is functionally equivalent to zero protection. The 88% precision proves the signal exists, but the threshold for "collapse" in latent space is currently too rigid.
The "Geometric approach (SIB) + Rule-based output validation" hybrid you suggested is the most logical path forward.
• Geometric Drift (Layer-Internal): Catches the "process" of losing logical coherence (structural entropy).
• Rule-based (Output-Layer): Catches the "result" of semantic violations (pre-defined constraints).
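Wiring the two bullets together, the hybrid can start as a simple OR over the two signals. The threshold and the keyword rule below are arbitrary placeholders for illustration, not tuned values:

```python
import re

# Placeholder output-layer rule (see the thread above for richer examples).
ABSOLUTE = re.compile(r"\b(guaranteed|always|never)\b", re.I)

def hybrid_gate(drift_score: float, text: str,
                drift_threshold: float = 0.35) -> bool:
    """Flag a generation if EITHER signal fires:
    geometric = in-flight drift crossed the threshold (process),
    semantic  = the finished text violates an output rule (result)."""
    geometric_flag = drift_score > drift_threshold
    semantic_flag = bool(ABSOLUTE.search(text))
    return geometric_flag or semantic_flag

print(hybrid_gate(0.1, "This treatment always works."))   # True (rule fired)
print(hybrid_gate(0.6, "Plausible but drifting text."))   # True (drift fired)
print(hybrid_gate(0.1, "Evidence suggests a benefit."))   # False
```

An OR-gate only raises recall, never precision, so the rule layer has to carry its own low false-alarm rate for the 88% precision figure to survive the combination.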
My next focus is analyzing the "Silent Failures" — the 46% we missed. If the latent space doesn't show geometric collapse but the output is still a hallucination, it suggests the model is confidently drifting into a "parallel" but structurally stable manifold. That's a different failure mode that geometry alone can't catch.
Reproducibility is the only way to move this out of "voodoo AI" territory. Glad the raw_logs.csv helped.