In a typical LLM application, an LLM's output can draw on the context provided in the prompt, on its pre-training data, or on data it was fine-tuned with. LLMs can produce incorrect results from any of these sources, a phenomenon popularly known as "LLM hallucinations".
We built a model that can identify hallucinations in both a closed-book setting (internal data the LLM "learnt" during pre-training or fine-tuning) and an open-book setting (data explicitly provided to the LLM as part of its context). The model provides phrase-level attribution. We open-sourced the model (available on HuggingFace) along with a new benchmark dataset that can be used to check how well any judge performs on the hallucination detection task.
Try the Google Colab notebook linked in the model card.
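For a quick sense of how a model like this might be called, here is a minimal sketch using the standard transformers token-classification pipeline. The model ID, the label names, and the way context and response are concatenated are all assumptions for illustration; the model card and Colab notebook describe the actual interface.

```python
# Minimal sketch (assumptions noted below) of phrase-level hallucination
# detection via a HuggingFace token-classification pipeline.
from transformers import pipeline

detector = pipeline(
    "token-classification",
    model="AimonLabs/hallucination-detection-model",  # placeholder ID (assumption) — see the model card
    aggregation_strategy="simple",  # merge sub-word tokens into phrase-level spans
)

context = "The Eiffel Tower was completed in 1889 and is located in Paris."
response = "The Eiffel Tower, completed in 1925, stands in Paris."

# Open-book setting: pass both the provided context and the LLM's response,
# so flagged phrases can be checked against the context. The exact input
# format here is an assumption.
spans = detector(f"Context: {context}\nResponse: {response}")

for span in spans:
    print(span["entity_group"], repr(span["word"]), round(span["score"], 3))
```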
This is a much-needed direction, especially as LLMs are increasingly used in high-stakes settings. Phrase-level attribution for hallucination detection has real potential to improve transparency and trust in model outputs. The fact that both the model and benchmark are open source makes it even more valuable—enabling reproducibility and inviting broader collaboration.
Absolutely! Check out the benchmark here: https://huggingface.co/datasets/AimonLabs/HDM-Bench
You can use it to check how well your LLM judges perform on the hallucination detection task.
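For example, loading the benchmark with the datasets library looks roughly like the sketch below. The repo ID comes from the link above; the split name and label column are assumptions, so check the dataset card for the actual schema before evaluating a judge against it.

```python
# Minimal sketch: load HDM-Bench and inspect its splits/columns.
from datasets import load_dataset

bench = load_dataset("AimonLabs/HDM-Bench")
print(bench)  # shows the available splits and their column names

# Hypothetical evaluation loop: `my_judge` is your own function that returns
# a hallucination verdict for one example. Split and column names below are
# assumptions — adjust them to the dataset card.
# test = bench["test"]
# accuracy = sum(my_judge(ex) == ex["label"] for ex in test) / len(test)
# print(f"Judge accuracy on HDM-Bench: {accuracy:.3f}")
```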