Lynx 2.0
Lynx v2.0 is an 8B-parameter, state-of-the-art hallucination detection model for RAG systems 🚀
Lynx 2.0 was trained on long-context data from real-world domains such as finance and medicine.
- Lynx (8B) outperforms Claude-3.5-Sonnet as a judge on HaluBench by 2.2 percentage points in average accuracy
- Lynx (8B) shows 3.4 percentage points higher average accuracy than Lynx v1.1 on HaluBench
- First hallucination guardrail trained on long-context financial data
- Detects 8 types of common hallucinations, including Coreference Errors, Calculation Errors, Chain of Thought (CoT) hallucinations, and more
Read more about Lynx and the hallucination detection problem in the Research section.
Hallucination Taxonomy
Lynx 2.0 detects the following 8 types of hallucinations.
Hallucination Type | Definition |
---|---|
Predicate Error | The predicate in the model output is inconsistent with the retrieved context. |
Entity Error | The subject/object of a model output is inconsistent with the retrieved context. |
Circumstance Error | The time, duration, or location of an event in the model output is wrong. |
Coreference Error | A pronoun or reference in the model output has a wrong or nonexistent antecedent. |
Calculation Error | The calculation used to arrive at a numerical answer is incorrect. |
Chain of Thought Hallucination | The chain of thought reasoning in a model output is unfaithful to the retrieved context. |
Partially Grounded Answer | Part of the answer is grounded in the retrieved context, but the rest is not supported by it. |
Unanswerable Question | The question is not answerable using the retrieved context. |
Benchmark Performance
We extend HaluBench with three additional datasets that capture the hallucination types described above. QuALITY is a long-context dataset that measures the model's long-context performance, while BUMP and SQuAD capture additional types of hallucinations.
Model | BUMP | CovidQA | DROP | PubmedQA | QuALITY | RAGTruth | FinanceBench | SQuAD | Average accuracy |
---|---|---|---|---|---|---|---|---|---|
meta-llama/Llama-3.2-3B-Instruct | 32.40% | 44.70% | 47.40% | 64.60% | 36.60% | 46.22% | 47.90% | 60.20% | 47.50% |
meta-llama/Llama-3.1-8B-Instruct | 64.20% | 83.00% | 65.30% | 80.50% | 54.60% | 76.67% | 59.70% | 86.00% | 71.26% |
GPT-4o mini | 73.00% | 87.20% | 80.30% | 84.20% | 59.60% | 81.88% | 81.60% | 81.80% | 78.71% |
Claude-3.5-Sonnet | 77.20% | 88.17% | 81.82% | 73.26% | 62.33% | 82.77% | 82.40% | 95.00% | 80.37% |
Lynx v1.1 (8B) | 75.00% | 96.90% | 77.80% | 88.90% | 61.00% | 80.11% | 76.70% | 76.80% | 79.15% |
Lynx v2.0 (8B) | 77.50% | 96.00% | 76.90% | 85.30% | 68.40% | 85.67% | 72.10% | 98.60% | 82.56% |
Lynx v1.0 (70B) | 71.00% | 97.50% | 86.40% | 90.40% | 63.20% | 80.22% | 81.40% | 87.60% | 82.22% |
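The headline gaps quoted at the top of this page can be checked against the table. The sketch below copies three rows from the table and assumes that "Average accuracy" is the unweighted mean of the eight dataset columns; that assumption is ours and is not stated on this page.

```python
# Per-dataset accuracies (%) copied from the table above, in column order:
# BUMP, CovidQA, DROP, PubmedQA, QuALITY, RAGTruth, FinanceBench, SQuAD
scores = {
    "Claude-3.5-Sonnet": [77.20, 88.17, 81.82, 73.26, 62.33, 82.77, 82.40, 95.00],
    "Lynx v1.1 (8B)": [75.00, 96.90, 77.80, 88.90, 61.00, 80.11, 76.70, 76.80],
    "Lynx v2.0 (8B)": [77.50, 96.00, 76.90, 85.30, 68.40, 85.67, 72.10, 98.60],
}


def average(values):
    """Unweighted mean across datasets (assumed to match the table's last column)."""
    return sum(values) / len(values)


averages = {model: average(values) for model, values in scores.items()}

# Differences are in percentage points of average accuracy.
gap_vs_claude = averages["Lynx v2.0 (8B)"] - averages["Claude-3.5-Sonnet"]
gap_vs_v11 = averages["Lynx v2.0 (8B)"] - averages["Lynx v1.1 (8B)"]

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}%")
print(f"Lynx v2.0 vs Claude-3.5-Sonnet: +{gap_vs_claude:.1f} points")
print(f"Lynx v2.0 vs Lynx v1.1: +{gap_vs_v11:.1f} points")
```

Under that assumption the computed means reproduce the table's last column for these three rows, and the two differences round to the 2.2 and 3.4 figures quoted in the summary.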
How to use Lynx 2.0
Python SDK
Install the Patronus SDK:
pip install patronus
Query Lynx via the SDK:
from patronus import Client
client = Client(api_key="<PROVIDE YOUR API KEY>")
result = client.evaluate(
    evaluator="lynx-small",
    criteria="patronus:hallucination",
    evaluated_model_input="What is the car insurance policy?",
    evaluated_model_output="To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028.",
    evaluated_model_retrieved_context="To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028.",
)
print(result)
cURL Request
# Apostrophes inside the single-quoted JSON body are written as '\'' so the shell does not end the string early.
curl --request POST \
  --url "https://api.patronus.ai/v1/evaluate" \
  --header "X-API-KEY: <PROVIDE YOUR API KEY>" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data '
{
  "evaluators": [
    {
      "evaluator": "lynx-small",
      "criteria": "patronus:hallucination"
    }
  ],
  "evaluated_model_input": "What is the car insurance policy?",
  "evaluated_model_output": "To even qualify for our car insurance policy, you need to have a valid driver'\''s license that expires later than 2028.",
  "evaluated_model_retrieved_context": "To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver'\''s license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028."
}'
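The same request can also be assembled without the SDK. The sketch below builds it with Python's standard library, reusing the endpoint, headers, and payload fields from the cURL example. The helper name `build_evaluate_request` is hypothetical, and the send step is left commented out because it needs a valid API key and network access; the response format is not described on this page, so it is simply printed.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.patronus.ai/v1/evaluate"


def build_evaluate_request(api_key, model_input, model_output, retrieved_context):
    """Hypothetical helper: mirror the cURL request as a urllib Request object."""
    payload = {
        "evaluators": [
            {"evaluator": "lynx-small", "criteria": "patronus:hallucination"}
        ],
        "evaluated_model_input": model_input,
        "evaluated_model_output": model_output,
        "evaluated_model_retrieved_context": retrieved_context,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON-encode the request body
        headers={
            "X-API-KEY": api_key,
            "accept": "application/json",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_evaluate_request(
    api_key="<PROVIDE YOUR API KEY>",
    model_input="What is the car insurance policy?",
    model_output=(
        "To even qualify for our car insurance policy, you need to have a "
        "valid driver's license that expires later than 2028."
    ),
    retrieved_context=(
        "To qualify for our car insurance policy, you need a way to show "
        "competence in driving which can be accomplished through a valid "
        "driver's license. You must have multiple years of experience and "
        "cannot be graduating from driving school before or on 2028."
    ),
)

# Uncomment to send the request (requires a valid API key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Serializing the payload with `json.dumps` also avoids shell quoting problems, such as the apostrophe in "driver's" inside a single-quoted `--data` argument.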