Lynx 2.0 Guide
Lynx 2.0 is an 8B state-of-the-art RAG hallucination detection model 🚀
Lynx 2.0 was trained on long-context data from real-world domains such as finance and medicine.
- Lynx 2.0 (8B) outperforms Claude-3.5-Sonnet as a judge on HaluBench by 2.2 percentage points in average accuracy
- Lynx 2.0 (8B) is 3.4 percentage points more accurate than Lynx v1.1 on HaluBench
- First hallucination guardrail trained on long-context financial data
- Detects 8 types of common hallucinations, including Coreference Errors, Calculation Errors, CoT hallucinations, and more
Hallucination Taxonomy
Lynx 2.0 detects 8 types of hallucinations.
Hallucination Type | Definition |
---|---|
Predicate Error | The predicate in the model output is inconsistent with the retrieved context. |
Entity Error | The subject/object of a model output is inconsistent with the retrieved context. |
Circumstance Error | The time, duration, or location of an event in the model output is inconsistent with the retrieved context. |
Coreference Error | A pronoun or reference in the model output has a wrong or nonexistent antecedent. |
Calculation Error | The calculation used to arrive at a numerical answer is incorrect. |
Chain of Thought Hallucinations | The chain of thought reasoning in a model output is unfaithful to the retrieved context. |
Partially grounded answers | Part of the answer is grounded in the retrieved context but the other part of the answer is not supported by the retrieved context. |
Unanswerable Questions | The question is not answerable using the retrieved context. |
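When building your own test sets, it can help to tag each example with one of the categories above. The following is a minimal sketch of that idea. The `HallucinationType` enum, the `LabeledExample` record, and the sample data are hypothetical illustrations, not part of the Patronus SDK, and nothing here implies that the Lynx API returns these labels.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HallucinationType(Enum):
    """The eight categories from the taxonomy table (names are illustrative)."""
    PREDICATE_ERROR = "Predicate Error"
    ENTITY_ERROR = "Entity Error"
    CIRCUMSTANCE_ERROR = "Circumstance Error"
    COREFERENCE_ERROR = "Coreference Error"
    CALCULATION_ERROR = "Calculation Error"
    CHAIN_OF_THOUGHT = "Chain of Thought Hallucination"
    PARTIALLY_GROUNDED = "Partially Grounded Answer"
    UNANSWERABLE = "Unanswerable Question"


@dataclass
class LabeledExample:
    """Hypothetical test-set record; hallucination_type is None for faithful answers."""
    question: str
    answer: str
    context: str
    hallucination_type: Optional[HallucinationType] = None


# Made-up examples for illustration only.
examples = [
    LabeledExample(
        question="How much did revenue grow?",
        answer="Revenue grew 25%.",  # 10 -> 12 is 20%, so the arithmetic is wrong
        context="Revenue was $10M in 2022 and $12M in 2023.",
        hallucination_type=HallucinationType.CALCULATION_ERROR,
    ),
    LabeledExample(
        question="What was revenue in 2023?",
        answer="Revenue was $12M in 2023.",
        context="Revenue was $10M in 2022 and $12M in 2023.",
    ),
]

# Count how many examples fall under each category (faithful ones are skipped).
counts = Counter(
    ex.hallucination_type.value for ex in examples if ex.hallucination_type
)
print(counts)
```

Tracking counts per category makes it easier to see which hallucination types a test set under-represents.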
Benchmark Performance
We extend HaluBench with three additional datasets that capture the hallucination types listed above. QuALITY, a long-context dataset, measures the model's long-context performance. BUMP and SQuAD capture additional types of hallucinations.
Model | BUMP | CovidQA | DROP | PubmedQA | QuALITY | RAGTruth | FinanceBench | SQuAD | Average accuracy |
---|---|---|---|---|---|---|---|---|---|
meta-llama/Llama-3.2-3B-Instruct | 32.40% | 44.70% | 47.40% | 64.60% | 36.60% | 46.22% | 47.90% | 60.20% | 47.50% |
meta-llama/Llama-3.1-8B-Instruct | 64.20% | 83.00% | 65.30% | 80.50% | 54.60% | 76.67% | 59.70% | 86.00% | 71.26% |
GPT-4o mini | 73.00% | 87.20% | 80.30% | 84.20% | 59.60% | 81.88% | 81.60% | 81.80% | 78.71% |
Claude-3.5-Sonnet | 77.20% | 88.17% | 81.82% | 73.26% | 62.33% | 82.77% | 82.40% | 95.00% | 80.37% |
Lynx v1.1 (8B) | 75.00% | 96.90% | 77.80% | 88.90% | 61.00% | 80.11% | 76.70% | 76.80% | 79.15% |
Lynx v2.0 (8B) | 77.50% | 96.00% | 76.90% | 85.30% | 68.40% | 85.67% | 72.10% | 98.60% | 82.56% |
Lynx v1.0 (70B) | 71.00% | 97.50% | 86.40% | 90.40% | 63.20% | 80.22% | 81.40% | 87.60% | 82.22% |
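The average-accuracy column and the headline gaps can be reproduced from the per-dataset numbers. The sketch below assumes the average is an unweighted mean over the eight datasets, which is consistent with the values in the table. The scores are copied from the rows above.

```python
# Per-dataset accuracies (%) copied from the table, in column order:
# BUMP, CovidQA, DROP, PubmedQA, QuALITY, RAGTruth, FinanceBench, SQuAD
scores = {
    "Claude-3.5-Sonnet": [77.20, 88.17, 81.82, 73.26, 62.33, 82.77, 82.40, 95.00],
    "Lynx v1.1 (8B)": [75.00, 96.90, 77.80, 88.90, 61.00, 80.11, 76.70, 76.80],
    "Lynx v2.0 (8B)": [77.50, 96.00, 76.90, 85.30, 68.40, 85.67, 72.10, 98.60],
}


def average(values):
    """Unweighted mean across datasets (assumed aggregation)."""
    return sum(values) / len(values)


averages = {model: average(vals) for model, vals in scores.items()}

# Gaps in percentage points between Lynx v2.0 and the two baselines.
gap_vs_claude = averages["Lynx v2.0 (8B)"] - averages["Claude-3.5-Sonnet"]
gap_vs_v11 = averages["Lynx v2.0 (8B)"] - averages["Lynx v1.1 (8B)"]

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}%")
print(f"Lynx v2.0 vs Claude-3.5-Sonnet: +{gap_vs_claude:.1f} points")
print(f"Lynx v2.0 vs Lynx v1.1: +{gap_vs_v11:.1f} points")
```

The two gaps come out to roughly 2.2 and 3.4 percentage points, matching the figures quoted at the top of this guide.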
How to use Lynx 2.0
Python SDK
Install the Patronus SDK:
pip install patronus
Query Lynx via the SDK:
from patronus import Client

client = Client(api_key="<PROVIDE YOUR API KEY>")

result = client.evaluate(
    evaluator="lynx-small",
    criteria="patronus:hallucination",
    evaluated_model_input="What is the car insurance policy?",
    evaluated_model_output="To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028.",
    evaluated_model_retrieved_context="To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028.",
)
print(result)
cURL Request
curl --request POST \
  --url "https://api.patronus.ai/v1/evaluate" \
  --header "X-API-KEY: <PROVIDE YOUR API KEY>" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data @- <<'EOF'
{
  "evaluators": [
    {
      "evaluator": "lynx-small",
      "criteria": "patronus:hallucination"
    }
  ],
  "evaluated_model_input": "What is the car insurance policy?",
  "evaluated_model_output": "To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028.",
  "evaluated_model_retrieved_context": "To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028."
}
EOF
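The same HTTP request can be built in Python using only the standard library, which is handy when the SDK is not installed. This is a minimal sketch. The endpoint, header names, and JSON fields are taken from the cURL example, while the helper names `build_evaluate_request` and `send` are hypothetical and the response shape is not assumed. Serializing the body with `json.dumps` also sidesteps shell-quoting problems with characters such as the apostrophe in "driver's".

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
EVALUATE_URL = "https://api.patronus.ai/v1/evaluate"


def build_evaluate_request(api_key, model_input, model_output, retrieved_context):
    """Build (but do not send) a POST request mirroring the cURL example."""
    payload = {
        "evaluators": [
            {"evaluator": "lynx-small", "criteria": "patronus:hallucination"}
        ],
        "evaluated_model_input": model_input,
        "evaluated_model_output": model_output,
        "evaluated_model_retrieved_context": retrieved_context,
    }
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-KEY": api_key,
            "accept": "application/json",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON body as-is."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To run an evaluation, pass the result of `build_evaluate_request(...)` to `send(...)` with a valid API key. Keeping construction and sending separate lets you inspect or log the payload before any network call is made.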