Lynx 2.0 Guide

Lynx 2.0 is an 8B state-of-the-art RAG hallucination detection model 🚀

Lynx 2.0 was trained on long-context data from real-world domains such as finance and medicine.

  • Lynx (8B) outperforms Claude-3.5-Sonnet as a judge on HaluBench by 2.2%
  • Lynx (8B) shows 3.4% higher accuracy than Lynx v1.1 on HaluBench
  • First hallucination guardrail trained on long context financial data
  • Detects 8 common types of hallucinations, including Coreference Errors, Calculation Errors, and Chain of Thought (CoT) hallucinations

Hallucination Taxonomy

Lynx 2.0 detects the following 8 types of hallucinations; an illustrative example follows the table.

| Hallucination Type | Definition |
| --- | --- |
| Predicate Error | The predicate in the model output is inconsistent with the retrieved context. |
| Entity Error | The subject or object in the model output is inconsistent with the retrieved context. |
| Circumstance Error | The time, duration, or location of an event in the model output is inconsistent with the retrieved context. |
| Coreference Error | A pronoun or reference in the model output has a wrong or nonexistent antecedent. |
| Calculation Error | The calculation used to arrive at a numerical answer is incorrect. |
| Chain of Thought Hallucination | The chain-of-thought reasoning in the model output is unfaithful to the retrieved context. |
| Partially Grounded Answer | Part of the answer is grounded in the retrieved context, but the rest is not supported by it. |
| Unanswerable Question | The question is not answerable from the retrieved context. |
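
For example, a Calculation Error occurs when the answer performs arithmetic that the retrieved context does not support. The question, context, and answer below are invented for illustration and use the same field names as the SDK call later in this guide:

# Hypothetical Calculation Error: the context supports $12,000 (3 x $4,000),
# but the answer reports $16,000.
calculation_error_example = {
    "evaluated_model_input": "How much did the customer pay in total?",
    "evaluated_model_retrieved_context": "The customer made three payments of $4,000 each over the policy term.",
    "evaluated_model_output": "The customer paid a total of $16,000.",
}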

Benchmark Performance

We extend HaluBench with three additional datasets that capture the hallucination types described above: QuALITY, a long-context dataset, measures the model's long-context performance, while BUMP and SQuAD cover additional types of hallucinations.

| Model | BUMP | CovidQA | DROP | PubmedQA | QuALITY | RAGTruth | FinanceBench | SQuAD | Average accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| meta-llama/Llama-3.2-3B-Instruct | 32.40% | 44.70% | 47.40% | 64.60% | 36.60% | 46.22% | 47.90% | 60.20% | 47.50% |
| meta-llama/Llama-3.1-8B-Instruct | 64.20% | 83.00% | 65.30% | 80.50% | 54.60% | 76.67% | 59.70% | 86.00% | 71.26% |
| GPT-4o mini | 73.00% | 87.20% | 80.30% | 84.20% | 59.60% | 81.88% | 81.60% | 81.80% | 78.71% |
| Claude-3.5-Sonnet | 77.20% | 88.17% | 81.82% | 73.26% | 62.33% | 82.77% | 82.40% | 95.00% | 80.37% |
| Lynx v1.1 (8B) | 75.00% | 96.90% | 77.80% | 88.90% | 61.00% | 80.11% | 76.70% | 76.80% | 79.15% |
| Lynx v2.0 (8B) | 77.50% | 96.00% | 76.90% | 85.30% | 68.40% | 85.67% | 72.10% | 98.60% | 82.56% |
| Lynx v1.0 (70B) | 71.00% | 97.50% | 86.40% | 90.40% | 63.20% | 80.22% | 81.40% | 87.60% | 82.22% |

How to use Lynx 2.0

Python SDK

Install the Patronus SDK:

pip install patronus

Query Lynx via the SDK:

from patronus import Client

client = Client(api_key="<PROVIDE YOUR API KEY>")

# Evaluate whether the model output is grounded in the retrieved context.
result = client.evaluate(
    evaluator="lynx-small",
    criteria="patronus:hallucination",
    evaluated_model_input="What is the car insurance policy?",
    evaluated_model_output="To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028.",
    evaluated_model_retrieved_context="To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028.",
)

print(result)
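
If you want to screen several RAG responses in one pass, the same call can be wrapped in a small helper. The sketch below is illustrative: the helper name and record format are not part of the SDK, and only the client.evaluate call shown above is assumed.

def check_for_hallucinations(client, records):
    # Each record is a dict with "input", "output", and "context" keys.
    results = []
    for record in records:
        results.append(
            client.evaluate(
                evaluator="lynx-small",
                criteria="patronus:hallucination",
                evaluated_model_input=record["input"],
                evaluated_model_output=record["output"],
                evaluated_model_retrieved_context=record["context"],
            )
        )
    return results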

cURL Request

curl --request POST \
  --url "https://api.patronus.ai/v1/evaluate" \
  --header "X-API-KEY: <PROVIDE YOUR API KEY>" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data '
    {
      "evaluators": [
        {
          "evaluator": "lynx-small",
          "criteria": "patronus:hallucination"
        }
      ],
      "evaluated_model_input": "What is the car insurance policy?",
      "evaluated_model_output": "To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028.",
      "evaluated_model_retrieved_context": "To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028."
    }'
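
If you prefer not to use the SDK, the same request can be sent from Python with the requests library. This is a sketch that mirrors the cURL body above and assumes requests is installed:

import requests

response = requests.post(
    "https://api.patronus.ai/v1/evaluate",
    headers={
        "X-API-KEY": "<PROVIDE YOUR API KEY>",
        "accept": "application/json",
        "content-type": "application/json",
    },
    json={
        "evaluators": [
            {"evaluator": "lynx-small", "criteria": "patronus:hallucination"}
        ],
        "evaluated_model_input": "What is the car insurance policy?",
        "evaluated_model_output": "To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028.",
        "evaluated_model_retrieved_context": "To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028.",
    },
)
print(response.json())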