Patronus Evaluators

Patronus Evaluators are powerful LLM judges benchmarked for alignment with human judgments and for evaluation quality. They are accessible via the Patronus API.

You can run an eval with a Patronus evaluator in a single call, shown below in Python, JavaScript, and cURL:

Python

from patronus import Client

client = Client(api_key="YOUR_API_KEY")
result = client.evaluate(
    evaluator="lynx",
    criteria="patronus:hallucination",
    evaluated_model_input="What is the largest animal in the world?",
    evaluated_model_output="The giant sandworm.",
    evaluated_model_retrieved_context="The blue whale is the largest known animal.",
    tags={"scenario": "onboarding"},
)

JavaScript

const apiKey = "YOUR_API_KEY";

fetch('https://api.patronus.ai/v1/evaluate', {
    method: 'POST',
    headers: {
        'X-API-KEY': apiKey,
        'accept': 'application/json',
        'content-type': 'application/json'
    },
    body: JSON.stringify({
        evaluators: [{ evaluator: "lynx", criteria: "patronus:hallucination" }],
        evaluated_model_input: "What is the largest animal in the world?",
        evaluated_model_output: "The giant sandworm.",
        evaluated_model_retrieved_context: "The blue whale is the largest known animal.",
        tags: { "scenario": "onboarding" }
    })
})
    .then(response => response.json())
    .then(data => console.log(data))
    .catch(error => console.error(error));

cURL

curl --location 'https://api.patronus.ai/v1/evaluate' \
--header 'X-API-KEY: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "evaluators": [
    {
      "evaluator": "retrieval-hallucination-lynx",
      "explain_strategy": "always"
    }
  ],
  "evaluated_model_input": "What is the largest animal in the world?",
  "evaluated_model_output": "The giant sandworm.",
  "evaluated_model_retrieved_context": ["The blue whale is the largest known animal."],
  "tags": {"scenario": "onboarding"}
}'

Unlike function- and class-based evaluators defined via the SDK, Patronus evaluators execute evals remotely on Patronus infrastructure. The code above calls the https://api.patronus.ai/v1/evaluate endpoint and returns an Evaluation Result containing scores and associated metadata.
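The returned result can be inspected directly. A minimal sketch, assuming the result object exposes the usual pass/score/explanation fields (the exact attribute names below are assumptions and may differ between SDK versions):

# Inspect the EvaluationResult returned by client.evaluate(...) above.
# Attribute names (pass_, score_raw, explanation) are assumptions based on
# common SDK conventions; check the SDK reference for your version.
print(result.pass_)         # boolean pass/fail verdict
print(result.score_raw)     # numeric score assigned by the evaluator
print(result.explanation)   # evaluator reasoning, when available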

The patronus.Client.evaluate(...) method accepts several fields:

  • evaluator: The evaluator that will execute the evaluation. For a full set of Patronus evaluators you can reference, see the Evaluators Reference Guide.
  • criteria: The criteria used to configure the evaluator. For most evaluators, the criteria field is optional; it is required for the judge and judge-mm evaluators (see the sketch below).
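Since judge evaluators always require a criteria, such a call might look like the following minimal sketch (the criteria name "my-answer-quality" is hypothetical; judge criteria are ones you define in the Patronus platform):

# Minimal sketch: judge evaluators require an explicit criteria.
# "my-answer-quality" is a hypothetical, user-defined criteria name.
result = client.evaluate(
    evaluator="judge",
    criteria="my-answer-quality",
    evaluated_model_input="Summarize the quarterly report in one sentence.",
    evaluated_model_output="The report projects 12% revenue growth next year.",
)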

Patronus Evaluators in Experiments

To use Patronus evaluators in experiments, you must first load the evaluator with patronus.Client.remote_evaluator(...). The loaded evaluator can then be passed in the evaluators list. For example:

fuzzy_match = client.remote_evaluator("judge-small", "patronus:fuzzy-match")

client.experiment(
    "Tutorial",
    data=dataset,
    task=call_gpt,
    evaluators=[fuzzy_match],
    tags={"dataset_type": "example", "model": "gpt_4o_mini"},
    experiment_name="Example Experiment",
)

In the above example, judge-small specifies the evaluator, and patronus:fuzzy-match specifies the criteria.
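The dataset and call_gpt task in the snippet above are assumed to be defined elsewhere. A minimal sketch of what they could look like, where the field names follow the evaluated_model_* convention used earlier and call_gpt is a placeholder for your own model call (the exact task signature may differ by SDK version):

# Hypothetical dataset: a list of rows using the evaluated_model_* field names.
dataset = [
    {
        "evaluated_model_input": "What is the largest animal in the world?",
        "evaluated_model_gold_answer": "The blue whale",
    },
]

# Placeholder task: the experiment runner is assumed to call this per row and
# treat the return value as the evaluated model output. Replace the body with
# a real model call; the (row, **kwargs) signature is an assumption.
def call_gpt(row, **kwargs):
    return "The blue whale is the largest animal in the world."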

Check out the Patronus API section for more tutorials and examples of Patronus Evaluators.

Patronus evaluators can be fine-tuned for various use cases. Reach out to our team to learn more about customizing Patronus evaluators for your evals!