Patronus Evaluators
Patronus evaluators are powerful LLM judges benchmarked for human alignment and quality. They are accessible via the Patronus API.
You can run an eval with a Patronus evaluator with one line of code:
from patronus import Client

client = Client(api_key="YOUR_API_KEY")

result = client.evaluate(
    evaluator="lynx",
    criteria="patronus:hallucination",
    evaluated_model_input="What is the largest animal in the world?",
    evaluated_model_output="The giant sandworm.",
    evaluated_model_retrieved_context="The blue whale is the largest known animal.",
    tags={"scenario": "onboarding"},
)
const apiKey = "YOUR_API_KEY";
fetch('https://api.patronus.ai/v1/evaluate', {
method: 'POST',
headers: {
'X-API-KEY': apiKey,
'accept': 'application/json',
'content-type': 'application/json'
},
body: JSON.stringify({
evaluators: [{ evaluator: "lynx", criteria: "patronus:hallucination" }],
evaluated_model_input: "What is the largest animal in the world?",
evaluated_model_output: "The giant sandworm.",
evaluated_model_retrieved_context: "The blue whale is the largest known animal.",
tags={"scenario": "onboarding"}
})
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error(error));
curl --location 'https://api.patronus.ai/v1/evaluate' \
--header 'X-API-KEY: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "evaluators": [
        {
            "evaluator": "lynx",
            "criteria": "patronus:hallucination",
            "explain_strategy": "always"
        }
    ],
    "evaluated_model_input": "What is the largest animal in the world?",
    "evaluated_model_output": "The giant sandworm.",
    "evaluated_model_retrieved_context": ["The blue whale is the largest known animal."],
    "tags": {"scenario": "onboarding"}
}'
Unlike function- and class-based evaluators defined via the SDK, Patronus evaluators execute evals remotely on Patronus infrastructure. The code above calls the https://api.patronus.ai/v1/evaluate endpoint and returns an Evaluation Result containing scores and associated metadata.
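The exact shape of the returned result depends on the SDK version; a minimal sketch of inspecting it might look like this (attribute names such as pass_, score_raw, and explanation are assumptions, check the SDK reference for the exact fields):

# Continuing from the Python example above; attribute names are assumptions.
print(result.pass_)         # pass/fail verdict, when the evaluator produces one
print(result.score_raw)     # numeric score, when the evaluator produces one
print(result.explanation)   # natural-language rationale, when available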
The patronus.Client.evaluate(...) method accepts several fields:

evaluator: The evaluator that will execute the evaluation. For the full set of Patronus evaluators you can reference, see the Evaluators Reference Guide.
criteria: Criteria used to configure the evaluator. For most evaluators, the criteria field is optional. It is required for the judge and judge-mm evaluators.
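For example, a judge evaluation with an explicit criteria might look like the following (a hedged sketch; the patronus:is-concise criteria name is illustrative, consult the Evaluators Reference Guide for the criteria available to your account):

# The judge evaluator requires a criteria; the name below is illustrative.
result = client.evaluate(
    evaluator="judge",
    criteria="patronus:is-concise",
    evaluated_model_input="What is the largest animal in the world?",
    evaluated_model_output="The blue whale is the largest known animal.",
)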
Patronus Evaluators in Experiments
To use Patronus evaluators in experiments, you must first load the evaluator with patronus.Client.remote_evaluator(...). The loaded evaluator can then be passed in the evaluators list. For example:
fuzzy_match = client.remote_evaluator("judge-small", "patronus:fuzzy-match")

client.experiment(
    "Tutorial",
    data=dataset,
    task=call_gpt,
    evaluators=[fuzzy_match],
    tags={"dataset_type": "example", "model": "gpt_4o_mini"},
    experiment_name="Example Experiment",
)
In the above example, judge-small specifies the evaluator, and patronus:fuzzy-match specifies the criteria.
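The dataset and call_gpt task referenced above are user-defined. A minimal sketch might look like this (the task signature, row attribute access, and dataset field names are assumptions, see the SDK's experiments documentation for the exact contract):

from openai import OpenAI

oai_client = OpenAI()

# Assumed signature: the experiment runner calls the task once per dataset row
# and evaluates whatever string the task returns.
def call_gpt(row, **kwargs):
    completion = oai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": row.evaluated_model_input}],
    )
    return completion.choices[0].message.content

# Assumed schema: rows carry the standard evaluated_model_* fields, and
# fuzzy-match compares the task output against the gold answer.
dataset = [
    {
        "evaluated_model_input": "What is the largest animal in the world?",
        "evaluated_model_gold_answer": "The blue whale",
    },
]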
Check out the Patronus API section for more tutorials and examples of using Patronus evaluators.
Patronus evaluators can be fine-tuned for various use cases. Reach out to our team to learn more about customizing Patronus evaluators for your evals!