Off-the-shelf Judge Evaluators

We support a number of off-the-shelf judge evaluators. Patronus judge evaluators are developed by our research team and continually assessed for performance on real-world benchmarks. To use any of these evaluators, add the `patronus:` prefix to the criteria name in calls to the judge evaluator.
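
For example, calling the judge evaluator with the `is-concise` criterion might look like the sketch below. The endpoint URL, header name, and request schema are assumptions for illustration (consult the API reference for the exact shape); the field names match the Required Fields column in the table that follows.

```python
import os
import requests

# Hypothetical sketch: evaluate one model output against the
# "patronus:is-concise" criterion via an assumed REST endpoint.
response = requests.post(
    "https://api.patronus.ai/v1/evaluate",  # assumed endpoint path
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},  # assumed auth header
    json={
        "evaluators": [{"evaluator": "judge", "criteria": "patronus:is-concise"}],
        "evaluated_model_output": "Paris is the capital of France.",
    },
)
response.raise_for_status()
print(response.json())  # evaluation verdict and details
```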


| Criteria | Description | Required Fields | Collection |
| --- | --- | --- | --- |
| `answer-refusal` | Checks that the model output refuses to answer the user input. Useful for verifying that your model correctly handles prompt injections and off-topic user inputs. | `evaluated_model_output` | OWASP |
| `fuzzy-match` | Checks that your model output is semantically similar to the provided gold answer. Better than an exact-match check when you expect the model output to vary but want the meaning to match the gold answer. | `evaluated_model_output`, `evaluated_model_gold_answer` | Output Validation |
| `is-concise` | Checks that your model output is clear and concise. Very useful for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-helpful` | Checks that your model output is helpful in its tone of voice. Very useful for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-polite` | Checks that your model output is polite in conversation. Very useful for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `no-apologies` | Checks that your model output does not contain apologies. Useful when you want your model to communicate difficult messages clearly, uncluttered by apologies. | `evaluated_model_output` | Chatbot Behavior |
| `no-openai-reference` | Checks that your model output does not refer to being made by OpenAI. | `evaluated_model_output` | Chatbot Behavior, Harmful Outputs |
| `is-code` | Checks that your model output is valid code. Use this to verify that your code copilot or AI coding assistant produces expected outputs. | `evaluated_model_output` | Output Format |
| `is-csv` | Checks that your model output is a valid CSV document. Useful if you parse your model outputs and want to ensure they are CSV. | `evaluated_model_output` | Output Format |
| `is-json` | Checks that your model output is valid JSON. Useful if you parse your model outputs and want to ensure they are JSON. | `evaluated_model_output` | Output Format |
| `no-age-bias` | Checks that your model output is not biased by ages mentioned in the user input. Useful for ensuring your model outputs stay consistent as user age varies. | `evaluated_model_input`, `evaluated_model_output` | Harmful Outputs |
| `no-gender-bias` | Checks whether your model output contains gender stereotypes. Use to mitigate PR risk from sexist or gendered model outputs. | `evaluated_model_output` | Harmful Outputs |
| `no-racial-bias` | Checks whether your model output contains racial stereotypes. Use to mitigate PR risk from racist model outputs. | `evaluated_model_output` | Harmful Outputs |
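
When a criterion lists more than one required field, supply every field in the same request. For example, `fuzzy-match` compares the model output against a gold answer, so it needs both `evaluated_model_output` and `evaluated_model_gold_answer`. A sketch under the same assumed endpoint shape as above:

```python
import os
import requests

# Hypothetical sketch: fuzzy-match requires both the model output and
# the gold answer, per the Required Fields column above.
response = requests.post(
    "https://api.patronus.ai/v1/evaluate",  # assumed endpoint path
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},  # assumed auth header
    json={
        "evaluators": [{"evaluator": "judge", "criteria": "patronus:fuzzy-match"}],
        "evaluated_model_output": "The Eiffel Tower is roughly 330 metres tall.",
        "evaluated_model_gold_answer": "The Eiffel Tower stands about 330 m high.",
    },
)
response.raise_for_status()
print(response.json())  # semantic-similarity verdict
```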