Off-the-shelf Judge Evaluators

We support a number of off-the-shelf judge evaluators. Patronus judge evaluators are developed by our research team and continually assessed for performance on real-world benchmarks. To use any of these evaluators, add the `patronus:` prefix to the criteria name in calls to the judge evaluator.
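
For example, calling the judge evaluator with the `is-concise` criterion might look like the sketch below. The endpoint URL, header name, and request schema are assumptions for illustration (consult the API reference for the exact shape); the field names match the Required Fields column in the table that follows.

```python
import os
import requests

# Hypothetical sketch: evaluate one model output against the
# "patronus:is-concise" criterion via an assumed REST endpoint.
response = requests.post(
    "https://api.patronus.ai/v1/evaluate",  # assumed endpoint path
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},  # assumed auth header
    json={
        "evaluators": [{"evaluator": "judge", "criteria": "patronus:is-concise"}],
        "evaluated_model_output": "Paris is the capital of France.",
    },
)
response.raise_for_status()
print(response.json())  # evaluation verdict and details
```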


| Criteria | Description | Required Fields | Collection |
| --- | --- | --- | --- |
| `answer-refusal` | Checks that the model output refuses to answer the user input. Useful for verifying that your model correctly handles prompt injections and off-topic user inputs. | `evaluated_model_output` | OWASP |
| `fuzzy-match` | Checks that your model output is semantically similar to the provided gold answer. Better than an exact-match check when you expect the model output to vary but want the meaning to match the gold answer. | `evaluated_model_output`, `evaluated_model_gold_answer` | Output Validation |
| `is-concise` | Checks that your model output is clear and concise. Very useful for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-helpful` | Checks that your model output is helpful in its tone of voice. Very useful for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-polite` | Checks that your model output is polite in conversation. Very useful for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `no-apologies` | Checks that your model output does not contain apologies. Useful when you want your model to communicate difficult messages clearly, uncluttered by apologies. | `evaluated_model_output` | Chatbot Behavior |
| `no-openai-reference` | Checks that your model output does not refer to being made by OpenAI. | `evaluated_model_output` | Chatbot Behavior, Harmful Outputs |
| `is-code` | Checks that your model output is valid code. Use this to verify that your code copilot or AI coding assistant produces expected outputs. | `evaluated_model_output` | Output Format |
| `is-csv` | Checks that your model output is a valid CSV document. Useful if you parse your model outputs and want to ensure they are CSV. | `evaluated_model_output` | Output Format |
| `is-json` | Checks that your model output is valid JSON. Useful if you parse your model outputs and want to ensure they are JSON. | `evaluated_model_output` | Output Format |
| `no-age-bias` | Checks that your model output is not biased by ages mentioned in the user input. Useful for ensuring your model outputs stay consistent as user age varies. | `evaluated_model_input`, `evaluated_model_output` | Harmful Outputs |
| `no-gender-bias` | Checks whether your model output contains gender stereotypes. Use to mitigate PR risk from sexist or gendered model outputs. | `evaluated_model_output` | Harmful Outputs |
| `no-racial-bias` | Checks whether your model output contains racial stereotypes. Use to mitigate PR risk from racist model outputs. | `evaluated_model_output` | Harmful Outputs |
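
When a criterion lists more than one required field, supply every field in the same request. For example, `fuzzy-match` compares the model output against a gold answer, so it needs both `evaluated_model_output` and `evaluated_model_gold_answer`. A sketch under the same assumed endpoint shape as above:

```python
import os
import requests

# Hypothetical sketch: fuzzy-match requires both the model output and
# the gold answer, per the Required Fields column above.
response = requests.post(
    "https://api.patronus.ai/v1/evaluate",  # assumed endpoint path
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},  # assumed auth header
    json={
        "evaluators": [{"evaluator": "judge", "criteria": "patronus:fuzzy-match"}],
        "evaluated_model_output": "The Eiffel Tower is roughly 330 metres tall.",
        "evaluated_model_gold_answer": "The Eiffel Tower stands about 330 m high.",
    },
)
response.raise_for_status()
print(response.json())  # semantic-similarity verdict
```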