Off-the-shelf Judge Evaluators

We support the off-the-shelf judge evaluators listed below. Patronus judge evaluators are developed by our research team and continually assessed for performance on real-world benchmarks.

| Criterion | Description | Required Fields | Collection |
|---|---|---|---|
| `answer-refusal` | Checks that the model output refuses to answer the user input. Useful for handling prompt injections and off-topic user inputs. | `evaluated_model_output` | OWASP |
| `fuzzy-match` | Verifies that the model output is semantically similar to the provided gold answer. Useful when an exact match isn't expected but the meaning must agree with the gold answer. | `evaluated_model_output`, `evaluated_model_gold_answer` | Output Validation |
| `is-concise` | Checks that the model output is clear and concise. Especially valuable for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-helpful` | Checks that the model output maintains a helpful tone of voice. Ideal for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-polite` | Validates that the model output remains polite during conversations. | `evaluated_model_output` | Chatbot Behavior |
| `no-apologies` | Checks that the model output avoids unnecessary apologies. Useful for delivering clear, direct communication. | `evaluated_model_output` | Chatbot Behavior |
| `no-openai-reference` | Verifies that the model output does not reference being created by OpenAI. | `evaluated_model_output` | Chatbot Behavior, Harmful Outputs |
| `is-code` | Checks that the model output is valid code. Ideal for validating outputs from AI coding assistants. | `evaluated_model_output` | Output Format |
| `is-csv` | Confirms that the model output is a valid CSV document. Useful when the output will be parsed as CSV. | `evaluated_model_output` | Output Format |
| `is-json` | Confirms that the model output is valid JSON. Useful when the output will be parsed as JSON. | `evaluated_model_output` | Output Format |
| `no-age-bias` | Checks that the model output is not biased by ages mentioned in the user input, so outputs stay consistent regardless of user age. | `evaluated_model_output`, `evaluated_model_gold_answer` | Harmful Outputs |
| `no-gender-bias` | Validates that the model output avoids gender stereotypes, reducing the risk of sexist or gendered outputs. | `evaluated_model_output` | Harmful Outputs |
| `no-racial-bias` | Validates that the model output avoids racial stereotypes, reducing the risk of racist outputs. | `evaluated_model_output` | Harmful Outputs |
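The Required Fields column determines what each evaluation record must contain before it is sent to a judge. The sketch below is a minimal illustration, assuming records are plain Python dicts keyed by the field names in the table. The `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical and are not part of any Patronus SDK; they only mirror the table so that incomplete records can be caught locally.

```python
# Hypothetical mapping copied from the "Required Fields" column above.
# Not a Patronus API; shown only to illustrate pre-submission checks.
OUT = "evaluated_model_output"
GOLD = "evaluated_model_gold_answer"

REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "answer-refusal": (OUT,),
    "fuzzy-match": (OUT, GOLD),
    "is-concise": (OUT,),
    "is-helpful": (OUT,),
    "is-polite": (OUT,),
    "no-apologies": (OUT,),
    "no-openai-reference": (OUT,),
    "is-code": (OUT,),
    "is-csv": (OUT,),
    "is-json": (OUT,),
    "no-age-bias": (OUT, GOLD),
    "no-gender-bias": (OUT,),
    "no-racial-bias": (OUT,),
}


def missing_fields(criterion: str, record: dict[str, object]) -> list[str]:
    """Return the required fields that are absent, None, or empty in record."""
    try:
        required = REQUIRED_FIELDS[criterion]
    except KeyError:
        raise ValueError(f"unknown criterion: {criterion!r}") from None
    # `not record.get(name)` treats a missing key, None, and "" as missing.
    return [name for name in required if not record.get(name)]


if __name__ == "__main__":
    record = {OUT: "Paris is the capital of France."}
    print(missing_fields("fuzzy-match", record))  # ['evaluated_model_gold_answer']
    print(missing_fields("is-concise", record))   # []
```

Raising on an unknown criterion name, rather than returning an empty list, keeps a typo such as `is-jsn` from silently passing the check.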
