Off-the-shelf Judge Evaluators
We support a number of off-the-shelf judge evaluators. Patronus judge evaluators are developed by our research team and continually assessed for performance on real-world benchmarks. The table below lists each criterion, the fields it requires, and the collection it belongs to.
Criteria | Description | Required Fields | Collection |
---|---|---|---|
answer-refusal | Checks that the model output refuses to answer the user input. Useful for handling prompt injections and off-topic user inputs. | evaluated_model_output | OWASP |
fuzzy-match | Verifies that the model output is semantically similar to the provided gold answer. Useful when exact matches aren’t expected but the meaning must align with the gold answer. | evaluated_model_output, evaluated_model_gold_answer | Output Validation |
is-concise | Ensures the model output is clear and concise, especially valuable for chatbot use cases. | evaluated_model_output | Chatbot Behavior |
is-helpful | Checks if the model maintains a helpful tone of voice, ideal for chatbot use cases. | evaluated_model_output | Chatbot Behavior |
is-polite | Validates that the model output maintains politeness during conversations. | evaluated_model_output | Chatbot Behavior |
no-apologies | Ensures the model output avoids unnecessary apologies. Useful for delivering clear and direct communication. | evaluated_model_output | Chatbot Behavior |
no-openai-reference | Verifies that the model output does not reference being created by OpenAI. | evaluated_model_output | Chatbot Behavior, Harmful Outputs |
is-code | Ensures the model output is valid code. Ideal for validating outputs from AI coding assistants. | evaluated_model_output | Output Format |
is-csv | Confirms that the model output is a valid CSV document. Useful for parsing and ensuring expected CSV format. | evaluated_model_output | Output Format |
is-json | Confirms that the model output is valid JSON. Useful for parsing and ensuring expected JSON format. | evaluated_model_output | Output Format |
no-age-bias | Checks that the model is not biased based on ages mentioned in the user input. Ensures consistent outputs regardless of user age. | evaluated_model_output, evaluated_model_gold_answer | Harmful Outputs |
no-gender-bias | Validates that the model output avoids gender stereotypes. Reduces risks of sexist or gendered outputs. | evaluated_model_output | Harmful Outputs |
no-racial-bias | Validates that the model output avoids racial stereotypes. Reduces risks of producing racist outputs. | evaluated_model_output | Harmful Outputs |
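
The Required Fields column can be checked locally before a sample is sent for evaluation. The following is a minimal sketch of that idea, not part of any Patronus SDK: the `REQUIRED_FIELDS` mapping is copied from the table above, while the `missing_fields` helper and the sample dictionary are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

# Required fields per criterion, copied from the table above.
# This mapping and the helper below are illustrative assumptions,
# not part of an official client library.
OUTPUT = "evaluated_model_output"
GOLD = "evaluated_model_gold_answer"

REQUIRED_FIELDS = {
    "answer-refusal": [OUTPUT],
    "fuzzy-match": [OUTPUT, GOLD],
    "is-concise": [OUTPUT],
    "is-helpful": [OUTPUT],
    "is-polite": [OUTPUT],
    "no-apologies": [OUTPUT],
    "no-openai-reference": [OUTPUT],
    "is-code": [OUTPUT],
    "is-csv": [OUTPUT],
    "is-json": [OUTPUT],
    "no-age-bias": [OUTPUT, GOLD],
    "no-gender-bias": [OUTPUT],
    "no-racial-bias": [OUTPUT],
}


def missing_fields(criterion: str, sample: dict) -> list[str]:
    """Return the required fields that are absent or empty in `sample`."""
    if criterion not in REQUIRED_FIELDS:
        raise KeyError(f"unknown criterion: {criterion}")
    return [name for name in REQUIRED_FIELDS[criterion] if not sample.get(name)]


# Hypothetical sample that carries only the model output.
sample = {OUTPUT: "Paris is the capital of France."}

print(missing_fields("is-concise", sample))   # []
print(missing_fields("fuzzy-match", sample))  # ['evaluated_model_gold_answer']
```

A check like this only confirms that the inputs a criterion needs are present; the judgment itself is still made by the evaluator.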