Off-the-shelf Judge Evaluators
We support a number of off-the-shelf judge evaluators. Patronus judge evaluators are developed by our research team and continually assessed for performance on real-world benchmarks. To use any of these evaluators, simply add the patronus: prefix to the criteria in calls to the judge evaluator.
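As a minimal sketch of what such a call might look like, the example below uses Python's requests library to send an evaluation request with the judge evaluator and a patronus:-prefixed criterion. The endpoint URL, header name, and API-key environment variable are assumptions for illustration; consult the API reference for the exact request format.

```python
import os
import requests

# Sketch only: endpoint path, header name, and env var are assumptions.
API_URL = "https://api.patronus.ai/v1/evaluate"

response = requests.post(
    API_URL,
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},
    json={
        "evaluators": [
            # The "patronus:" prefix selects an off-the-shelf criterion.
            {"evaluator": "judge", "criteria": "patronus:is-concise"}
        ],
        "evaluated_model_output": "Your order ships tomorrow and arrives in 2-3 days.",
    },
)
response.raise_for_status()
print(response.json())
```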
| Criteria | Description | Required Fields | Collection |
| --- | --- | --- | --- |
| answer-refusal | Checks that the model output refuses to answer the user input. Useful for verifying that your model output correctly handles prompt injections and off-topic user inputs. | evaluated_model_output | OWASP |
| fuzzy-match | Checks that your model output is semantically similar to the provided gold answer. This is better than an exact-match check when you expect the model output to vary but want to ensure the meaning matches the gold answer (see the request sketch below this table). | evaluated_model_output, evaluated_model_gold_answer | Output Validation |
| is-concise | Checks that your model output is clear and concise; useful for chatbot use cases. | evaluated_model_output | Chatbot Behavior |
| is-helpful | Checks that your model is helpful in its tone of voice; useful for chatbot use cases. | evaluated_model_output | Chatbot Behavior |
| is-polite | Checks that your model is polite in conversation; useful for chatbot use cases. | evaluated_model_output | Chatbot Behavior |
| no-apologies | Checks that your model output does not contain apologies. Useful if you want your model to communicate difficult messages clearly, uncluttered by apologies. | evaluated_model_output | Chatbot Behavior |
| no-openai-reference | Checks that your model does not refer to being made by OpenAI. | evaluated_model_output | Chatbot Behavior, Harmful Outputs |
| is-code | Checks that your model output is valid code. Use this evaluator to check that your code copilot or AI coding assistant is producing expected outputs. | evaluated_model_output | Output Format |
| is-csv | Checks that your model output is a valid CSV document; useful if you are parsing your model outputs and want to ensure they are CSV. | evaluated_model_output | Output Format |
| is-json | Checks that your model output is valid JSON; useful if you are parsing your model outputs and want to ensure they are JSON. | evaluated_model_output | Output Format |
| no-age-bias | Checks that your model is not biased by ages mentioned in the user input. Useful to ensure your model outputs are consistent even as the user's age varies. | evaluated_model_input, evaluated_model_output | Harmful Outputs |
| no-gender-bias | Checks whether your model output contains gender stereotypes. Use to mitigate PR risk from sexist or gendered model outputs. | evaluated_model_output | Harmful Outputs |
| no-racial-bias | Checks whether your model output contains racial stereotypes. Use to mitigate PR risk from racist model outputs. | evaluated_model_output | Harmful Outputs |
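Criteria that list more than one required field expect each of those fields in the request. As a hedged sketch under the same assumptions as the earlier example (endpoint, header, and environment variable names are illustrative), a fuzzy-match check passes both the model output and the gold answer:

```python
import os
import requests

# Sketch of a fuzzy-match check, which requires both the model output
# and a gold answer. Endpoint and header names are assumptions.
response = requests.post(
    "https://api.patronus.ai/v1/evaluate",
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},
    json={
        "evaluators": [{"evaluator": "judge", "criteria": "patronus:fuzzy-match"}],
        "evaluated_model_output": "The capital of France is Paris.",
        "evaluated_model_gold_answer": "Paris is the capital of France.",
    },
)
response.raise_for_status()
print(response.json())
```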