
Off-the-shelf Judge Evaluators

We support a number of off-the-shelf judge evaluators. Patronus judge evaluators are developed by our research team and continually assessed for performance on real-world benchmarks.

| Criteria | Description | Required Fields | Collection |
| --- | --- | --- | --- |
| `answer-refusal` | Checks that the model output refuses to answer the user input. Useful for handling prompt injections and off-topic user inputs. | `evaluated_model_output` | OWASP |
| `fuzzy-match` | Verifies that the model output is semantically similar to the provided gold answer. Useful when exact matches aren’t expected but the meaning must align with the gold answer. | `evaluated_model_output`, `evaluated_model_gold_answer` | Output Validation |
| `is-concise` | Ensures the model output is clear and concise, especially valuable for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-helpful` | Checks if the model maintains a helpful tone of voice, ideal for chatbot use cases. | `evaluated_model_output` | Chatbot Behavior |
| `is-polite` | Validates that the model output maintains politeness during conversations. | `evaluated_model_output` | Chatbot Behavior |
| `no-apologies` | Ensures the model output avoids unnecessary apologies. Useful for delivering clear and direct communication. | `evaluated_model_output` | Chatbot Behavior |
| `no-openai-reference` | Verifies that the model output does not reference being created by OpenAI. | `evaluated_model_output` | Chatbot Behavior, Harmful Outputs |
| `is-code` | Ensures the model output is valid code. Ideal for validating outputs from AI coding assistants. | `evaluated_model_output` | Output Format |
| `is-csv` | Confirms that the model output is a valid CSV document. Useful for parsing and ensuring expected CSV format. | `evaluated_model_output` | Output Format |
| `is-json` | Confirms that the model output is valid JSON. Useful for parsing and ensuring expected JSON format. | `evaluated_model_output` | Output Format |
| `no-age-bias` | Checks that the model is not biased based on ages mentioned in the user input. Ensures consistent outputs regardless of user age. | `evaluated_model_output`, `evaluated_model_gold_answer` | Harmful Outputs |
| `no-gender-bias` | Validates that the model output avoids gender stereotypes. Reduces risks of sexist or gendered outputs. | `evaluated_model_output` | Harmful Outputs |
| `no-racial-biasValidates` | Validates that the model output avoids racial stereotypes. Reduces risks of producing racist outputs. | `evaluated_model_output` | Harmful Outputs |
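
Because each criterion lists its own required fields, it can help to check a request locally before sending it for evaluation. The following is a minimal sketch of such a check, not part of the Patronus SDK or API: the `REQUIRED_FIELDS` mapping is transcribed from the criteria listed above, while the `missing_fields` helper and the plain-dictionary payload shape are assumptions made for illustration.

```python
# Required fields per criterion, transcribed from the criteria listed above.
# Criteria not named here need only evaluated_model_output.
_NEEDS_GOLD_ANSWER = {"fuzzy-match", "no-age-bias"}
_ALL_CRITERIA = [
    "answer-refusal", "fuzzy-match", "is-concise", "is-helpful", "is-polite",
    "no-apologies", "no-openai-reference", "is-code", "is-csv", "is-json",
    "no-age-bias", "no-gender-bias", "no-racial-bias",
]

REQUIRED_FIELDS = {
    name: (
        ("evaluated_model_output", "evaluated_model_gold_answer")
        if name in _NEEDS_GOLD_ANSWER
        else ("evaluated_model_output",)
    )
    for name in _ALL_CRITERIA
}


def missing_fields(criterion: str, payload: dict) -> list[str]:
    """Hypothetical helper: list required fields that are absent or empty.

    `payload` is assumed to be a plain dict keyed by field name; this is an
    illustrative shape, not a documented request format.
    """
    if criterion not in REQUIRED_FIELDS:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return [field for field in REQUIRED_FIELDS[criterion] if not payload.get(field)]


if __name__ == "__main__":
    payload = {"evaluated_model_output": "Paris"}
    print(missing_fields("is-concise", payload))
    print(missing_fields("fuzzy-match", payload))
```

With this sketch, a payload containing only `evaluated_model_output` passes the check for `is-concise` but is flagged for `fuzzy-match`, which also needs `evaluated_model_gold_answer`.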
