Evaluators Reference Guide

The following table summarizes the evaluator families currently supported on the Patronus platform.

| Evaluator Family | Definition | Required Fields | Score Type |
| --- | --- | --- | --- |
| phi | Checks for protected health information (PHI), defined broadly as any information about an individual's health status or provision of healthcare. | `evaluated_model_output` | Binary |
| pii | Checks for personally identifiable information (PII): information that, in conjunction with other data, can identify an individual. | `evaluated_model_output` | Binary |
| toxicity | Checks the output for abusive and hateful messages. | `evaluated_model_output` | Continuous |
| retrieval-hallucination | Checks whether the LLM response is hallucinatory, i.e. the output is not grounded in the provided context. | `evaluated_model_input`, `evaluated_model_output`, `evaluated_model_retrieved_context` | Binary |
| retrieval-answer-relevance | Checks whether the answer is on-topic for the input question. Does not measure correctness. | `evaluated_model_input`, `evaluated_model_output` | Binary |
| retrieval-context-relevance | Checks whether the retrieved context is on-topic for the input. | `evaluated_model_input`, `evaluated_model_retrieved_context` | Binary |
| retrieval-context-sufficiency | Checks whether the retrieved context is sufficient to generate an output similar in meaning to the label. The label should be the correct evaluation result. | `evaluated_model_input`, `evaluated_model_retrieved_context`, `evaluated_model_output`, `evaluated_model_gold_answer` | Binary |
| metrics | Computes common NLP metrics on the output and label fields to measure semantic overlap and similarity. Currently supports the BLEU and ROUGE metrics. | `evaluated_model_output`, `evaluated_model_gold_answer` | Continuous |
| custom | Checks against custom criteria definitions, such as "MODEL OUTPUT should be free from brackets." LLM-based; uses active learning to improve the criteria definition based on user feedback. | `evaluated_model_input`, `evaluated_model_output`, `evaluated_model_gold_answer` | Binary |
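As a minimal sketch of how the Required Fields column can be used in practice, the snippet below encodes the table in a lookup and checks an evaluation payload before submission. The `REQUIRED_FIELDS` mapping and `missing_fields` helper are illustrative assumptions, not part of the Patronus API; only the field and family names come from the table above.

```python
# Illustrative mapping mirroring the table above (not a Patronus client).
REQUIRED_FIELDS = {
    "phi": ["evaluated_model_output"],
    "pii": ["evaluated_model_output"],
    "toxicity": ["evaluated_model_output"],
    "retrieval-hallucination": [
        "evaluated_model_input",
        "evaluated_model_output",
        "evaluated_model_retrieved_context",
    ],
    "retrieval-answer-relevance": [
        "evaluated_model_input",
        "evaluated_model_output",
    ],
    "retrieval-context-relevance": [
        "evaluated_model_input",
        "evaluated_model_retrieved_context",
    ],
    "retrieval-context-sufficiency": [
        "evaluated_model_input",
        "evaluated_model_retrieved_context",
        "evaluated_model_output",
        "evaluated_model_gold_answer",
    ],
    "metrics": ["evaluated_model_output", "evaluated_model_gold_answer"],
    "custom": [
        "evaluated_model_input",
        "evaluated_model_output",
        "evaluated_model_gold_answer",
    ],
}


def missing_fields(family: str, payload: dict) -> list:
    """Return the required fields absent from an evaluation payload."""
    return [f for f in REQUIRED_FIELDS[family] if f not in payload]


# Example: a retrieval-hallucination request missing its retrieved context.
payload = {
    "evaluated_model_input": "What is the capital of France?",
    "evaluated_model_output": "Paris.",
}
print(missing_fields("retrieval-hallucination", payload))
# → ['evaluated_model_retrieved_context']
```

A hypothetical check like this catches incomplete requests client-side; the real platform determines the authoritative validation rules.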