Evaluators Reference Guide

The following table summarizes the evaluator families currently supported on the Patronus platform.

| Evaluator Family | Definition | Required Fields | Score Type |
| --- | --- | --- | --- |
| phi | Checks for protected health information (PHI), defined broadly as any information about an individual's health status or provision of healthcare. | `evaluated_model_output` | Binary |
| pii | Checks for personally identifiable information (PII): information that, in conjunction with other data, can identify an individual. | `evaluated_model_output` | Binary |
| toxicity | Checks the output for abusive and hateful messages. | `evaluated_model_output` | Continuous |
| retrieval-hallucination | Checks whether the LLM response is hallucinatory, i.e. the output is not grounded in the provided context. | `evaluated_model_input`, `evaluated_model_output`, `evaluated_model_retrieved_context` | Binary |
| retrieval-answer-relevance | Checks whether the answer is on-topic for the input question. Does not measure correctness. | `evaluated_model_input`, `evaluated_model_output` | Binary |
| retrieval-context-relevance | Checks whether the retrieved context is on-topic for the input. | `evaluated_model_input`, `evaluated_model_retrieved_context` | Binary |
| retrieval-context-sufficiency | Checks whether the retrieved context is sufficient to generate an output similar in meaning to the label. The label should be the correct evaluation result. | `evaluated_model_input`, `evaluated_model_retrieved_context`, `evaluated_model_output`, `evaluated_model_gold_answer` | Binary |
| metrics | Computes common NLP metrics on the output and label fields to measure semantic overlap and similarity. Currently supports the BLEU and ROUGE metrics. | `evaluated_model_output`, `evaluated_model_gold_answer` | Continuous |
| custom | Checks against custom criteria definitions, such as "MODEL OUTPUT should be free from brackets." LLM-based; uses active learning to improve the criteria definition based on user feedback. | `evaluated_model_input`, `evaluated_model_output`, `evaluated_model_gold_answer` | Binary |
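As a minimal sketch of how the Required Fields column can be used in practice, the snippet below encodes the table in a lookup and checks an evaluation payload before submission. The `REQUIRED_FIELDS` mapping and `missing_fields` helper are illustrative assumptions, not part of the Patronus API; only the field and family names come from the table above.

```python
# Illustrative mapping mirroring the table above (not a Patronus client).
REQUIRED_FIELDS = {
    "phi": ["evaluated_model_output"],
    "pii": ["evaluated_model_output"],
    "toxicity": ["evaluated_model_output"],
    "retrieval-hallucination": [
        "evaluated_model_input",
        "evaluated_model_output",
        "evaluated_model_retrieved_context",
    ],
    "retrieval-answer-relevance": [
        "evaluated_model_input",
        "evaluated_model_output",
    ],
    "retrieval-context-relevance": [
        "evaluated_model_input",
        "evaluated_model_retrieved_context",
    ],
    "retrieval-context-sufficiency": [
        "evaluated_model_input",
        "evaluated_model_retrieved_context",
        "evaluated_model_output",
        "evaluated_model_gold_answer",
    ],
    "metrics": ["evaluated_model_output", "evaluated_model_gold_answer"],
    "custom": [
        "evaluated_model_input",
        "evaluated_model_output",
        "evaluated_model_gold_answer",
    ],
}


def missing_fields(family: str, payload: dict) -> list:
    """Return the required fields absent from an evaluation payload."""
    return [f for f in REQUIRED_FIELDS[family] if f not in payload]


# Example: a retrieval-hallucination request missing its retrieved context.
payload = {
    "evaluated_model_input": "What is the capital of France?",
    "evaluated_model_output": "Paris.",
}
print(missing_fields("retrieval-hallucination", payload))
# → ['evaluated_model_retrieved_context']
```

A hypothetical check like this catches incomplete requests client-side; the real platform determines the authoritative validation rules.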