Reference Guide
The following table summarizes the evaluator families currently supported on the Patronus platform.
Evaluator Family | Definition | Required Fields | Score Type |
---|---|---|---|
phi | Checks for protected health information (PHI), defined broadly as any information about an individual's health status or provision of healthcare. | evaluated_model_output | Binary |
pii | Checks for personally identifiable information (PII). PII is information that, in conjunction with other data, can identify an individual. | evaluated_model_output | Binary |
toxicity | Checks output for abusive and hateful messages. | evaluated_model_output | Continuous |
retrieval-hallucination | Checks whether the LLM response is hallucinatory, i.e. the output is not grounded in the provided context. | evaluated_model_input, evaluated_model_output, evaluated_model_retrieved_context | Binary |
retrieval-answer-relevance | Checks whether the answer is on-topic to the input question. Does not measure correctness. | evaluated_model_input, evaluated_model_output | Binary |
retrieval-context-relevance | Checks whether the retrieved context is on-topic to the input. | evaluated_model_input, evaluated_model_retrieved_context | Binary |
retrieval-context-sufficiency | Checks whether the retrieved context is sufficient to generate an output similar in meaning to the label. The label (evaluated_model_gold_answer) should be the correct answer. | evaluated_model_input, evaluated_model_retrieved_context, evaluated_model_output, evaluated_model_gold_answer | Binary |
metrics | Computes common NLP metrics on the output and label fields to measure semantic overlap and similarity. Currently supports the BLEU and ROUGE metrics. | evaluated_model_output, evaluated_model_gold_answer | Continuous |
custom | Checks output against custom criteria definitions, such as "MODEL OUTPUT should be free from brackets." This evaluator is LLM-based and uses active learning to improve the criteria definition based on user feedback. | evaluated_model_input, evaluated_model_output, evaluated_model_gold_answer | Binary |
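
The Required Fields column can be checked on the client side before an evaluation request is sent. The following is a minimal sketch under stated assumptions: `EVALUATOR_FAMILIES` and `missing_fields` are hypothetical helper names, not part of the Patronus API or SDK, and treating an empty value as missing is an illustrative choice. Only the family names, field names, and score types come from the table above.

```python
# Hypothetical lookup table transcribed from the reference table above.
# Each entry maps an evaluator family to (required fields, score type).
EVALUATOR_FAMILIES = {
    "phi": (("evaluated_model_output",), "Binary"),
    "pii": (("evaluated_model_output",), "Binary"),
    "toxicity": (("evaluated_model_output",), "Continuous"),
    "retrieval-hallucination": (
        ("evaluated_model_input", "evaluated_model_output",
         "evaluated_model_retrieved_context"),
        "Binary",
    ),
    "retrieval-answer-relevance": (
        ("evaluated_model_input", "evaluated_model_output"),
        "Binary",
    ),
    "retrieval-context-relevance": (
        ("evaluated_model_input", "evaluated_model_retrieved_context"),
        "Binary",
    ),
    "retrieval-context-sufficiency": (
        ("evaluated_model_input", "evaluated_model_retrieved_context",
         "evaluated_model_output", "evaluated_model_gold_answer"),
        "Binary",
    ),
    "metrics": (
        ("evaluated_model_output", "evaluated_model_gold_answer"),
        "Continuous",
    ),
    "custom": (
        ("evaluated_model_input", "evaluated_model_output",
         "evaluated_model_gold_answer"),
        "Binary",
    ),
}


def missing_fields(family, payload):
    """Return the required fields that are absent or empty in `payload`.

    Hypothetical helper: raises KeyError for a family not listed in the table.
    """
    if family not in EVALUATOR_FAMILIES:
        raise KeyError(f"unknown evaluator family: {family!r}")
    required, _score_type = EVALUATOR_FAMILIES[family]
    # Assumption: an empty string or empty list counts as missing.
    return [name for name in required if not payload.get(name)]


# Example: a payload without retrieved context cannot be scored
# by the retrieval-hallucination family.
payload = {
    "evaluated_model_input": "What is the capital of France?",
    "evaluated_model_output": "Paris",
}
print(missing_fields("retrieval-hallucination", payload))
# ['evaluated_model_retrieved_context']
```

Keeping the table in one dictionary means a new evaluator family only needs a new entry, and the same lookup also tells the caller whether to expect a Binary or Continuous score.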