Core Concepts
Evaluation
- Broadly, evaluation is the process of assessing the performance, accuracy, and capabilities of an LLM or GenAI system.
- In Patronus, an Evaluation refers to a single assessment of your LLM. Generally, the following information is used to run evaluations:
evaluated_model_input
: The user input to the model you are evaluating
evaluated_model_output
: The output of the model you are evaluating
evaluated_model_retrieved_context
: Any extra context passed to the model you are evaluating, like from a retrieval system
evaluated_model_gold_answer
: The "correct" or expected answer to the user input
Evaluator
- The infrastructure that performs the Evaluation. Evaluators can be based on scores, classifiers, or even carefully-tuned LLMs. Patronus provides several state-of-the-art Evaluators like Patronus Lynx for hallucination detection.
- Evaluators are specialized and test for specific issues, such as hallucinations or toxicity. You can easily use a combination of Evaluators together to check that your LLM meets your product needs, as sketched after this list.
- Evaluators are versioned by the date they are released. So, custom-large-2024-05-16 is the large version of the custom evaluator released on 05/16/2024.
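To illustrate combining Evaluators, a single request might list several specialized Evaluators to run against the same output. The shape of the evaluators array below is an assumption, not the documented schema.

```python
# Hypothetical request fragment: running multiple specialized Evaluators
# against the same model output in one evaluation call.
payload = {
    "evaluators": [
        {"evaluator": "lynx"},                     # hallucination detection
        {"evaluator": "custom-large-2024-05-16"},  # date-versioned custom evaluator
    ],
    "evaluated_model_input": "Summarize the ticket in one sentence.",
    "evaluated_model_output": "The customer reports a login failure on Android.",
}
```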
Evaluator Profile
- Evaluator Profiles are configuration settings for Evaluators. You can use Evaluator Profiles to tailor an Evaluator's performance to your use case.
- Several Evaluators have default Evaluator Profiles that they will run with if no Evaluator Profile is specified.
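For instance, a request might pair an Evaluator with a profile to tailor its behavior. The profile_name key is an assumption about the request schema; system:is-concise is one of the profiles mentioned under Evaluator Family below.

```python
# Hypothetical request fragment: selecting an Evaluator together with an
# Evaluator Profile. If no profile is specified, an Evaluator that has a
# default profile will run with that default.
payload = {
    "evaluators": [
        {
            "evaluator": "custom-large-2024-05-16",
            "profile_name": "system:is-concise",  # key name is illustrative
        }
    ],
    "evaluated_model_input": "Summarize the refund policy in one sentence.",
    "evaluated_model_output": "Refunds are available within 30 days of purchase.",
}
```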
Evaluator Family
- A grouping of Evaluators that test for the same issues and can all be configured in similar ways (with the same Evaluator Profiles).
- For example, the custom family groups together all our custom evaluators, like custom-large-2024-05-16 and custom-small-2024-08-08. They share the same profiles, like system:is-concise.
Evaluator Alias
- Aliases let you refer to the latest and most advanced evaluator in the Family, rather than specifying an exact version.
- For example, you can refer to the alias custom-large, which always points to the newest large custom evaluator. If you directly reference custom-large-2024-05-16, you'll need to manually update this to custom-large-2024-08-08 when a new version is available.
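A quick sketch of the difference between using an alias and pinning a version (the dictionary shape is illustrative):

```python
# Alias: always resolves to the newest large custom evaluator.
alias_evaluator = {"evaluator": "custom-large"}

# Pinned version: stays on the 2024-05-16 release until you update it yourself.
pinned_evaluator = {"evaluator": "custom-large-2024-05-16"}
```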
Evaluation Result
- The output from an Evaluator. Evaluation Results are returned in the output of the POST /v1/evaluate API and can be viewed in the Logs dashboard. They are also returned by Evaluation Runs and the Patronus Experimentation SDK (both currently Enterprise features).
- Evaluation Results contain all inputs for the evaluation as well as a pass/fail rating, raw score, and evaluator-specific metadata.
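Continuing the request sketch under Evaluation above, the result fields might be read like this. The key names (results, pass, score_raw, evaluation_metadata) are assumptions about the response shape, not the documented schema.

```python
# Hypothetical sketch of reading Evaluation Results from the /v1/evaluate response.
for result in response.json().get("results", []):
    print("pass/fail:", result.get("pass"))                # pass/fail rating
    print("raw score:", result.get("score_raw"))           # raw score
    print("metadata:", result.get("evaluation_metadata"))  # evaluator-specific metadata
```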