Core Concepts
Evaluation
- Broadly, evaluation is the process of assessing the performance, accuracy, and capabilities of an LLM or GenAI system.
- In Patronus, an Evaluation refers to a single assessment of your LLM. Generally, the following information is used to run evaluations:
evaluated_model_input
: The user input to the model you are evaluating
evaluated_model_output
: The output of the model you are evaluating
evaluated_model_retrieved_context
: Any extra context passed to the model you are evaluating, like from a retrieval system
evaluated_model_gold_answer
: The "correct" or expected answer to the user input
Evaluator
- The infrastructure that performs the Evaluation. Evaluators can be based on scores, classifiers, or even carefully-tuned LLMs. Patronus provides several state-of-the-art Evaluators like Patronus Lynx for hallucination detection.
- Evaluators are specialized and test for specific issues, such as hallucinations or toxicity. You can easily use a combination of Evaluators together to check that your LLM meets your product needs, as sketched after this list.
- Evaluators are versioned by the date they are released. So, custom-large-2024-05-16 is the large version of the custom evaluator released on 05/16/2024.
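To illustrate combining Evaluators, a single request might list several specialized Evaluators to run against the same output. The shape of the evaluators array below is an assumption, not the documented schema.

```python
# Hypothetical request fragment: running multiple specialized Evaluators
# against the same model output in one evaluation call.
payload = {
    "evaluators": [
        {"evaluator": "lynx"},                     # hallucination detection
        {"evaluator": "custom-large-2024-05-16"},  # date-versioned custom evaluator
    ],
    "evaluated_model_input": "Summarize the ticket in one sentence.",
    "evaluated_model_output": "The customer reports a login failure on Android.",
}
```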
Evaluator Profile
- Evaluator Profiles are configuration settings for Evaluators. You can use Evaluator Profiles to tailor an Evaluator's performance to your use case.
- Several Evaluators have default Evaluator Profiles that they will run with if no Evaluator Profile is specified.
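For instance, a request might pair an Evaluator with a profile to tailor its behavior. The profile_name key is an assumption about the request schema; system:is-concise is one of the profiles mentioned under Evaluator Family below.

```python
# Hypothetical request fragment: selecting an Evaluator together with an
# Evaluator Profile. If no profile is specified, an Evaluator that has a
# default profile will run with that default.
payload = {
    "evaluators": [
        {
            "evaluator": "custom-large-2024-05-16",
            "profile_name": "system:is-concise",  # key name is illustrative
        }
    ],
    "evaluated_model_input": "Summarize the refund policy in one sentence.",
    "evaluated_model_output": "Refunds are available within 30 days of purchase.",
}
```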
Evaluator Family
- A grouping of Evaluators that test for the same issues and can all be configured in similar ways (with the same Evaluator Profiles).
- For example, the custom family groups together all our custom evaluators, like custom-large-2024-05-16 and custom-small-2024-08-08. They share the same profiles, like system:is-concise.
Evaluator Alias
- Aliases let you refer to the latest and most advanced evaluator in the Family, rather than specifying an exact version.
- For example, you can refer to the alias custom-large, which always points to the newest large custom evaluator. If you directly reference custom-large-2024-05-16, you'll need to manually update this to custom-large-2024-08-08 when a new version is available.
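A quick sketch of the difference between using an alias and pinning a version (the dictionary shape is illustrative):

```python
# Alias: always resolves to the newest large custom evaluator.
alias_evaluator = {"evaluator": "custom-large"}

# Pinned version: stays on the 2024-05-16 release until you update it yourself.
pinned_evaluator = {"evaluator": "custom-large-2024-05-16"}
```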
Evaluation Result
- The output from an Evaluator. Evaluation Results are returned in the output of the POST /v1/evaluate API and can be viewed in the Logs dashboard. They are also returned by Evaluation Runs and the Patronus Experimentation SDK (both currently Enterprise features).
- Evaluation Results contain all inputs for the evaluation as well as a pass/fail rating, raw score, and evaluator-specific metadata.
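Continuing the request sketch under Evaluation above, the result fields might be read like this. The key names (results, pass, score_raw, evaluation_metadata) are assumptions about the response shape, not the documented schema.

```python
# Hypothetical sketch of reading Evaluation Results from the /v1/evaluate response.
for result in response.json().get("results", []):
    print("pass/fail:", result.get("pass"))                # pass/fail rating
    print("raw score:", result.get("score_raw"))           # raw score
    print("metadata:", result.get("evaluation_metadata"))  # evaluator-specific metadata
```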