Core Concepts

Evaluation

  • Broadly, evaluation is the process of assessing the performance, accuracy, and capabilities of an LLM or GenAI system.
  • In Patronus, an Evaluation refers to a single assessment of your LLM. Generally, the following information is used to run evaluations:
    • evaluated_model_input: The user input to the model you are evaluating
    • evaluated_model_output: The output of the model you are evaluating
    • evaluated_model_retrieved_context: Any additional context passed to the model you are evaluating, such as documents returned by a retrieval system.
    • evaluated_model_gold_answer: The "correct" or expected answer to the user input.
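
As a rough illustration of how these four fields fit together, the sketch below groups them into one object and builds a dictionary from whichever fields are set. Only the four field names come from the list above. The EvaluationInput class, the to_payload helper, the choice of a list of strings for the retrieved context, and the sample values are assumptions made for this example, not part of the Patronus API.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class EvaluationInput:
    """Hypothetical container for the four evaluation fields (not an SDK class)."""

    evaluated_model_input: Optional[str] = None
    evaluated_model_output: Optional[str] = None
    # Assumption: retrieved context is modeled here as a list of text passages.
    evaluated_model_retrieved_context: Optional[List[str]] = None
    evaluated_model_gold_answer: Optional[str] = None

    def to_payload(self) -> dict:
        # Drop unset fields so the payload carries only what was provided.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Sample values are invented for illustration.
sample = EvaluationInput(
    evaluated_model_input="What is the capital of France?",
    evaluated_model_output="Paris is the capital of France.",
    evaluated_model_retrieved_context=[
        "Paris is the capital and largest city of France."
    ],
)
payload = sample.to_payload()
```

Because no gold answer is set in this sample, the resulting payload holds only the other three fields.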

Evaluator

  • The infrastructure that performs the Evaluation. Evaluators can be based on scores, classifiers, or even carefully-tuned LLMs. Patronus provides several state-of-the-art Evaluators like Patronus Lynx for hallucination detection.
  • Evaluators are specialized and test for specific issues, such as hallucinations or toxicity. You can combine multiple Evaluators to check that your LLM meets your product needs.
  • Evaluators are versioned by release date. For example, custom-large-2024-05-16 is the large version of the custom evaluator released on May 16, 2024.
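
The naming convention above can be sketched as a small parser that splits a versioned evaluator name into its base name and release date. This is a minimal sketch that assumes every versioned name ends in a YYYY-MM-DD suffix, as in the example. The parse_evaluator_id function and the EvaluatorVersion type are hypothetical helpers, not part of any Patronus library.

```python
import re
from datetime import date
from typing import NamedTuple

# Assumption: a versioned name is "<base name>-YYYY-MM-DD".
_VERSIONED = re.compile(r"^(?P<name>.+)-(?P<released>\d{4}-\d{2}-\d{2})$")


class EvaluatorVersion(NamedTuple):
    name: str       # e.g. "custom-large"
    released: date  # release date taken from the suffix


def parse_evaluator_id(evaluator_id: str) -> EvaluatorVersion:
    """Split a versioned evaluator name into base name and release date."""
    match = _VERSIONED.match(evaluator_id)
    if match is None:
        raise ValueError(f"no release-date suffix in {evaluator_id!r}")
    # date.fromisoformat also rejects impossible dates such as 2024-13-01.
    return EvaluatorVersion(match["name"], date.fromisoformat(match["released"]))


parsed = parse_evaluator_id("custom-large-2024-05-16")
```

Here parsed.name is "custom-large" and parsed.released is the date May 16, 2024. A name without a date suffix raises ValueError.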

Evaluator Profile

  • Evaluator Profiles are configuration settings for Evaluators. You can use Evaluator Profiles to tailor an Evaluator's performance to your use case.
  • Several Evaluators have a default Evaluator Profile, which they run with if no Evaluator Profile is specified.

Evaluator Family

  • A grouping of Evaluators that test for the same issues and can all be configured in similar ways (with the same Evaluator Profiles).
  • For example, the custom family groups together all our custom evaluators, such as custom-large-2024-05-16 and custom-small-2024-08-08. They share the same profiles, such as system:is-concise.

Evaluator Alias

  • Aliases let you refer to the latest and most advanced evaluator in the Family, rather than specifying an exact version.
  • For example, the alias custom-large always points to the newest large custom evaluator. If you reference custom-large-2024-05-16 directly instead, you'll need to update the reference manually (for example, to custom-large-2024-08-08) when a new version is released.
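
To make the alias behavior concrete, the sketch below picks the newest dated version for a given alias from a list of evaluator names. It is only a client-side illustration of the idea. With a real alias, the resolution is presumably handled by Patronus, and the resolve_alias function and the hard-coded list of available evaluators are assumptions made for this example.

```python
import re
from datetime import date
from typing import Iterable

# Assumption: a versioned name is "<alias>-YYYY-MM-DD".
_VERSIONED = re.compile(r"^(?P<alias>.+)-(?P<released>\d{4}-\d{2}-\d{2})$")


def resolve_alias(alias: str, available: Iterable[str]) -> str:
    """Return the most recently released evaluator name matching the alias."""
    candidates = []
    for evaluator_id in available:
        match = _VERSIONED.match(evaluator_id)
        if match and match["alias"] == alias:
            released = date.fromisoformat(match["released"])
            candidates.append((released, evaluator_id))
    if not candidates:
        raise LookupError(f"no versioned evaluator found for alias {alias!r}")
    # Tuples compare by release date first, so max() yields the newest version.
    return max(candidates)[1]


# Illustrative list built from the names mentioned in this section.
available = [
    "custom-large-2024-05-16",
    "custom-large-2024-08-08",
    "custom-small-2024-08-08",
]
latest = resolve_alias("custom-large", available)
```

With this list, latest is "custom-large-2024-08-08". A pinned name such as custom-large-2024-05-16 stays fixed until you change it yourself.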

Evaluation Result

  • The output from an Evaluator. Evaluation results are returned in the output of the POST /v1/evaluate API and can be viewed in the LLM Monitoring dashboard. They are also returned by Evaluation Runs and the Patronus Experimentation SDK (both currently Enterprise features).
  • Evaluation Results contain all inputs for the evaluation as well as a pass/fail rating, raw score, and evaluator-specific metadata.
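
One way to work with several Evaluation Results at once is sketched below: a small record holding the pass/fail rating, raw score, and metadata, plus a helper that lists which evaluators failed. The EvaluationResult class, its field names, and the scores are all hypothetical and chosen for illustration. The actual response schema of POST /v1/evaluate may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class EvaluationResult:
    """Hypothetical shape of one result; field names are assumptions."""

    evaluator: str                      # which evaluator produced the result
    passed: bool                        # pass/fail rating
    raw_score: Optional[float] = None   # raw score, if the evaluator gives one
    metadata: Dict[str, Any] = field(default_factory=dict)


def failed_evaluators(results: Iterable[EvaluationResult]) -> List[str]:
    """Return the names of evaluators whose result did not pass."""
    return [r.evaluator for r in results if not r.passed]


# Scores below are made-up sample values, not real evaluator output.
results = [
    EvaluationResult(evaluator="custom-large-2024-05-16", passed=True, raw_score=0.92),
    EvaluationResult(evaluator="custom-small-2024-08-08", passed=False, raw_score=0.31),
]
failures = failed_evaluators(results)
all_passed = not failures
```

Treating an output as acceptable only when every evaluator passes is one possible policy. Depending on your product needs, you might instead apply thresholds to the raw scores.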