Evaluation is the process of assessing an AI application's performance. In Patronus, an eval is a logged evaluation result on a single data sample: it tells the user whether the AI application passed or failed the given criteria for a given execution. An eval consists of the following:

  • Log: Data associated with the AI execution, including inputs and outputs to the application.
  • Evaluation Result: Scores indicating whether the logged data passed or failed the specified evaluation criteria.
  • Explanation (optional): Justification for the evaluation result.
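To make the three parts concrete, here is a minimal sketch of an eval record modeled as plain Python dataclasses. The class names, fields, and the toy `exact_match` evaluator are hypothetical illustrations chosen for this example; they are not the Patronus SDK API. The sketch only shows how a log, a pass/fail result, and an optional explanation fit together for a single data sample.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Log:
    """Hypothetical shape: data captured from one AI execution."""
    input: str
    output: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class EvaluationResult:
    """Hypothetical shape: outcome of checking a log against one criterion."""
    criterion: str
    passed: bool
    score: Optional[float] = None


@dataclass
class Eval:
    """One logged evaluation on a single data sample."""
    log: Log
    result: EvaluationResult
    explanation: Optional[str] = None  # optional justification for the result


def exact_match(log: Log, expected: str) -> Eval:
    """Hypothetical toy evaluator: passes when the output equals the expected answer."""
    passed = log.output.strip() == expected.strip()
    return Eval(
        log=log,
        result=EvaluationResult(
            criterion="exact-match",
            passed=passed,
            score=1.0 if passed else 0.0,
        ),
        explanation=(
            f"Output {'matches' if passed else 'does not match'} "
            f"the expected answer {expected!r}."
        ),
    )


ev = exact_match(Log(input="What is the capital of France?", output="Paris"), expected="Paris")
print(ev.result.passed, ev.result.score)  # True 1.0
```

Keeping the explanation optional mirrors the list above: the pass/fail result is always present, while the justification may be omitted.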

In the following sections, we describe each concept and walk you through how to run evals consistently to obtain measurable assessments of an AI system's performance.

See Logs to quickly get started with logging AI application outputs.

See Experiments to learn how to run batched evals in experiments.

Read on to learn more about how to define evaluators that you can use in logging and experimentation ➡️