Agent Evaluation
This section describes how to evaluate agents in Patronus AI.
Broadly, there are two kinds of agent evaluations:
- Observability: real-time monitoring of end-to-end agent executions and failures. This is performed with Patronus evaluators and the logging infrastructure (a minimal sketch follows this list).
- Unit testing: offline evaluation during development, typically targeting individual subagent processes to improve specific functionality. We recommend our experimentation framework and SDK for batched testing and experimentation (see the second sketch below).
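To make the observability flow concrete, the sketch below scores a single completed agent execution with a remote evaluator. This is a minimal sketch assuming a Patronus-style Python client: the `Client` constructor, `evaluate` call, evaluator name, criterion, and result fields are illustrative, so check the SDK reference for exact signatures.

```python
# Hypothetical sketch: score one live agent execution with a remote evaluator.
# Client, evaluate(), and the field names below are illustrative placeholders;
# consult the Patronus SDK reference for the exact API.
from patronus import Client

client = Client(api_key="YOUR_API_KEY")  # assumed constructor

def on_agent_finished(user_query: str, agent_answer: str) -> None:
    """Log and score a completed end-to-end agent execution."""
    result = client.evaluate(
        evaluator="judge",                      # assumed evaluator name
        criteria="patronus:is-helpful",         # assumed managed criterion
        evaluated_model_input=user_query,
        evaluated_model_output=agent_answer,
        tags={"pipeline": "production-agent"},  # tag runs for dashboard filtering
    )
    if not result.pass_:                        # assumed result field
        # Route failures into your own alerting alongside Patronus logs.
        print(f"Agent execution failed evaluation: {result.explanation}")
```

Calling `on_agent_finished` at the end of each production run keeps every execution logged and scored, so failures surface in real time rather than in later batch analysis.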
Both observability and unit testing are critical for developing performant, reliable agents. The following guide walks you through setting up an end-to-end agent evaluation pipeline and iterating on agent performance.
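For the unit-testing side, the second sketch runs one subagent step over a small batched dataset and scores each output. The `task` and `evaluator` decorators, the `client.experiment` signature, and the dataset field names are assumptions modeled on a typical Patronus-style SDK, not verified signatures.

```python
# Hypothetical sketch: batched offline testing of one subagent step.
# task, evaluator, and client.experiment() are illustrative names;
# check the Patronus SDK reference for the exact experimentation API.
from patronus import Client, task, evaluator

client = Client(api_key="YOUR_API_KEY")

@task
def summarize_step(evaluated_model_input: str) -> str:
    # Stand-in for the subagent functionality under test.
    return evaluated_model_input[:200]

@evaluator
def is_short_enough(evaluated_model_output: str) -> bool:
    # A simple local check; remote Patronus evaluators can be mixed in as well.
    return len(evaluated_model_output) <= 280

client.experiment(
    "agent-unit-tests",  # experiment name (assumed parameter)
    data=[
        {"evaluated_model_input": "Summarize the Q3 earnings call transcript."},
        {"evaluated_model_input": "Summarize the onboarding guide."},
    ],
    task=summarize_step,
    evaluators=[is_short_enough],
)
```

Because the task isolates a single subagent process, each experiment run gives a comparable, batched score for that functionality, which is what makes offline iteration tractable.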