Agent observability in Patronus is the real-time monitoring and evaluation of end-to-end agent executions. Embedding observability and evaluation in agent executions is important because it can identify failures such as:
Incorrect tool use
Failure to delegate a task
Unsatisfactory answers
Incorrect tool outputs
Observing and evaluating agents in Patronus is the process of embedding evaluators in agent executions so that agent behaviors can be continuously monitored and analyzed in the platform.
The first step in evaluating an agent is to define a set of evaluators. See the Evaluators section to understand the difference between class-based and Patronus API evaluators.
Let's create an example coding agent using CrewAI. The agent calls an LLM API and retrieves a response (this example uses OpenAI, but any LLM API works).
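Here is a minimal sketch of such an agent, assuming recent versions of the crewai and openai packages. The agent, task, tool, and model names are illustrative, and the @tool decorator's import path can vary between CrewAI releases.

```python
from crewai import Agent, Crew, Task
from crewai.tools import tool  # import path may differ in older CrewAI releases
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@tool("Code generator")
def generate_code(prompt: str) -> str:
    """Generate Python code for the given prompt using an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


coding_agent = Agent(
    role="Python developer",
    goal="Write correct, well-documented Python code",
    backstory="An experienced engineer who writes clean, tested code.",
    tools=[generate_code],
)

coding_task = Task(
    description="Write a function that merges two sorted lists.",
    expected_output="A Python function with a short docstring.",
    agent=coding_agent,
)

crew = Crew(agents=[coding_agent], tasks=[coding_task])
```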
Suppose we want to evaluate the helpfulness of the agent response. To embed a Patronus evaluator, simply add the following:
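The sketch below assumes the Patronus Python SDK's Client.evaluate interface; the evaluator and criteria names are illustrative and may differ in your Patronus project.

```python
from patronus import Client

patronus_client = Client()  # reads PATRONUS_API_KEY from the environment


def evaluate_helpfulness(task_input: str, task_output: str):
    """Score how helpful the agent's response is to the original request."""
    return patronus_client.evaluate(
        evaluator="judge",                  # Patronus API evaluator
        criteria="patronus:is-helpful",     # illustrative criteria name
        evaluated_model_input=task_input,
        evaluated_model_output=task_output,
    )
```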
We can embed this evaluator in the tool call, right after the generation:
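Continuing the sketch above, generate_code and evaluate_helpfulness are the illustrative names defined earlier; the evaluation call is placed immediately after the LLM response is received.

```python
@tool("Code generator")
def generate_code(prompt: str) -> str:
    """Generate Python code for the given prompt and evaluate the result."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    output = response.choices[0].message.content

    # Log a helpfulness evaluation to Patronus right after the generation.
    evaluate_helpfulness(task_input=prompt, task_output=output)

    return output
```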
Now you can run the agent with crewai run, and you will see evaluation results populated in real time in Logs.
That's it! Every agent execution you trigger will now log its outputs and evaluation results to the Patronus Logs dashboard.
With evaluators embedded in your agent executions, agent behaviors can be continuously monitored and analyzed in the platform. You can send alerts on failed agent outputs, filter for interesting examples and add them to your testing data, and retry the agent response when there are failures. The possibilities are endless! 🌈