Agentic Observability and Error Detection
Teams building agentic AI spend hours combing through traces and logs searching for planning mistakes, incorrect tool calls, and wrong outputs.
In this cookbook, we'll set up a simple agentic workflow and discover how Patronus can make agentic observability fast and reliable through Percival, our AI oversight companion. We'll use Python and orchestrate our agents using the OpenAI Agents SDK.
The first step to agent observability is setting up tracing. Patronus offers a Tracing feature that we will use in this example.
Tracing shows you the execution flow of your agentic system, which is often non-deterministic. It also logs the inputs and outputs of important steps in your workflow - like function calls, model invocations, and database queries - so you can debug exactly why something went wrong.
1. Get a Patronus API Key
If you do not have an account yet, sign up at app.patronus.ai.
To create an API key, click API Keys in the navigation bar. Store the key securely, as you will not be able to view it again.
2. Set your API Keys
Add the following API keys to your environment:
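Assuming the SDKs read the conventional `PATRONUS_API_KEY` and `OPENAI_API_KEY` environment variables (adjust the names if your setup differs), you can export them in your shell:

```shell
# Replace the placeholder values with your real keys.
export PATRONUS_API_KEY="your-patronus-api-key"
export OPENAI_API_KEY="your-openai-api-key"
```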
3. Install your dependencies
Note that Tracing is compatible with OpenTelemetry and OpenInference. We are using OpenInference to automatically wrap API calls and steps taken by the OpenAI Agents SDK.
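A typical dependency set for this walkthrough might look like the following (package names are assumed from PyPI; adjust to match your environment):

```shell
pip install patronus openai-agents openinference-instrumentation-openai-agents
```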
4. Set up and run your code (Python)
The code below:
- Sets up a weather agent with a function tool to retrieve weather information in a city
- Creates a manager agent that can delegate to the weather agent
- Handles the workflow using the OpenAI Agents Runner
- Traces the entire agent execution flow with Patronus
Now run your code. The output should include something like "The weather in Paris is sunny."
5. Visualize the agent execution
Go to the Patronus Platform and click on Tracing in the navigation bar.
You should see a trace populated in the table on the page. Click on it, and you should see something like this:
6. Analyze the trace with Percival
Now click "Analyze with Percival" in the top right. This kicks off Percival, our AI oversight agent that parses your trace, analyzes it for systemic errors and 20+ failure modes, and suggests prompt improvements to fix them. You can learn more about Percival and the kinds of errors it detects here.
Percival should find no issues with the trace:
7. Add an error
Now let's make things interesting by modifying our script and adding an error.
Change the "get_weather" tool call to return information about wave height instead of weather.
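A sketch of the faulty tool body (keep whatever `@function_tool` decorator your script already applies; the exact wording of the return string is illustrative):

```python
def get_weather(city: str) -> str:
    """Intentionally wrong: reports wave height instead of the weather."""
    return f"The wave height in {city} is 2 meters."
```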
Traditional evaluation tools can struggle to discover the problem with this output, because doing so requires understanding the goals of the agentic system. Percival ingests the instructions given to each agent in the system, so it can discover systemic issues and faulty tool calls.
Now, run your script again with the updated "get_weather" function.
8. Catch the error with Percival
Go back to the Tracing tab and open the new trace you just added. Now, click on "Analyze with Percival".
Percival should have discovered that the "get_weather" tool gave an irrelevant response.
Percival does more than categorize trace failures and suggest fixes. Notice the scores assigned to your trace, such as Plan Optimality and Reliability. Percival also references the spans where failures were found: click the button in the "Spans" section to open the step where an error was detected.
You can learn more about Tracing here and Percival here.
Also be sure to check out our specialized Patronus SDK Tracing documentation.