Agentic Observability and Error Detection
Teams building agentic AI spend hours combing through traces and logs searching for planning mistakes, incorrect tool calls, and wrong outputs.
In this cookbook, we'll set up a simple agentic workflow and discover how Patronus can make agentic observability fast and reliable through Percival, our AI oversight companion. We'll use Python and orchestrate our agents using the OpenAI Agents SDK.
The first step to agent observability is setting up tracing. Patronus offers a Tracing feature that we will use in this example.
Tracing shows you the execution flow of your agentic system, which is often non-deterministic. It also logs the inputs and outputs of important steps in your workflow - like function calls, model invocations, and database queries - so you can debug exactly why something went wrong.
1. Get a Patronus API Key
If you do not have an account yet, sign up at app.patronus.ai.
To create an API key, click API Keys in the navigation bar. Store the key securely, as you will not be able to view it again.
2. Set your API Keys
Add the following API keys to your environment:
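Assuming the SDKs read the conventional `PATRONUS_API_KEY` and `OPENAI_API_KEY` environment variables (adjust the names if your setup differs), you can export them in your shell:

```shell
# Replace the placeholder values with your real keys.
export PATRONUS_API_KEY="your-patronus-api-key"
export OPENAI_API_KEY="your-openai-api-key"
```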
3. Install your dependencies
Note that Tracing is compatible with OpenTelemetry and OpenInference. We are using OpenInference to automatically wrap API calls and steps taken by the OpenAI Agents SDK.
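A typical dependency set for this walkthrough might look like the following (package names are assumed from PyPI; adjust to match your environment):

```shell
pip install patronus openai-agents openinference-instrumentation-openai-agents
```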
4. Set up and run your code (Python)
The code below:
- Sets up a weather agent with a function tool to retrieve weather information in a city
- Creates a manager agent that can delegate to the weather agent
- Handles the workflow using the OpenAI Agents Runner
- Traces the entire agent execution flow with Patronus
Now run your code. The output should include something like "The weather in Paris is sunny."
5. Visualize the agent execution
Go to the Patronus Platform and click on Tracing in the navigation bar.
You should see a trace populated in the table on the page. Click on it, and you should see something like this:
6. Analyze the trace with Percival
Now click "Analyze with Percival" in the top right. This kicks off Percival, our AI oversight agent that parses your trace, analyzes it for systemic errors and 20+ failure modes, and suggests prompt improvements to fix them. You can learn more about Percival and the kinds of errors it detects here.
Percival should find no issues with the trace:
7. Add an error
Now let's make things interesting by modifying our script and adding an error.
Change the "get_weather" tool call to return information about wave height instead of weather.
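A sketch of the faulty tool body (keep whatever `@function_tool` decorator your script already applies; the exact wording of the return string is illustrative):

```python
def get_weather(city: str) -> str:
    """Intentionally wrong: reports wave height instead of the weather."""
    return f"The wave height in {city} is 2 meters."
```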
Traditional evaluation tools can struggle to discover the problem with this output, because doing so requires understanding the goals of the agentic system. Percival ingests the instructions given to each agent in the system, so it can discover systemic issues and faulty tool calls.
Now, run your script again with the updated "get_weather" function.
8. Catch the error with Percival
Go back to the Tracing tab and open the new trace you just added. Now, click on "Analyze with Percival".
Percival should have discovered that the "get_weather" tool gave an irrelevant response.
Percival does more than categorize trace failures and suggest fixes. Notice the scores assigned to your trace, such as Plan Optimality and Reliability. Percival also references the spans where failures were found: click the button in the "Spans" section to open the step where an error was detected.
You can learn more about Tracing here and Percival here.
Also be sure to check out our specialized Patronus SDK Tracing documentation.