Percival Overview

Percival is a highly intelligent agent developed by the Patronus AI team. It detects 20+ failure modes in agentic traces and suggests optimizations for agentic systems. Think of Percival as your best AI debugger, one that has spent thousands of hours understanding your traces and processing millions of tokens. It has saved engineering teams hundreds of hours of analyzing individual traces, clustering errors, and engineering prompts.

Percival is activated with the "Analyze with Percival" button on a trace. The image above shows a cluster of errors together with prompt optimizations that prevent repeated tool calls. The suggested prompts can be appended to your existing ones. The full list of errors appears in the Error Taxonomy section below.

How Percival works

Percival is an adaptive learning evaluation agent. It ingests a trace, processes its spans, and generates insights: a summary of errors and optimizations. Specifically, Percival clusters errors, recommends prompt fixes, and scores the trace on a 1-5 scale along dimensions such as security and reliability. It also stores the generated insights in memory. Memory is both episodic (which tools have previously been called in traces) and semantic (human-provided feedback on agents). Percival draws on this memory when generating new insights, so it can adapt to any system and improve over time.
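To make the flow above more concrete, here is a minimal sketch of error clustering and episodic memory. It is an illustration only, not Percival's implementation: the span dictionaries, the `Memory` class, and the `cluster_errors` and `analyze` functions are all hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical memory store; the field names are illustrative."""
    episodic: list = field(default_factory=list)  # tools called in earlier traces
    semantic: list = field(default_factory=list)  # human feedback on the agent


def cluster_errors(spans):
    """Group span ids by the error type attached to each span."""
    clusters = defaultdict(list)
    for span in spans:
        if span.get("error"):
            clusters[span["error"]].append(span["id"])
    return dict(clusters)


def analyze(spans, memory):
    """Return clustered errors and record the tools seen in this trace."""
    insights = {"clusters": cluster_errors(spans)}
    memory.episodic.extend(s["tool"] for s in spans if s.get("tool"))
    return insights


# Assumed span shape: an id, the tool that was called, and an optional error type.
spans = [
    {"id": "s1", "tool": "search", "error": None},
    {"id": "s2", "tool": "search", "error": "Resource Abuse"},
    {"id": "s3", "tool": "search", "error": "Resource Abuse"},
    {"id": "s4", "tool": "calculator", "error": "Tool Selection Errors"},
]
memory = Memory()
insights = analyze(spans, memory)
```

In this sketch the repeated `search` calls end up in a single "Resource Abuse" cluster, which is the kind of grouping a prompt fix against repeated tool calls would target.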

Why we created Percival

We’ve seen firsthand how many hours teams building agentic AI spend combing through traces and logs searching for planning mistakes, incorrect tool calls, and wrong outputs. We built Percival to make this process fast and reliable: with the click of a button, it analyzes full agent workflows, surfaces 20+ failure modes, and suggests prompt improvements to fix them.

Traditional evaluation approaches like LLM-as-a-Judge or curated test datasets catch mistakes at specific points, but they often miss the broader context and overlook systemic issues like flawed planning or misused tools. Percival closes this gap by analyzing the full agentic execution, even when it is long or complex.

Error Taxonomy

Here are the agentic errors that Percival can catch:

Category	Sub-category	Error Type	Brief Description
Reasoning Errors	Hallucinations	Language-only	Fabricated content without using tools
Reasoning Errors	Hallucinations	Tool-related	Invented tool outputs or capabilities
Reasoning Errors	Information Processing	Poor Information Retrieval	Retrieved or cited information irrelevant to the task
Reasoning Errors	Information Processing	Tool Output Misinterpretation	Misread or mis-applied a tool's result
Reasoning Errors	Decision Making	Incorrect Problem Identification	Misunderstood the overall or local task
Reasoning Errors	Decision Making	Tool Selection Errors	Chose an inappropriate tool for the job
Reasoning Errors	Output Generation	Formatting Errors	Produced malformed code / data or wrong structure
Reasoning Errors	Output Generation	Instruction Non-compliance	Ignored or deviated from the given instructions
System Execution Errors	Configuration	Tool Definition Issues	Tool was mis-declared (e.g. search declared as calculator)
System Execution Errors	Configuration	Environment Setup Errors	Missing keys, permissions, or other setup problems
System Execution Errors	API Issues	Rate Limiting	Exceeded quota (HTTP 429)
System Execution Errors	API Issues	Authentication Errors	Invalid or missing credentials (HTTP 401/403)
System Execution Errors	API Issues	Service Errors	Upstream failure (HTTP 500)
System Execution Errors	API Issues	Resource Not Found	Endpoint or asset missing (HTTP 404)
System Execution Errors	Resource Management	Resource Exhaustion	Ran out of memory / disk / other resources
System Execution Errors	Resource Management	Timeout Issues	Model timed out during execution
Planning and Coordination Errors	Context Management	Context Handling Failures	Context is not retained or used correctly
Planning and Coordination Errors	Context Management	Resource Abuse	A resource is unnecessarily used or called repeatedly
Planning and Coordination Errors	Task Management	Goal Deviation	Orchestrator deviates from the intended plan
Planning and Coordination Errors	Task Management	Task Orchestration	Assignment of tasks to the wrong sub-agents
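
If you want to work with this taxonomy programmatically, for example to tag or filter your own logs, the table can be written down as a nested mapping. The sketch below only transcribes the table above; the names `TAXONOMY`, `HTTP_STATUS_TO_ERROR`, `error_types`, and `locate` are hypothetical and are not part of any Patronus API.

```python
# Category -> sub-category -> error types, transcribed from the table above.
TAXONOMY = {
    "Reasoning Errors": {
        "Hallucinations": ["Language-only", "Tool-related"],
        "Information Processing": [
            "Poor Information Retrieval",
            "Tool Output Misinterpretation",
        ],
        "Decision Making": [
            "Incorrect Problem Identification",
            "Tool Selection Errors",
        ],
        "Output Generation": ["Formatting Errors", "Instruction Non-compliance"],
    },
    "System Execution Errors": {
        "Configuration": ["Tool Definition Issues", "Environment Setup Errors"],
        "API Issues": [
            "Rate Limiting",
            "Authentication Errors",
            "Service Errors",
            "Resource Not Found",
        ],
        "Resource Management": ["Resource Exhaustion", "Timeout Issues"],
    },
    "Planning and Coordination Errors": {
        "Context Management": ["Context Handling Failures", "Resource Abuse"],
        "Task Management": ["Goal Deviation", "Task Orchestration"],
    },
}

# HTTP status codes named in the "API Issues" rows of the table.
HTTP_STATUS_TO_ERROR = {
    429: "Rate Limiting",
    401: "Authentication Errors",
    403: "Authentication Errors",
    500: "Service Errors",
    404: "Resource Not Found",
}


def error_types():
    """Return every error type in the taxonomy as a flat list."""
    return [t for subs in TAXONOMY.values() for types in subs.values() for t in types]


def locate(error_type):
    """Return (category, sub-category) for an error type, or None if unknown."""
    for category, subs in TAXONOMY.items():
        for sub_category, types in subs.items():
            if error_type in types:
                return category, sub_category
    return None
```

For instance, `locate(HTTP_STATUS_TO_ERROR[429])` resolves a rate-limit response to its category and sub-category in the table.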

How to get started

1. Plug into tracing. Use the @traced decorator (see the Patronus Tracing Documentation), or import traces using OpenTelemetry (see the OpenTelemetry Documentation). If you want to get started quickly and don't have anything to trace yet, use one of the Colab notebooks:
- Smolagents
- Pydantic AI
- OpenAI Agents SDK
- LangChain
- CrewAI
- Custom OpenAI and Anthropic clients (compatible with OpenAIInstrumentor and AnthropicInstrumentor)
2. Navigate to the Traces tab.
3. Click "Analyze with Percival".
4. Read the summary of Generated Insights, then drill into the error clusters and prompt optimizations.
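
To show what a tracing decorator captures before a trace reaches Percival, here is a standard-library stand-in. It is deliberately not the Patronus SDK or OpenTelemetry: `traced_sketch`, `SPANS`, and the two example functions are hypothetical, and a real setup would export spans through the tracing documentation linked above rather than append them to a list.

```python
import functools
import time

SPANS = []  # hypothetical in-memory sink; a real tracer exports spans instead


def traced_sketch(func):
    """Record one span per call: function name, status, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        span = {"name": func.__name__, "status": "ok"}
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Keep the failure on the span, then let the caller see it too.
            span["status"] = "error"
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            SPANS.append(span)
    return wrapper


@traced_sketch
def lookup_weather(city):
    return f"Sunny in {city}"


@traced_sketch
def broken_tool():
    raise TimeoutError("upstream took too long")


lookup_weather("Paris")
try:
    broken_tool()
except TimeoutError:
    pass
```

After these two calls, `SPANS` holds one successful span and one failed span. Per-call records of this kind (what ran, whether it failed, and how long it took) are what a trace-level analysis works from.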

Integrations with Agentic Frameworks

Because Percival's trace parser relies on the OpenTelemetry and OpenInference tracing conventions, the following frameworks are supported out of the box:

- Smolagents
- Pydantic AI
- OpenAI Agents SDK
- LangChain
- CrewAI
- Custom OpenAI and Anthropic clients (compatible with OpenAIInstrumentor and AnthropicInstrumentor)