Percival Overview
Percival is a highly intelligent agent developed by the Patronus AI team. It is capable of detecting 20+ failure modes in agentic traces and suggesting optimizations for agentic systems. Think of Percival as your best AI debugger who has spent thousands of hours understanding your traces and processing millions of tokens. It has saved engineering teams hundreds of hours in analyzing individual traces, clustering errors, and prompt engineering.
To run Percival, click the "Analyze with Percival" button on a trace. The image above shows a cluster of errors together with prompt optimizations that prevent repeated tool calls. The suggested prompts can be appended to your existing ones. The full list of errors Percival detects is in the Error Taxonomy section below.
How Percival works
Percival is an adaptive learning evaluation agent. It ingests a trace, processes the spans, and generates insights: a summary of errors and optimizations. Specifically, Percival clusters errors, recommends prompt fixes, and scores the trace on a 1-5 scale for dimensions such as security and reliability. It also stores the generated insights in memory. Memory is both episodic (which tools have previously been called in traces) and semantic (human-provided feedback on agents). Percival uses this memory to improve its generated insights, which lets it learn from any system and improve over time.
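To make the shape of these insights concrete, here is a minimal sketch of how error clusters, prompt fixes, 1-5 scores, and the two kinds of memory could be represented. Every name in it (`SpanError`, `Memory`, `Insights`, `cluster_errors`, the span ids, and the tool names) is a hypothetical chosen for illustration. It is not Percival's internal data model or API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpanError:
    """One error found in a span (hypothetical shape, for illustration only)."""
    span_id: str
    error_type: str  # e.g. "Tool Selection Errors"


@dataclass
class Memory:
    """Hypothetical store holding the two kinds of memory described above."""
    episodic: Counter = field(default_factory=Counter)  # tool name -> times seen in traces
    semantic: list = field(default_factory=list)        # human-provided feedback notes


@dataclass
class Insights:
    clusters: dict      # error type -> list of span ids
    prompt_fixes: list  # suggested prompt additions
    scores: dict        # dimension name -> integer score

    def __post_init__(self):
        # Scores are reported on a 1-5 scale, so reject anything outside it.
        for name, score in self.scores.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{name} score must be between 1 and 5, got {score}")


def cluster_errors(errors):
    """Group span ids by error type, preserving input order."""
    clusters = defaultdict(list)
    for err in errors:
        clusters[err.error_type].append(err.span_id)
    return dict(clusters)


errors = [
    SpanError("span-1", "Resource Abuse"),
    SpanError("span-2", "Tool Selection Errors"),
    SpanError("span-3", "Resource Abuse"),
]

memory = Memory()
memory.episodic.update(["web_search", "web_search", "calculator"])
memory.semantic.append("Prefer the calculator tool for arithmetic.")

insights = Insights(
    clusters=cluster_errors(errors),
    prompt_fixes=["Do not call the same tool twice with identical arguments."],
    scores={"security": 5, "reliability": 3},
)
print(insights.clusters)
```

The sketch only groups errors by type and validates the score range. How Percival actually clusters errors, derives scores, or consults memory is not specified here.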
Why we created Percival
We’ve seen firsthand how many hours teams building agentic AI spend combing through traces and logs searching for planning mistakes, incorrect tool calls, and wrong outputs. We built Percival to make this process fast and reliable: with the click of a button, it analyzes full agent workflows, surfaces 20+ failure modes, and suggests prompt improvements to fix them.
Traditional evaluation approaches such as LLM-as-a-Judge or curated test datasets catch mistakes at specific points, but they often miss the broader context and overlook systemic issues like flawed planning or misused tools. Percival closes this gap by analyzing the full agentic execution, even when it is long or complex.
Error Taxonomy
Here are the agentic errors that Percival can catch:
Category | Sub-category | Error Type | Brief Description |
---|---|---|---|
Reasoning Errors | Hallucinations | Language-only | Fabricated content without using tools |
Reasoning Errors | Hallucinations | Tool-related | Invented tool outputs or capabilities |
Reasoning Errors | Information Processing | Poor Information Retrieval | Retrieved or cited information irrelevant to the task |
Reasoning Errors | Information Processing | Tool Output Misinterpretation | Misread or mis-applied a tool's result |
Reasoning Errors | Decision Making | Incorrect Problem Identification | Misunderstood the overall or local task |
Reasoning Errors | Decision Making | Tool Selection Errors | Chose an inappropriate tool for the job |
Reasoning Errors | Output Generation | Formatting Errors | Produced malformed code / data or wrong structure |
Reasoning Errors | Output Generation | Instruction Non-compliance | Ignored or deviated from the given instructions |
System Execution Errors | Configuration | Tool Definition Issues | Tool was mis-declared (e.g. search declared as calculator) |
System Execution Errors | Configuration | Environment Setup Errors | Missing keys, permissions, or other setup problems |
System Execution Errors | API Issues | Rate Limiting | Exceeded quota (HTTP 429) |
System Execution Errors | API Issues | Authentication Errors | Invalid or missing credentials (HTTP 401/403) |
System Execution Errors | API Issues | Service Errors | Upstream failure (HTTP 500) |
System Execution Errors | API Issues | Resource Not Found | Endpoint or asset missing (HTTP 404) |
System Execution Errors | Resource Management | Resource Exhaustion | Ran out of memory / disk / other resources |
System Execution Errors | Resource Management | Timeout Issues | Model timed out during execution |
Planning and Coordination Errors | Context Management | Context Handling Failures | Context is not retained or used correctly |
Planning and Coordination Errors | Context Management | Resource Abuse | A resource is unnecessarily used or called repeatedly |
Planning and Coordination Errors | Task Management | Goal Deviation | Orchestrator deviates from the intended plan |
Planning and Coordination Errors | Task Management | Task Orchestration | Assignment of tasks to the wrong sub-agents |
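
The API Issues rows of the taxonomy map directly onto HTTP status codes, so they can be written down as a small lookup. The sketch below is only an illustration of the table above. The function name `classify_http_status` is an assumption, and the code does not describe how Percival itself classifies errors.

```python
# Status codes listed in the "API Issues" rows of the taxonomy table.
API_ISSUES = {
    429: "Rate Limiting",
    401: "Authentication Errors",
    403: "Authentication Errors",
    500: "Service Errors",
    404: "Resource Not Found",
}


def classify_http_status(status):
    """Return (category, sub_category, error_type), or None if the
    status code does not appear in the taxonomy table."""
    error_type = API_ISSUES.get(status)
    if error_type is None:
        return None
    return ("System Execution Errors", "API Issues", error_type)


print(classify_http_status(429))
print(classify_http_status(200))
```

Codes the table does not list, such as 200 or 503, return `None` here rather than being guessed into a category.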
How to get started
- Plug into tracing, using one of the following options:
  - Use the @traced decorator (see the Patronus Tracing Documentation).
  - Or import traces using OpenTelemetry (see the OpenTelemetry Documentation).
  - Or, if you want to get started fast and don't have anything to trace, plug into one of the Colab notebooks:
    - Smolagents
    - Pydantic AI
    - OpenAI Agent SDK
    - Langchain
    - CrewAI
    - Custom OpenAI and Anthropic clients (compatible with OpenAIInstrumentor and AnthropicInstrumentor)
- Navigate to the Traces tab.
- Click "Analyze with Percival".
- Read the summary of Generated Insights, then drill into the error clusters and prompt optimizations.
Integrations with Agentic Frameworks
Because Percival's trace parser relies on the OpenTelemetry and OpenInference tracing conventions, the following frameworks are supported out of the box:
- Smolagents
- Pydantic AI
- OpenAI Agent SDK
- Langchain
- CrewAI
- Custom OpenAI and Anthropic clients (compatible with OpenAIInstrumentor and AnthropicInstrumentor)
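
A shared tracing convention is what lets one analysis work across all of these frameworks. As a rough illustration, the sketch below scans spans for repeated tool calls (the "Resource Abuse" error type) using the OpenInference attribute keys `openinference.span.kind`, `tool.name`, and `input.value`. The plain-dict span shape, the function name `repeated_tool_calls`, and the `threshold` parameter are assumptions made for this example. This is not Percival's detection logic.

```python
from collections import Counter


def repeated_tool_calls(spans, threshold=2):
    """Count tool spans that share the same tool name and input.

    `spans` is assumed to be a list of dicts, each with an "attributes"
    dict keyed by OpenInference attribute names.
    Returns {(tool_name, input_value): count} for counts >= threshold.
    """
    counts = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        # Only tool invocations are relevant; skip LLM, chain, and other spans.
        if attrs.get("openinference.span.kind") != "TOOL":
            continue
        counts[(attrs.get("tool.name"), attrs.get("input.value"))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical spans, as any instrumented framework might emit them.
spans = [
    {"attributes": {"openinference.span.kind": "LLM", "input.value": "What is the weather?"}},
    {"attributes": {"openinference.span.kind": "TOOL", "tool.name": "web_search", "input.value": '{"q": "weather"}'}},
    {"attributes": {"openinference.span.kind": "TOOL", "tool.name": "web_search", "input.value": '{"q": "weather"}'}},
    {"attributes": {"openinference.span.kind": "TOOL", "tool.name": "calculator", "input.value": '{"expr": "2+2"}'}},
]

print(repeated_tool_calls(spans))
```

Because the check reads only convention-defined attributes, the same function applies regardless of which framework produced the spans.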