Percival

Percival Overview

Percival is a highly intelligent agent developed by the Patronus AI team. It is capable of detecting 20+ failure modes in agentic traces and suggesting optimizations for agentic systems. Think of Percival as your best AI debugger who has spent thousands of hours understanding your traces and processing millions of tokens. It has saved engineering teams hundreds of hours in analyzing individual traces, clustering errors, and prompt engineering.

Percival can be activated through an "Analyze with Percival" button over traces. The image above shows a cluster of errors along with prompt optimizations that prevent repeated tool calls; these suggested prompts can be appended to your existing ones. The full error taxonomy is listed below.

How Percival works

Percival is an adaptive learning evaluation agent. It ingests a trace, processes its spans, and generates insights: a summary of errors and suggested optimizations. Specifically, Percival clusters errors, recommends prompt fixes, and scores the trace on a 1-5 scale along dimensions such as security and reliability. It also stores the generated insights in memory. Memory is both episodic (which tools have previously been called in traces) and semantic (human-provided feedback on agents). Percival uses this memory to improve its insights, allowing it to learn from any system and get better over time.
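To make the shape of these insights concrete, here is a purely illustrative sketch. The dataclass and field names are hypothetical, not the actual Patronus SDK or API; they simply mirror the pieces described above (a summary, error clusters, prompt suggestions, and 1-5 scores).

```python
# Purely illustrative: hypothetical field names, not the actual Patronus API.
from dataclasses import dataclass, field


@dataclass
class ErrorCluster:
    category: str        # e.g. "Reasoning Errors"
    error_type: str      # e.g. "Tool Output Misinterpretation"
    span_ids: list[str]  # spans in the trace where the failure was observed
    summary: str         # short description of the failure pattern


@dataclass
class GeneratedInsights:
    summary: str                        # high-level narrative of what happened in the trace
    error_clusters: list[ErrorCluster]  # failures grouped across the whole trace
    prompt_suggestions: list[str]       # text that can be appended to existing prompts
    scores: dict[str, int] = field(default_factory=dict)  # e.g. {"security": 4, "reliability": 3}, 1-5 scale
```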

Why we created Percival

We’ve seen firsthand how many hours teams building agentic AI spend combing through traces and logs searching for planning mistakes, incorrect tool calls, and wrong outputs. We built Percival to make this process fast and reliable: with the click of a button, it analyzes full agent workflows, surfaces 20+ failure modes, and suggests prompt improvements to fix them.

Traditional evaluation approaches like LLMs-as-a-Judge or curated test datasets catch mistakes at specific points, but often miss the broader context and overlook systemic issues like flawed planning or misused tools. Percival can close this gap by analyzing the full agentic execution, even when it is long or complex.

Error Taxonomy

Here are the agentic errors that Percival can catch:

| Category | Sub-category | Error Type | Brief Description |
| --- | --- | --- | --- |
| Reasoning Errors | Hallucinations | Language-only | Fabricated content without using tools |
| Reasoning Errors | Hallucinations | Tool-related | Invented tool outputs or capabilities |
| Reasoning Errors | Information Processing | Poor Information Retrieval | Retrieved or cited information irrelevant to the task |
| Reasoning Errors | Information Processing | Tool Output Misinterpretation | Misread or mis-applied a tool's result |
| Reasoning Errors | Decision Making | Incorrect Problem Identification | Misunderstood the overall or local task |
| Reasoning Errors | Decision Making | Tool Selection Errors | Chose an inappropriate tool for the job |
| Reasoning Errors | Output Generation | Formatting Errors | Produced malformed code / data or wrong structure |
| Reasoning Errors | Output Generation | Instruction Non-compliance | Ignored or deviated from the given instructions |
| System Execution Errors | Configuration | Tool Definition Issues | Tool was mis-declared (e.g. search declared as calculator) |
| System Execution Errors | Configuration | Environment Setup Errors | Missing keys, permissions, or other setup problems |
| System Execution Errors | API Issues | Rate Limiting | Exceeded quota (HTTP 429) |
| System Execution Errors | API Issues | Authentication Errors | Invalid or missing credentials (HTTP 401/403) |
| System Execution Errors | API Issues | Service Errors | Upstream failure (HTTP 500) |
| System Execution Errors | API Issues | Resource Not Found | Endpoint or asset missing (HTTP 404) |
| System Execution Errors | Resource Management | Resource Exhaustion | Ran out of memory / disk / other resources |
| System Execution Errors | Resource Management | Timeout Issues | Model timed out during execution |
| Planning and Coordination Errors | Context Management | Context Handling Failures | Context is not retained or used correctly |
| Planning and Coordination Errors | Context Management | Resource Abuse | A resource is unnecessarily used or called repeatedly |
| Planning and Coordination Errors | Task Management | Goal Deviation | Orchestrator deviates from the intended plan |
| Planning and Coordination Errors | Task Management | Task Orchestration | Assignment of tasks to the wrong sub-agents |

How to get started

  1. Plug into tracing (a minimal @traced sketch follows this list):
    Use the @traced decorator from the Patronus SDK:
    Patronus Tracing Documentation

    Or import traces using OpenTelemetry:
    OpenTelemetry Documentation

    Or, if you want to get started fast and don't have anything to trace yet, plug into the Colab notebooks here.

  2. Navigate to the Traces tab

  3. Click "Analyze with Percival"

  4. Read the summary of generated insights, then dig into the error clusters and prompt optimizations!
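For step 1, here is a minimal sketch of the @traced route. It assumes the Patronus Python SDK exposes init() and a traced() decorator as described in the tracing documentation linked above, and that PATRONUS_API_KEY and OPENAI_API_KEY are set in the environment; the linked docs are authoritative for the exact arguments.

```python
# Minimal sketch, assuming patronus.init() and patronus.traced() behave as
# described in the Patronus Tracing Documentation, with PATRONUS_API_KEY and
# OPENAI_API_KEY set in the environment.
import patronus
from openai import OpenAI

patronus.init()       # set up tracing so spans are exported to Patronus
client = OpenAI()


@patronus.traced()    # each call to this function becomes a span in a trace
def answer_question(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    answer_question("Summarize the last three support tickets.")
```

Once the trace lands in the Traces tab, steps 2-4 are the same regardless of how the spans were produced.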

Integrations with Agentic Frameworks

Since Percival's trace parser relies on the OpenTelemetry and OpenInference tracing conventions, the following frameworks are supported out of the box (a setup sketch for custom clients follows the list):

  1. Smolagents
  2. Pydantic AI
  3. OpenAI Agent SDK
  4. Langchain
  5. CrewAI
  6. Custom OpenAI and Anthropic clients (compatible with OpenAIInstrumentor and AnthropicInstrumentor)
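For custom OpenAI clients (item 6), the sketch below shows one way to wire the OpenInference OpenAIInstrumentor into an OpenTelemetry export pipeline. The collector endpoint and the x-api-key header name are placeholders, not confirmed values; take the real address and authentication header from the Patronus OpenTelemetry documentation linked above.

```python
# Hedged sketch: the OTLP endpoint and auth header below are placeholders, not
# confirmed Patronus values; use the ones given in the Patronus OpenTelemetry
# documentation.
import os

from openinference.instrumentation.openai import OpenAIInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export spans via OTLP/HTTP to the Patronus collector (placeholder endpoint).
exporter = OTLPSpanExporter(
    endpoint="https://<patronus-otlp-endpoint>/v1/traces",   # placeholder
    headers={"x-api-key": os.environ["PATRONUS_API_KEY"]},   # assumed header name
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# All calls made through the openai package are now recorded with OpenInference
# semantic conventions, which Percival's trace parser understands out of the box.
OpenAIInstrumentor().instrument(tracer_provider=provider)
```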
