
Custom Error Taxonomies with Percival

For developers and product teams working with agentic systems, one of the biggest challenges is observability. Teams need to know how and when their agentic systems are failing. Often, these failures are domain-specific — requiring a custom taxonomy of errors.

Patronus provides Percival, an agentic debugger with a flexible error taxonomy. In this walkthrough, we’ll build a simple mock sales Q&A chatbot with a custom error taxonomy. By the end of this walkthrough, you’ll know how to:

  • Define a custom error taxonomy with Percival
  • Trace an agent and log to the Patronus platform
  • Use Percival to catch errors in your taxonomy
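
If you want to follow along, you'll need the Patronus Python SDK installed and an API key the SDK can pick up. The snippet below is a minimal setup sketch; it assumes credentials are supplied via the PATRONUS_API_KEY environment variable (adjust if you pass the key to patronus.init() directly later in the walkthrough).

# Install the SDK first (from your shell): pip install patronus
import os

# Assumption: the SDK reads the API key from this environment variable.
# Replace the placeholder with your own key, or export it in your shell instead.
os.environ.setdefault("PATRONUS_API_KEY", "<your-patronus-api-key>")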

Percival Custom Error for Header


0. Define Error Taxonomy

In the Traces tab, select your relevant project. Then navigate to the Taxonomy tab.

From here, you can view the base taxonomy Percival uses to categorize errors. You can extend this by clicking “Define New.”

Base Taxonomy UI

Within a taxonomy you can define:

  • Errors: specific ways your agent may fail that you intend to catch
  • Categories: groups of similar errors. These don’t affect how Percival analyzes your trace, but help keep the taxonomy organized (similar to folders in a file system).

When defining an error, provide a short description, similar to how you would describe pass criteria for a judge. You can view examples by expanding the Patronus AI Error Taxonomy.

Let’s define a custom error for our sales Q&A agent. We don’t want the agent making promises about the roadmap or unreleased features, so we’ll add a custom error to catch this:

  • Name: Roadmap Promise
  • Description: The final output discusses the product roadmap or unreleased features, makes commitments about future functionality, or promises functionality that does not yet exist.

Define New Error

Once saved, this new error is part of your taxonomy and ready to use on real traces!


1. Set Up Mock Agent Environment

Now we can develop our agent. We start by importing the necessary packages.

import random, time
from typing import Dict, Any
 
# --------------------------
# Patronus tracing
# --------------------------
import patronus
from patronus import traced
from patronus.experiments import run_experiment
from patronus.evals import RemoteEvaluator
 
patronus.init(project_name="demo-percival-sales-qa-agent")
log = patronus.get_logger()

We’ll also define a simple one-element golden dataset. Notice that the gold answer for this question does not commit to a specific roadmap date. Instead, it defers to the sales team and avoids roadmap commitments.

gold_dataset = [
    {
        "task_input": "When will SCIM be fully GA and not just beta?",
        "gold_answer": "Answer: SCIM provisioning is currently in beta. Contact Sales for current availability and Support for configuration guidance. Avoid roadmap commitments."
    },
]

2. Build Mock Agent and Run a Trace

Next, we'll build a mock agent with a couple of tools to emulate answering the user's question. Each tool is decorated with the Patronus @traced decorator, and we wrap them in a simple task function.

# mock KB
KB = [
    {
        "id": "DOC-SEC-OVERVIEW",
        "title": "Security Overview",
        "url": "https://docs.example.com/security",
        "content": (
            "Security features include SSO (SAML 2.0), role-based access control, audit logs, "
            "SCIM provisioning (beta, expected to launch Q1 2026), and encryption at rest and in transit."
        ),
    },
]
 
# --------------------------
# Simple Tools
# --------------------------
 
@traced("retrieve_from_kb")
def tool_retrieve_from_kb(query: str) -> Dict[str, Any]:
    """Retrieve the SCIM security doc from knowledge base"""
    time.sleep(random.uniform(0.1, 0.3))  # Simulate KB lookup latency
 
    # Since we only have one doc, just return it
    doc = KB[0]
    return {
        "doc_id": doc["id"],
        "title": doc["title"],
        "url": doc["url"],
        "content": doc["content"]
    }
 
@traced("format_response")
def tool_format_response(answer: str, source_title: str, source_url: str) -> str:
    """Format the final response with answer and source"""
    time.sleep(random.uniform(0.1, 0.3))  # Simulate response formatting latency
 
    return f"""Answer:
{answer}
 
Source:
- {source_title}: {source_url}"""
 
SYSTEM = """You are a Sales Support / Technical Product Chatbot for reps.
 
Your workflow:
1) retrieve_from_kb(query) - get the SCIM security doc from knowledge base
2) format_response(answer, source_title, source_url) - format the final response
 
Rules:
- Answer user questions about SCIM based on retrieved documentation
- Always cite your sources
- Output only the final formatted response"""
 
@traced("sales-agent-run")
def mock_sales_agent(row, **kwargs) -> str:
    # log prompts in trace
    log.info({
        "system_prompt": SYSTEM,
        "prompt_version": 1,
        "user_question": row.task_input,
    })
    
    # Execute KB retrieval for tracing (creates realistic trace)
    retrieved = tool_retrieve_from_kb(row.task_input)
    
    # Format the output (also traced)
    formatted = tool_format_response(
        answer="SCIM will be GA in Q1 2026 as part of our security roadmap. SCIM provisioning is currently in beta.", # mock response
        source_title=retrieved["title"],
        source_url=retrieved["url"]
    )
 
    return formatted
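
Before wiring the agent into an experiment, you can sanity-check it with a quick direct call. The mock agent only reads row.task_input, so a lightweight stand-in object is enough here; SimpleNamespace is just a convenience for this sketch, not part of the Patronus SDK.

from types import SimpleNamespace

# Quick smoke test: call the traced agent directly with a fake dataset row.
sample_row = SimpleNamespace(task_input="When will SCIM be fully GA and not just beta?")
print(mock_sales_agent(sample_row))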

With our mock agent defined, we can run an experiment and log the trace to the Patronus UI.

# run experiment
run_experiment(
    experiment_name="demo-percival-sales-qa-agent",
    project_name="demo-percival-sales-qa-agent",
    dataset=gold_dataset,
    task=mock_sales_agent,
    evaluators=[
        # Use a Patronus-managed evaluator
        RemoteEvaluator("judge", "patronus:fuzzy-match").load(),
    ],
    tags={"model": "simulated", "version": "v1"}
)

3. Get Percival Error Insights

From within the UI, click into the trace and select “Analyze with Percival.” We can see that Percival successfully surfaces our custom error:

Percival Custom Error

4. Improving Our Agent

Percival also provides specific prompt fixes to improve the agent. These prompt recommendations can be applied directly to prevent future roadmap promises.
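
The exact recommendation depends on your trace, but a typical fix adds an explicit guardrail to the system prompt. Here's a sketch of what a revised prompt might look like; the new rule is an illustration, not Percival's verbatim suggestion.

# Illustrative revision only; use the prompt fix Percival suggests for your trace.
SYSTEM_V2 = """You are a Sales Support / Technical Product Chatbot for reps.

Your workflow:
1) retrieve_from_kb(query) - get the SCIM security doc from knowledge base
2) format_response(answer, source_title, source_url) - format the final response

Rules:
- Answer user questions about SCIM based on retrieved documentation
- Always cite your sources
- Never commit to roadmap dates, GA timelines, or unreleased features; defer availability questions to the Sales team
- Output only the final formatted response"""

# Swap this in for SYSTEM in mock_sales_agent (and bump prompt_version to 2)
# before re-running the experiment.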

For more details on implementing these fixes, see the related guide: Tracing and Debugging Agents with Percival.


Wrap Up

This flow — define taxonomy → trace agent → analyze with Percival → improve prompts — is the standard loop for catching domain-specific failures with Patronus.

  • Custom error taxonomies let you encode the exact failure modes your team cares about.
  • Tracing ensures you have visibility into how your agent is behaving in practice.
  • Percival analysis surfaces errors and provides actionable fixes.
  • Prompt improvements close the loop, making agents safer and more reliable over time.
