Chain Evaluations

Evaluation chaining allows you to create sequential pipelines where the results of one evaluation step can be used in subsequent steps. This is particularly useful when you need to:

Process model outputs through multiple stages
Make evaluation decisions based on previous results
Create complex evaluation workflows that depend on earlier outcomes

Basic Chain Configuration

Here's a simple example of how to set up an evaluation chain:

Python

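from patronus import Client

# `dataset`, the two task functions, and the three evaluators are assumed to be
# defined elsewhere; they are not shown in this snippet.
client = Client()

client.experiment(
    "Tutorial Project",
    dataset=dataset,
    chain=[
        # First link: its task runs first, then its two evaluators.
        {"task": agent_sql_generator, "evaluators": [eval_sql_syntax, detect_sql_injection]},
        # Second link: runs only after the first link has completed.
        {"task": agent_sql_executor, "evaluators": [eval_output_correctness]},
    ],
)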

Chain Execution Flow

Links in the chain are executed sequentially.
Within each link:
  1. First, the task is executed.
  2. If the task returns None, chain execution stops for this dataset row.
  3. If the task returns a result, all evaluators for this link are executed concurrently.
After all evaluators in a link complete, execution moves to the next link.
If any task raises an exception, chain execution stops for this dataset row.
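
The execution flow above can be modeled in a few lines of plain Python. This is only an illustrative sketch of the rules as described, not the Patronus implementation: `run_chain`, the `(row, parent)` task signature, the `(row, output)` evaluator signature, and all the example functions are hypothetical names introduced here, and a thread pool stands in for whatever concurrency mechanism the SDK actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chain(row, chain):
    """Run every link in `chain` for one dataset row (hypothetical helper)."""
    results = []   # one entry per completed link
    parent = None  # output of the previous link, if any
    for link in chain:
        try:
            output = link["task"](row, parent)
        except Exception:
            break  # a task raised: stop the chain for this row
        if output is None:
            break  # a task returned None: stop the chain for this row
        # Run all evaluators of this link concurrently and wait for all of them.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(ev, row, output) for ev in link["evaluators"]]
            scores = [f.result() for f in futures]
        results.append({"output": output, "scores": scores})
        parent = output  # the next link sees this link's output
    return results


# Hypothetical tasks and evaluators, for illustration only.
def generate(row, parent):
    return f"SELECT * FROM {row['table']}"


def execute(row, parent):
    # Returning None stops the chain for this row.
    return None if "missing" in parent else f"rows from: {parent}"


def is_select(row, output):
    return output.startswith("SELECT")


def non_empty(row, output):
    return len(output) > 0


chain = [
    {"task": generate, "evaluators": [is_select, non_empty]},
    {"task": execute, "evaluators": [non_empty]},
]

completed = run_chain({"table": "users"}, chain)    # both links complete
stopped = run_chain({"table": "missing"}, chain)    # second task returns None
```

In this sketch the row with `"table": "users"` passes through both links, while the row with `"table": "missing"` stops after the first link because the second task returns None. Other rows are unaffected, since each row is processed by its own call.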

Accessing Previous Results

Tasks and evaluators in the chain can access results from previous links using the parent parameter. Here's an example: