
Chaining evaluations

Creating multi-stage evaluation chains in experiments

In multi-stage evaluation chains, evaluators from one stage can see the results of previous stages. This allows you to build complex evaluation workflows where later stages can make decisions based on earlier evaluation results.

Basic chain structure

To create a multi-stage evaluation chain, use the chain parameter instead of separate task and evaluators parameters:

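from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment

# `dataset`, `generate_summary`, and `DependentEvaluator` are assumed to be
# defined elsewhere in your code; they are not imported in this snippet.
experiment = run_experiment(
    dataset=dataset,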
    chain=[
        # First stage
        {
            "task": generate_summary,
            "evaluators": [
                RemoteEvaluator("judge", "conciseness"),
                RemoteEvaluator("judge", "coherence")
            ]
        },
        # Second stage - evaluating based on first stage results
        {
            "task": None,  # No additional processing
            "evaluators": [
                # This evaluator can see previous evaluations
                DependentEvaluator()
            ]
        }
    ]
)

Accessing previous evaluation results

Evaluators in later stages can access the results from previous stages using the parent parameter:

from patronus import evaluator
from patronus.evals import EvaluationResult
 
@evaluator()
def final_aggregate_evaluator(row, task_result, parent, **kwargs):
    # Check if we have previous evaluation results
    if not parent or not parent.evals:
        return None
 
    # Access evaluations from previous stage
    conciseness = parent.evals.get("judge:conciseness")
    coherence = parent.evals.get("judge:coherence")

    # Skip aggregation if either evaluation is missing
    if conciseness is None or coherence is None:
        return None

    # Average the two scores, treating a missing score as 0
    avg_score = ((conciseness.score or 0) + (coherence.score or 0)) / 2
    return EvaluationResult(score=avg_score, pass_=avg_score > 0.7)

Using parent information

The parent parameter provides access to:

  • parent.task - The task result from the previous stage
  • parent.evals - A dictionary of evaluation results from the previous stage, keyed by evaluator name (for example, "judge:conciseness")

This allows you to create sophisticated evaluation workflows where later stages can:

  • Aggregate scores from multiple evaluators
  • Make decisions based on whether previous evaluations passed
  • Access metadata from previous task executions
  • Build conditional evaluation logic
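
As a rough illustration of the "make decisions based on whether previous evaluations passed" case, the sketch below gates on the previous stage's results. It runs without the SDK: StubEvaluation and StubParent are hypothetical stand-ins that only mimic the parent.evals shape described above, and gate_on_previous_stage is an illustrative helper, not part of the Patronus API. It also assumes each result exposes a pass_ attribute, mirroring the pass_ argument used when constructing EvaluationResult.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for illustration only; NOT the Patronus classes.
@dataclass
class StubEvaluation:
    score: Optional[float] = None
    pass_: Optional[bool] = None


@dataclass
class StubParent:
    task: Optional[object] = None
    evals: dict = field(default_factory=dict)


def gate_on_previous_stage(parent, required):
    """Return True only if every required evaluator ran and passed."""
    if not parent or not parent.evals:
        return False
    for name in required:
        result = parent.evals.get(name)
        # A missing result or a falsy pass_ both count as "not passed".
        if result is None or not result.pass_:
            return False
    return True


parent = StubParent(evals={
    "judge:conciseness": StubEvaluation(score=0.9, pass_=True),
    "judge:coherence": StubEvaluation(score=0.4, pass_=False),
})

all_passed = gate_on_previous_stage(
    parent, ["judge:conciseness", "judge:coherence"]
)
concise_only = gate_on_previous_stage(parent, ["judge:conciseness"])
```

Inside a real later-stage evaluator, the same check could decide whether to run a more expensive evaluation or return early.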

Best practices

When working with evaluation chains:

  • Keep chains focused: Each stage should have a clear purpose
  • Handle missing data: Always check that parent and parent.evals exist, and that each expected evaluator key is present, before accessing them
  • Use meaningful names: Give evaluators descriptive names so they're easy to reference in later stages
  • Document dependencies: Make it clear which evaluators depend on results from previous stages
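
One possible way to follow the "handle missing data" advice is to centralize the defensive checks in a small helper. The sketch below is a minimal example under stated assumptions: mean_available_score is a hypothetical name, and SimpleNamespace objects stand in for the real parent and evaluation result objects so the snippet runs without the SDK.

```python
from types import SimpleNamespace


def mean_available_score(parent, names):
    """Average the scores of the named evaluators that have a score.

    Returns None when there is no parent, no evals, or no usable score.
    """
    if not parent or not getattr(parent, "evals", None):
        return None
    scores = []
    for name in names:
        result = parent.evals.get(name)
        # getattr on None falls back to the default, so missing keys are safe.
        score = getattr(result, "score", None)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


# Stand-in parent: one usable score, one empty score, one missing evaluator.
parent = SimpleNamespace(
    task=None,
    evals={
        "judge:conciseness": SimpleNamespace(score=0.8),
        "judge:coherence": SimpleNamespace(score=None),
    },
)

average = mean_available_score(
    parent, ["judge:conciseness", "judge:coherence", "judge:missing"]
)
```

This helper ignores missing scores rather than counting them as 0, so a single absent evaluation does not drag the average down. Whether that is the right trade-off depends on how strict the final stage should be.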
