Chaining evaluations
Creating multi-stage evaluation chains in experiments
In multi-stage evaluation chains, evaluators from one stage can see the results of previous stages. This allows you to build complex evaluation workflows where later stages can make decisions based on earlier evaluation results.
Basic chain structure
To create a multi-stage evaluation chain, use the chain parameter instead of separate task and evaluators parameters:
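The sketch below shows the general shape of such a chain in plain Python. It does not use the framework itself: Stage, Parent, and run_chain are hypothetical stand-ins. It assumes that each stage pairs one task with its evaluators, and that each stage's evaluators receive the previous stage's results as parent (None for the first stage).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Parent:
    """Hypothetical container for one stage's results, handed to the next stage."""
    task: Any                # task result from the previous stage
    evals: Dict[str, Any]    # evaluation results, keyed by evaluator name


# Hypothetical evaluator signature: (input, this stage's output, previous stage's results).
Evaluator = Callable[[Any, Any, Optional[Parent]], Any]


@dataclass
class Stage:
    """Hypothetical stage: one task plus the evaluators that score its output."""
    task: Callable[[Any], Any]
    evaluators: Dict[str, Evaluator] = field(default_factory=dict)


def run_chain(input_value: Any, chain: List[Stage]) -> Optional[Parent]:
    """Run stages in order; each stage's evaluators see the previous stage as `parent`."""
    parent: Optional[Parent] = None
    for stage in chain:
        output = stage.task(input_value)
        evals = {name: ev(input_value, output, parent) for name, ev in stage.evaluators.items()}
        parent = Parent(task=output, evals=evals)  # becomes `parent` for the next stage
    return parent


chain = [
    Stage(
        task=lambda text: text.upper(),
        evaluators={"is_upper": lambda inp, out, parent: out.isupper()},
    ),
    Stage(
        task=lambda text: len(text),
        evaluators={
            # Reads the first stage's evaluator result by name.
            "previous_passed": lambda inp, out, parent: parent is not None
            and parent.evals.get("is_upper") is True,
        },
    ),
]

result = run_chain("hello", chain)
print(result.task, result.evals)  # 5 {'previous_passed': True}
```

Passing the whole previous stage as a single object keeps the evaluator signature stable however many evaluators an earlier stage defines.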
Accessing previous evaluation results
Evaluators in later stages can access the results from previous stages using the parent parameter:
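As an illustration, a later-stage evaluator might look like the function below. The (input, output, parent) signature, the Parent class, and the "correctness" evaluator name are assumptions made for this sketch. They are not taken from the framework's reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Parent:
    """Hypothetical stand-in for the `parent` object passed to later-stage evaluators."""
    task: Any              # task result from the previous stage
    evals: Dict[str, Any]  # previous stage's evaluation results, keyed by evaluator name


def consistent_with_previous(input_value: Any, output: Any, parent: Optional[Parent]) -> bool:
    """Later-stage evaluator: passes only if the previous stage's 'correctness'
    check passed and this stage did not change the previous answer."""
    if parent is None:
        return False  # no previous stage, so there is nothing to compare against
    previous_ok = parent.evals.get("correctness") is True
    return previous_ok and output == parent.task


previous_stage = Parent(task="Paris", evals={"correctness": True})
print(consistent_with_previous("Capital of France?", "Paris", previous_stage))  # True
print(consistent_with_previous("Capital of France?", "Lyon", previous_stage))   # False
```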
Using parent information
The parent parameter provides access to:
- parent.task: The task result from the previous stage
- parent.evals: A dictionary of evaluation results from the previous stage, keyed by evaluator name
This allows you to create sophisticated evaluation workflows where later stages can:
- Aggregate scores from multiple evaluators
- Make decisions based on whether previous evaluations passed
- Access metadata from previous task executions
- Build conditional evaluation logic
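A minimal sketch of two of these patterns, aggregation and conditional logic, in a single evaluator. The function name and the "relevance", "fluency", and "safety" evaluator names are hypothetical. So is the 0.7 threshold. A SimpleNamespace stands in for the framework's parent object.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def overall_quality(input_value: Any, output: Any, parent: Optional[Any]) -> Dict[str, Any]:
    """Hypothetical later-stage evaluator that aggregates and gates on earlier results.

    - Aggregates: averages every numeric score from the previous stage.
    - Conditional: only passes if a (hypothetical) 'safety' evaluator returned True.
    """
    evals = getattr(parent, "evals", None) or {}
    # bool is a subclass of int, so exclude it to keep pass/fail flags out of the average.
    scores = [v for v in evals.values() if isinstance(v, (int, float)) and not isinstance(v, bool)]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    is_safe = evals.get("safety") is True
    return {"mean_score": mean_score, "passed": is_safe and mean_score >= 0.7}


# SimpleNamespace stands in for the framework's parent object here.
parent = SimpleNamespace(task="draft", evals={"relevance": 1.0, "fluency": 0.5, "safety": True})
print(overall_quality("question", "answer", parent))  # {'mean_score': 0.75, 'passed': True}
```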
Best practices
When working with evaluation chains:
- Keep chains focused: Each stage should have a clear purpose
- Handle missing data: Always check that parent and parent.evals exist before accessing them
- Use meaningful names: Give evaluators descriptive names so they're easy to reference in later stages
- Document dependencies: Make it clear which evaluators depend on results from previous stages
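The following sketch applies three of these practices: it guards against a missing parent or evals, it reads a dependency by a descriptive evaluator name, and it states that dependency in its docstring. The "factuality" evaluator, the 0.5 threshold, and returning None as "no verdict" are all assumptions. The framework may expect a different value when a score cannot be computed.

```python
from types import SimpleNamespace
from typing import Any, Optional


def needs_human_review(input_value: Any, output: Any, parent: Optional[Any]) -> Optional[bool]:
    """Flags outputs for review when the previous stage's factuality score is low.

    Depends on: the 'factuality' evaluator from the previous stage (hypothetical name).
    Returns None (no verdict) rather than raising when that result is unavailable.
    """
    # Handle missing data: first stages have no parent, and a parent may have no evals.
    if parent is None or not getattr(parent, "evals", None):
        return None
    factuality = parent.evals.get("factuality")
    if factuality is None:
        return None
    return factuality < 0.5


print(needs_human_review("q", "a", None))  # None
print(needs_human_review("q", "a", SimpleNamespace(task="a", evals={"factuality": 0.3})))  # True
```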
