Chain Evaluations
Evaluation chaining allows you to create sequential pipelines where the results of one evaluation step can be used in subsequent steps. This is particularly useful when you need to:
- Process model outputs through multiple stages
- Make evaluation decisions based on previous results
- Create complex evaluation workflows that depend on earlier outcomes
Basic Chain Configuration
Here's a simple example of how to set up an evaluation chain:
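A minimal sketch of such a configuration, assuming a chain is written as an ordered list of links where each link names one task and the evaluators that score its output. The dictionary keys (`task`, `evaluators`) and the function signatures are illustrative assumptions, not a confirmed API:

```python
# Hypothetical shape of a two-link chain. All names below are illustrative.

def generate_answer(row):
    """Link 1 task: produce an output for the dataset row (stand-in for a model call)."""
    return f"Answer to: {row['question']}"

def answer_not_empty(row, task_output):
    """Link 1 evaluator: pass when the task produced non-blank text."""
    return bool(task_output.strip())

def extract_keywords(row):
    """Link 2 task: pull the longer words out of the question."""
    return [w.lower().strip("?") for w in row["question"].split() if len(w) > 4]

def has_keywords(row, task_output):
    """Link 2 evaluator: pass when at least one keyword was found."""
    return len(task_output) > 0

# The chain is an ordered list of links; order defines execution order.
chain = [
    {"task": generate_answer, "evaluators": [answer_not_empty]},
    {"task": extract_keywords, "evaluators": [has_keywords]},
]
```

Keeping each link as a small task-plus-evaluators unit makes it easy to reorder or drop stages without touching the others.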
Chain Execution Flow
- Links in the chain are executed sequentially
- Within each link:
  - First, the task is executed
  - If the task returns None, the chain execution stops for this dataset row
  - If the task returns a result, all evaluators for this link are executed concurrently
- After all evaluators in a link complete, execution moves to the next link
- If any task raises an exception, the chain execution stops for this dataset row
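The rules above can be sketched as a small runner. This is a model of the described behavior under stated assumptions, not the framework's implementation: `run_chain`, the link dictionary keys, and the `(row, previous_results)` task signature are all hypothetical, and a thread pool stands in for whatever concurrency mechanism is actually used.

```python
from concurrent.futures import ThreadPoolExecutor

def run_chain(chain, row):
    """Run the links in order for one dataset row and return per-link results."""
    results = []
    for link in chain:
        try:
            output = link["task"](row, results)
        except Exception:
            break  # a task exception stops the chain for this row
        if output is None:
            break  # a None result also stops the chain for this row
        evaluators = link.get("evaluators", [])
        # Run every evaluator of this link concurrently, then wait for all of them.
        with ThreadPoolExecutor(max_workers=max(1, len(evaluators))) as pool:
            futures = [pool.submit(ev, row, output) for ev in evaluators]
            evaluations = [f.result() for f in futures]
        results.append({"task_output": output, "evaluations": evaluations})
    return results

chain = [
    {
        # Returns None for blank input, which ends the chain early.
        "task": lambda row, prev: row["text"].strip() or None,
        "evaluators": [lambda row, out: len(out) > 0],
    },
    {
        # Uses the previous link's task output.
        "task": lambda row, prev: prev[-1]["task_output"].upper(),
        "evaluators": [],
    },
]

full = run_chain(chain, {"text": " hello "})
stopped = run_chain(chain, {"text": "   "})
```

In this sketch `full` holds results for both links, while `stopped` is empty because the first task returned None.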
Accessing Previous Results
Tasks and evaluators in the chain can access results from previous links using the parent parameter. Here's an example:
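A minimal sketch of the idea, assuming the parent parameter carries the previous link's task output and evaluation results. The `ParentResult` class, its field names, and the function signatures are hypothetical stand-ins chosen for illustration; the real object may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ParentResult:
    """Hypothetical container for what the previous link produced."""
    task_output: Any
    evaluations: dict = field(default_factory=dict)
    parent: Optional["ParentResult"] = None  # link before the previous one, if any

def summarize(row, parent=None):
    # First link: there is no earlier link, so parent is None.
    return row["text"].split(".")[0] + "."

def summary_is_short(row, task_output, parent=None):
    return len(task_output) <= 40

def shout_if_short(row, parent=None):
    # Later link: read the previous task output and its evaluation.
    if parent is None or not parent.evaluations.get("summary_is_short"):
        return None  # stop the chain for this row
    return parent.task_output.upper()  # stand-in for real follow-up processing

row = {"text": "Chains run in order. Each link sees earlier results."}
summary = summarize(row)
first = ParentResult(
    task_output=summary,
    evaluations={"summary_is_short": summary_is_short(row, summary)},
)
second_output = shout_if_short(row, parent=first)
```

Here the second task both consumes the earlier output and uses the earlier evaluation to decide whether to continue.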
Best Practices
- Early Termination: Return None from a task to stop chain execution for a dataset row when further processing would be unnecessary or invalid.
- Result Propagation: Pass relevant information through task results and evaluations to make it available to subsequent chain links.
- Error Handling: Wrap unreliable operations with the retry helper, since an unhandled task exception stops the chain for that dataset row.
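To illustrate the error-handling advice, here is a generic retry decorator. It is a stand-in written for this example and is not the retry helper mentioned above, whose name and signature may differ; `retry_sketch` and `max_attempts` are assumptions.

```python
import functools

def retry_sketch(max_attempts=3):
    """Hypothetical decorator: call the function again after an exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_error = exc  # remember the failure and try again
            raise last_error  # every attempt failed, so surface the last error
        return wrapper
    return decorator

calls = {"count": 0}

@retry_sketch(max_attempts=3)
def flaky_task(row):
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return f"processed {row['id']}"

result = flaky_task({"id": 7})
```

Without the retry wrapper, the first `ConnectionError` would end the chain for this row; with it, the task succeeds on the third attempt.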