Optimizing compute time for evaluations
Reduce latency and improve performance for real-time evaluation workflows
Users of our real-time monitoring product who run evaluations at scale typically have more stringent latency and volume requirements. We recommend the following strategies to optimize for scalability and reduced latency.
1. Use small-tier evaluators
Evaluator families are offered in tiers containing evaluators of different sizes. Small-tier evaluators are the most efficient and are optimized for latency-sensitive use cases.
Examples: lynx-small, custom-small, retrieval-context-relevance-small
Using small-tier evaluators can significantly reduce response times while maintaining high accuracy for most use cases.
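For illustration, here is a minimal sketch of requesting a small-tier evaluator. The endpoint URL, API-key header, and payload field names are hypothetical stand-ins for the evaluation API; only the evaluator name comes from the tier list above.

```python
import requests

API_URL = "https://api.example.com/v1/evaluate"  # placeholder endpoint

def evaluate_with_small_tier(model_input: str, model_output: str) -> dict:
    """Run a single evaluation against a small-tier evaluator."""
    response = requests.post(
        API_URL,
        headers={"X-API-KEY": "YOUR_API_KEY"},  # placeholder credential
        json={
            "evaluator": "lynx-small",  # small tier: lowest latency
            "evaluated_model_input": model_input,
            "evaluated_model_output": model_output,
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```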
2. Reduce explanation generation
By default, evaluation results include natural language explanations. To reduce latency in online workflows, we recommend limiting when explanations are generated via the explain_strategy parameter.
Available options:
- "never" - No explanations are generated (fastest)
- "on-fail" - Only generate explanations for failed evaluations (recommended balance)
- "on-success" - Only generate explanations for passed evaluations
- "always" - Generate explanations for all evaluations (slowest)
Example
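A minimal sketch of setting the parameter on a request, reusing the hypothetical endpoint and payload fields from the sketch above; the explain_strategy values themselves are the documented options.

```python
import requests

API_URL = "https://api.example.com/v1/evaluate"  # placeholder endpoint

def evaluate_with_on_fail_explanations(model_input: str, model_output: str) -> dict:
    """Only generate explanations when an evaluation fails."""
    response = requests.post(
        API_URL,
        headers={"X-API-KEY": "YOUR_API_KEY"},  # placeholder credential
        json={
            "evaluator": "lynx-small",
            "evaluated_model_input": model_input,
            "evaluated_model_output": model_output,
            # "on-fail" skips explanation generation for passing results,
            # cutting latency while keeping diagnostics where they matter.
            "explain_strategy": "on-fail",
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```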
3. Use async evaluation calls
Wrap your evaluation API calls in an async function to handle multiple evaluations concurrently and improve throughput.
Example
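A minimal sketch using asyncio with aiohttp to fan evaluation requests out concurrently; the endpoint and payload shape remain the hypothetical stand-ins used in the earlier sketches.

```python
import asyncio
import aiohttp

API_URL = "https://api.example.com/v1/evaluate"  # placeholder endpoint

async def evaluate_async(
    session: aiohttp.ClientSession, model_input: str, model_output: str
) -> dict:
    """Submit one evaluation without blocking the event loop."""
    payload = {
        "evaluator": "lynx-small",
        "evaluated_model_input": model_input,
        "evaluated_model_output": model_output,
        "explain_strategy": "on-fail",
    }
    async with session.post(
        API_URL, json=payload, headers={"X-API-KEY": "YOUR_API_KEY"}
    ) as resp:
        resp.raise_for_status()
        return await resp.json()

async def evaluate_batch(pairs: list[tuple[str, str]]) -> list[dict]:
    """Fan out a batch of evaluations concurrently and gather the results."""
    async with aiohttp.ClientSession() as session:
        tasks = [evaluate_async(session, inp, out) for inp, out in pairs]
        return await asyncio.gather(*tasks)

# Usage: results = asyncio.run(evaluate_batch([("question", "answer"), ...]))
```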
This pattern allows you to run multiple evaluations concurrently without blocking, which is especially useful when processing batches of evaluations or handling high-volume production traffic.
