Best Practices
Users of our real-time monitoring product who run evaluations at scale typically have more stringent latency and volume requirements. We recommend the following settings to optimize for scalability and reduce latency.
- Reduce the number of generated explanations. By default, evaluation results contain natural language explanations. To reduce latency in online workflows, we recommend generating fewer explanations. You can do this in two ways (the request sketch at the end of this section shows where the setting fits):
  - Set `explain_strategy="never"`, which skips explanations for all evaluation results.
  - Set `explain_strategy="on-fail"`, which generates explanations only for failed results.
- Use small-tier evaluators. Evaluator families support evaluator tiers containing evaluators of different sizes. The small evaluators are the most efficient and are optimized for latency-sensitive use cases. Examples include `custom-small`, `retrieval-context-relevance-small`, etc.; the request sketch at the end of this section uses one of these.
- Wrap your evaluation API calls in an async function. A Python template is provided below, followed by a sketch that extends it to run several evaluations concurrently:
```python
import asyncio
import requests

def run_evaluation():
    # Your evaluation API call
    result = requests.post(...)
    return result

async def async_evaluation():
    # Run the blocking request in a worker thread so the event loop is not blocked
    result = await asyncio.to_thread(run_evaluation)
    print(f"Result: {result}")

# Run the async function
asyncio.run(async_evaluation())
```
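If a single online request needs to score several records, the same pattern can be fanned out with `asyncio.gather` so the blocking HTTP calls run in parallel worker threads. The sketch below builds on the template above; the per-record `run_evaluation` signature and the record structure are assumptions for illustration.

```python
import asyncio
import requests

def run_evaluation(record):
    # Your evaluation API call for a single record
    result = requests.post(...)
    return result

async def evaluate_batch(records):
    # Each blocking call runs in its own worker thread; gather awaits them all
    tasks = [asyncio.to_thread(run_evaluation, record) for record in records]
    return await asyncio.gather(*tasks)

# Evaluate several records concurrently instead of sequentially
results = asyncio.run(evaluate_batch([{"id": 1}, {"id": 2}, {"id": 3}]))
print(f"Received {len(results)} results")
```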
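For reference, the explanation and evaluator settings above might be combined in a single request along the following lines. This is a minimal sketch: the endpoint URL, header, and payload field names (`evaluators`, `explain_strategy`, `input`, `output`) are assumptions for illustration, so consult the API reference for the exact schema.

```python
import requests

# Hypothetical endpoint and payload shape -- adjust to match the actual API reference
EVALUATION_URL = "https://api.example.com/v1/evaluate"

payload = {
    # Small-tier evaluators are optimized for latency-sensitive use cases
    "evaluators": ["custom-small", "retrieval-context-relevance-small"],
    # Generate explanations only when an evaluation fails
    "explain_strategy": "on-fail",
    "input": "What is our refund policy?",
    "output": "Refunds are available within 30 days of purchase.",
}

response = requests.post(
    EVALUATION_URL,
    json=payload,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},  # placeholder credential
    timeout=10,
)
response.raise_for_status()
print(response.json())
```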