Best Practices

Users of our real-time monitoring product who run evaluations at scale typically have more stringent latency and volume requirements. We recommend the following settings to improve scalability and reduce latency.

  1. Reduce the number of generated explanations. By default, evaluation results include natural language explanations. To reduce latency in online workflows, we recommend generating fewer explanations. You can do this in two ways:
    1. Set explain_strategy="never", which skips explanations for all evaluation results.
    2. Set explain_strategy="on-fail", which generates explanations only for failed results.
  2. Use small-tier evaluators. Evaluator families offer tiers of evaluators at different sizes. The small evaluators are the most efficient and are optimized for latency-sensitive use cases; examples include custom-small and retrieval-context-relevance-small. A request sketch combining these first two recommendations follows the code template below.
  3. Wrap your evaluation API calls in an async function. A Python template is provided below:
import asyncio
import requests

def run_evaluation():
    # Your evaluation API call
    result = requests.post(...)
    return result

async def async_evaluation():
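    # Run the blocking requests call in a worker thread so the event loop stays free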
    result = await asyncio.to_thread(run_evaluation)
    print(f"Result: {result}")

# Run the async function
asyncio.run(async_evaluation())
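
To illustrate the first two recommendations together, here is a minimal sketch of a single evaluation request that selects a small-tier evaluator and limits explanations to failed results. The endpoint URL, auth header, and payload field names are assumptions for illustration only; consult the API reference for the exact schema.

import requests

# Illustrative values only: the endpoint, header, and field names below
# are assumptions, not the documented API schema.
API_URL = "https://api.example.com/v1/evaluate"
API_KEY = "YOUR_API_KEY"

payload = {
    # Small-tier evaluator, optimized for latency-sensitive workloads
    "evaluators": [{"evaluator": "retrieval-context-relevance-small"}],
    # Generate explanations only for failed results
    "explain_strategy": "on-fail",
    "evaluated_model_input": "What is the capital of France?",
    "evaluated_model_output": "Paris",
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
response.raise_for_status()
print(response.json())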
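
The same pattern scales to batches: because asyncio.to_thread returns an awaitable, several blocking calls can be dispatched concurrently with asyncio.gather. A minimal sketch, again assuming a hypothetical endpoint and payload shape:

import asyncio
import requests

# Hypothetical endpoint for illustration; replace with the real evaluation URL.
API_URL = "https://api.example.com/v1/evaluate"

def run_evaluation(payload: dict) -> dict:
    # Blocking evaluation API call, executed off the event loop
    response = requests.post(API_URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()

async def evaluate_batch(payloads: list[dict]) -> list[dict]:
    # Dispatch all blocking calls to worker threads and await them together
    tasks = [asyncio.to_thread(run_evaluation, p) for p in payloads]
    return await asyncio.gather(*tasks)

# Example: evaluate two model outputs concurrently (payload shape is assumed)
payloads = [{"evaluated_model_output": text} for text in ("Paris", "Berlin")]
results = asyncio.run(evaluate_batch(payloads))
print(results)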