Evaluations
Understanding evaluations in Patronus AI
What are evaluations?
Evaluations are the results you see after running evaluators on your model outputs. When you run an evaluator (whether through experiments, logged traffic, or direct API calls), Patronus records each evaluation result and makes it available in the Evaluations tab of the UI.
Evaluation results
Each evaluation result contains detailed information about how your model output was scored:
Core fields
- Pass/Fail: Binary result indicating whether the output passed the evaluation criteria
- Score: Numerical score (typically 0-1) measuring confidence in the result
- Explanation: Human-readable reasoning explaining why the evaluator gave this score
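To illustrate the shape of these fields, the sketch below models a single result as a small Python dataclass. This is a stand-in for whatever object or payload your SDK version actually returns, not the SDK's own type; the field names are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvaluationResult:
    """Illustrative stand-in for one evaluation result (not the SDK's type)."""
    passed: bool                       # Pass/Fail: did the output meet the criteria?
    score: float                       # Score: typically 0-1, confidence in the result
    explanation: Optional[str] = None  # Explanation: the evaluator's reasoning

result = EvaluationResult(
    passed=False,
    score=0.12,
    explanation="The answer contradicts the retrieved context.",
)

if not result.passed:
    print(f"Failed (score={result.score:.2f}): {result.explanation}")
```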
Evaluation context
When you click into an individual evaluation, you can see the complete context:
- Evaluator: Which evaluator was used and what criteria it checked
- Input: The prompt or question sent to your LLM
- Output: The response generated by your LLM
- Retrieved Context: Any context documents or information used (for RAG applications)
- Gold answer: Reference or expected answer (if provided)
- Tags: Custom labels you assigned for filtering and grouping results
- Metadata: Additional information about the evaluation (model name, parameters, timing, etc.)
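Put together, a single evaluation record looks roughly like the dictionary below. The keys mirror the fields listed above and the values are made up for illustration; the actual field names in the API payload may differ.

```python
# Hypothetical snapshot of one evaluation's context, mirroring the fields above.
evaluation_record = {
    "evaluator": "answer-relevance",   # which evaluator ran and what it checked
    "input": "What is the refund window?",
    "output": "You can request a refund within 30 days of purchase.",
    "retrieved_context": ["Refunds are accepted within 30 days of purchase."],
    "gold_answer": "30 days",
    "tags": {"environment": "staging", "feature": "support-bot"},
    "metadata": {"model": "gpt-4o", "temperature": 0.2, "latency_ms": 840},
}
```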
How evaluations get created
Evaluation results appear in the Evaluations tab when you:
- Run experiments: All evaluator results from experiments are logged
- Log evaluations directly: Use the .evaluate() method to log individual evaluation results (see the sketch after this list)
- Evaluate production traffic: When using traces with evaluators attached
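For the direct-logging path, a call might look like the sketch below. It assumes the Patronus Python SDK's Client.evaluate method; the evaluator name, criteria, and parameter names shown are assumptions that may differ between SDK versions, so treat this as a shape to adapt rather than a verbatim recipe.

```python
from patronus import Client

client = Client()  # expects PATRONUS_API_KEY in the environment

# Log one evaluation directly; it then appears in the Evaluations tab.
# Evaluator/criteria names and parameter names below are assumptions.
result = client.evaluate(
    evaluator="judge",
    criteria="patronus:is-concise",
    evaluated_model_input="Summarize the return policy.",
    evaluated_model_output="Returns are accepted within 30 days with a receipt.",
    evaluated_model_retrieved_context=[
        "Our return window is 30 days. A receipt is required for all returns."
    ],
    evaluated_model_gold_answer="30-day returns with a receipt.",
    tags={"environment": "staging"},
)

# The returned result carries the pass/fail flag, score, and explanation
# described under Core fields above.
print(result)
```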
Next steps
- Explore evaluators to understand what runs the evaluations
- Understand evaluation explanations
