Evaluations
Understanding evaluations in Patronus AI
What are evaluations?
Evaluations are the results you see after running evaluators on your model outputs. When you run an evaluator (whether through experiments, logged traffic, or direct API calls), Patronus records each evaluation result and makes it available in the Evaluations tab of the UI.
Evaluation results
Each evaluation result contains detailed information about how your model output was scored:
Core fields
- Pass/Fail: Binary result indicating whether the output passed the evaluation criteria
- Score: Numerical score (typically 0-1) measuring confidence in the result
- Explanation: Human-readable reasoning explaining why the evaluator gave this score
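To illustrate the shape of these fields, the sketch below models a single result as a small Python dataclass. This is a stand-in for whatever object or payload your SDK version actually returns, not the SDK's own type; the field names are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvaluationResult:
    """Illustrative stand-in for one evaluation result (not the SDK's type)."""
    passed: bool                       # Pass/Fail: did the output meet the criteria?
    score: float                       # Score: typically 0-1, confidence in the result
    explanation: Optional[str] = None  # Explanation: the evaluator's reasoning

result = EvaluationResult(
    passed=False,
    score=0.12,
    explanation="The answer contradicts the retrieved context.",
)

if not result.passed:
    print(f"Failed (score={result.score:.2f}): {result.explanation}")
```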
Evaluation context
When you click into an individual evaluation, you can see the complete context:
- Evaluator: Which evaluator was used and what criteria it checked
- Input: The prompt or question sent to your LLM
- Output: The response generated by your LLM
- Retrieved Context: Any context documents or information used (for RAG applications)
- Gold answer: Reference or expected answer (if provided)
- Tags: Custom labels you assigned for filtering and grouping results
- Metadata: Additional information about the evaluation (model name, parameters, timing, etc.)
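Put together, a single evaluation record looks roughly like the dictionary below. The keys mirror the fields listed above and the values are made up for illustration; the actual field names in the API payload may differ.

```python
# Hypothetical snapshot of one evaluation's context, mirroring the fields above.
evaluation_record = {
    "evaluator": "answer-relevance",   # which evaluator ran and what it checked
    "input": "What is the refund window?",
    "output": "You can request a refund within 30 days of purchase.",
    "retrieved_context": ["Refunds are accepted within 30 days of purchase."],
    "gold_answer": "30 days",
    "tags": {"environment": "staging", "feature": "support-bot"},
    "metadata": {"model": "gpt-4o", "temperature": 0.2, "latency_ms": 840},
}
```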
How evaluations get created
Evaluation results appear in the Evaluations tab when you:
- Run experiments: All evaluator results from experiments are logged
- Log evaluations directly: Use the .evaluate() method to log individual evaluation results (see the sketch after this list)
- Evaluate production traffic: When using traces with evaluators attached
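For the direct-logging path, a call might look like the sketch below. It assumes the Patronus Python SDK's Client.evaluate method; the evaluator name, criteria, and parameter names shown are assumptions that may differ between SDK versions, so treat this as a shape to adapt rather than a verbatim recipe.

```python
from patronus import Client

client = Client()  # expects PATRONUS_API_KEY in the environment

# Log one evaluation directly; it then appears in the Evaluations tab.
# Evaluator/criteria names and parameter names below are assumptions.
result = client.evaluate(
    evaluator="judge",
    criteria="patronus:is-concise",
    evaluated_model_input="Summarize the return policy.",
    evaluated_model_output="Returns are accepted within 30 days with a receipt.",
    evaluated_model_retrieved_context=[
        "Our return window is 30 days. A receipt is required for all returns."
    ],
    evaluated_model_gold_answer="30-day returns with a receipt.",
    tags={"environment": "staging"},
)

# The returned result carries the pass/fail flag, score, and explanation
# described under Core fields above.
print(result)
```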
Next steps
- Explore evaluators to understand what runs the evaluations
- Understand evaluation explanations
