Evaluate
Requires either the **task_input** or **task_output** field to be specified. If both are absent, the request fails with an HTTP 422 (Unprocessable Entity) error.
v1
/evaluate
Authorization
X-API-KEY
<token>
In: header
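Putting the endpoint, the authorization header, and the input/output requirement together, here is a minimal request sketch in Python using the `requests` library. The base URL, the environment variable name, and the evaluator entry are assumptions for illustration only; substitute the evaluators available in your account.

```python
# Minimal sketch of calling the Evaluate endpoint.
# The base URL, env var name, and evaluator entry are illustrative assumptions.
import os
import requests

BASE_URL = "https://api.patronus.ai"  # assumed base URL

payload = {
    # At least one of task_input / task_output must be present (see above).
    "evaluators": [{"evaluator": "judge"}],  # hypothetical evaluator entry
    "task_input": "What is the capital of France?",
    "task_output": "The capital of France is Paris.",
}

resp = requests.post(
    f"{BASE_URL}/v1/evaluate",
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},  # assumed env var name
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```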
Request Body
application/json
evaluators (Required)
Evaluators
List of evaluators to evaluate against.
system_prompt
System Prompt
The system prompt provided to the LLM.
task_context
Task Context
Optional context retrieved from a vector database. This is a list of strings, with the following restrictions:
- The number of items must be less than or equal to 50.
- The total number of tokens across all elements must be less than or equal to 120,000, using the o200k_base tiktoken encoding (a validation sketch follows the list).
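A minimal sketch of validating task_context against these limits before sending the request, assuming the tiktoken package is installed; the limit constants simply mirror the restrictions above.

```python
# Sketch: check task_context against the documented limits before sending.
import tiktoken

MAX_ITEMS = 50
MAX_TOTAL_TOKENS = 120_000

def validate_task_context(task_context: list[str]) -> None:
    if len(task_context) > MAX_ITEMS:
        raise ValueError(f"task_context has {len(task_context)} items; max is {MAX_ITEMS}")
    enc = tiktoken.get_encoding("o200k_base")
    total_tokens = sum(len(enc.encode(chunk)) for chunk in task_context)
    if total_tokens > MAX_TOTAL_TOKENS:
        raise ValueError(f"task_context totals {total_tokens} tokens; max is {MAX_TOTAL_TOKENS}")

validate_task_context(["Paris is the capital of France.", "France is in Europe."])
```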
task_input
Task Input
The input (prompt) provided to the LLM.
task_output
Task Output
The LLM's response to the given input.
gold_answer
Gold Answer
The gold answer for the given evaluated model input.
capture
CaptureOptions
Capture the evaluation result based on the given option (default is none):
- all: captures the result of all evaluations (passed and failed).
- fails-only: captures the evaluation result only when the evaluation failed.
- none: does not capture the evaluation result.
"all"
Value in: "all" | "fails-only" | "none"
project_id
Project Id
Attach the project with the given ID to the evaluation.
Note: This parameter is ignored if project_name or experiment_id is provided.
project_name
Project Name
Attach the project with the given name to the evaluation. If a project with the given name doesn't exist, one will be created.
Note: This parameter is ignored if experiment_id is provided.
Note: This parameter takes precedence over project_id.
app
App
Assigns evaluation results to the app.
- app cannot be used together with experiment_id.
- If both app and experiment_id are omitted, app is automatically set to "default" on capture.
- Automatically creates an app if it doesn't exist.
- Only relevant for captured results; results will be captured under the given app.
experiment_id
Experiment Id
Assigns evaluation results to the experiment.
- experiment_id cannot be used together with app.
- Only relevant for captured results; results will be captured under the given experiment.
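The project_id, project_name, app, and experiment_id notes above describe a precedence order. The sketch below is purely illustrative (not part of the API) and only summarizes which of those fields would take effect for a given payload, following those notes.

```python
# Illustrative only: summarize which routing fields take effect,
# following the precedence notes above. Not part of the API.
def summarize_routing(payload: dict) -> dict:
    effective = {}
    # Project reference: experiment_id > project_name > project_id.
    if payload.get("experiment_id"):
        effective["project"] = "ignored (experiment_id provided)"
    elif payload.get("project_name"):
        effective["project"] = f"name={payload['project_name']} (created if missing)"
    elif payload.get("project_id"):
        effective["project"] = f"id={payload['project_id']}"
    # Grouping of captured results: app and experiment_id are mutually exclusive.
    if payload.get("app") and payload.get("experiment_id"):
        raise ValueError("app cannot be used together with experiment_id")
    if payload.get("experiment_id"):
        effective["group"] = f"experiment={payload['experiment_id']}"
    else:
        effective["group"] = f"app={payload.get('app', 'default')}"
    return effective

print(summarize_routing({"project_id": "p-123", "project_name": "demo"}))
# {'project': 'name=demo (created if missing)', 'group': 'app=default'}
```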
dataset_id
Dataset Id
The ID of the dataset from which the evaluated sample originates. This field serves as metadata for the evaluation. This endpoint does not ensure data consistency for this field. There is no guarantee that the dataset with the given ID is present in the Patronus AI platform, as this is a self-reported value.
dataset_sample_id
Dataset Sample Id
The ID of the sample within the dataset. This field serves as metadata for the evaluation. This endpoint does not ensure data consistency for this field. There is no guarantee that the dataset and sample are present in the Patronus AI platform, as this is a self-reported value.
tags
Tags
Tags are key-value pairs used to label resources.
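For example, a tags object might look like the following; the keys and values are arbitrary illustrations.

```python
# Arbitrary example of a tags object; keys and values are free-form labels.
tags = {"environment": "staging", "model": "gpt-4o-mini", "team": "search"}
```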
confidence_interval_strategy
ConfidenceIntervalStrategies
Create confidence intervals based on one of the following strategies:
- 'none': returns None.
- 'full-history': calculates the upper boundary, median, and lower boundary of the confidence interval based on all available historical records.
- 'generated': calculates the upper boundary, median, and lower boundary of the confidence interval based on a sample of evaluations generated on the fly.
"none"
Value in: "none" | "full-history"
evaluated_model_attachments
Evaluated Model Attachments
Optional list of attachments to be associated with the evaluation sample. These will be added to all evaluation results in this request. Each attachment is a dictionary with the following keys:
- url: URL of the attachment.
- media_type: Media type of the attachment (e.g., "image/jpeg", "image/png").
- usage_type: Type of the attachment (e.g., "evaluated_model_system_prompt", "evaluated_model_input").
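A sketch of an evaluated_model_attachments entry using the keys listed above; the URL is a placeholder.

```python
# Placeholder attachment entry; the URL is illustrative.
evaluated_model_attachments = [
    {
        "url": "https://example.com/screenshot.png",
        "media_type": "image/png",
        "usage_type": "evaluated_model_input",
    }
]
```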
trace_id
Trace Id
span_id
Span Id
log_id
Log Id
Successful Response