
Search Evaluation Results

POST
/v1/evaluation-results/search

Authorization

X-API-KEY <token> (in: header)

The API access token used to authorize the request.
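A minimal sketch of building the authorization headers in Python. It assumes the key is kept in an environment variable; the name PATRONUS_API_KEY is hypothetical and is not defined by this reference.

```python
import os


def auth_headers(env_var: str = "PATRONUS_API_KEY") -> dict:
    """Return request headers that carry the API key in X-API-KEY."""
    # PATRONUS_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your API key")
    return {"X-API-KEY": api_key, "Content-Type": "application/json"}
```

Failing early when the variable is missing avoids sending a request that can only be rejected.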

Request Body

application/json (required)

app

Filter by the application name related to the evaluation results.

project_id

Filter by the project ID related to the evaluation results.

experiment_id

Filter by the experiment ID related to the evaluation results.

evaluation_run_id

Filter by the evaluation run ID related to the evaluation results.

evaluator_id

Filter by the ID of the evaluation criterion.

evaluator_family

Filter by the evaluator family associated with the evaluation results.

profile_name (deprecated)

Filter by the name of the evaluator profile associated with the evaluation results.

criteria

Filter by the name of the evaluator criteria associated with the evaluation results.

after_datetime

Filter results to those recorded after this date and time (ISO 8601, for example "2019-08-24T14:15:22Z").

before_datetime

Filter results to those recorded before this date and time (ISO 8601, for example "2019-08-24T14:15:22Z").
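The example request writes timestamps as UTC ISO 8601 with a trailing Z. A sketch that builds an after_datetime/before_datetime window covering the last N hours in that format; the helper names are illustrative and not part of the API, and whether the bounds are inclusive is not stated here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_api_timestamp(dt: datetime) -> str:
    """Format an aware datetime as UTC ISO 8601 with a trailing Z (seconds precision)."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def time_window(hours: int, now: Optional[datetime] = None) -> dict:
    """Return filters covering the `hours` hours that end at `now` (default: current UTC time)."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "after_datetime": to_api_timestamp(start),
        "before_datetime": to_api_timestamp(end),
    }
```

Rejecting naive datetimes avoids silently sending local time as if it were UTC.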

after_id

Filter results to those with an ID greater than this value.

before_id

Filter results to those with an ID less than this value.
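Because limit caps a single response at 1000 results, before_id can act as a keyset cursor for walking larger result sets. A sketch under several assumptions: result IDs are numeric strings that grow with creation time (the response types id as a string, the request types before_id as an integer), and a page shorter than the limit means no older results remain. The search argument is any function that posts a body and returns the parsed JSON, so the loop runs without network access.

```python
from typing import Callable, Iterator


def iter_all_results(
    search: Callable[[dict], dict], filters: dict, page_size: int = 1000
) -> Iterator[dict]:
    """Yield every matching result newest-first, following before_id as a cursor."""
    body = dict(filters, limit=page_size, order="-created_at")
    while True:
        page = search(body)["results"]
        yield from page
        if len(page) < page_size:
            return  # assumption: a short page means nothing older is left
        # Assumption: IDs are numeric strings that increase over time,
        # so the smallest ID on this page bounds the next (older) page.
        body["before_id"] = min(int(r["id"]) for r in page)
```

Keyset cursors stay correct when new results arrive during iteration, which offset-based paging would not guarantee.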

explain

Filter results by whether they include an explanation.

explain_strategy

Filter results by explanation strategy. The value is one of EvaluationExplainStrategies (for example, "never").

pass

Filter results by whether they passed or failed the evaluation.

score_raw_min

Filter results by a lower bound on the raw score (score_raw).

score_raw_max

Filter results by an upper bound on the raw score (score_raw).

tags

Filter by tags. All tags in this filter must match (they are combined with AND), and each value must match exactly. For example, {"model_version": "1.0.0", "next_tag": "1234"} returns only those evaluation results that have both tags with exactly these values.
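The matching rule for tags can be mirrored locally, for example to double-check results on the client. This is a sketch of the documented semantics (every filter tag present, exact value match), not the server's implementation; it assumes tags on a result may be absent or null and that extra tags on a result do not prevent a match.

```python
from typing import Optional


def matches_tags(result_tags: Optional[dict], tag_filter: dict) -> bool:
    """True when every filter tag is present on the result with exactly the same value."""
    tags = result_tags or {}  # treat missing or null tags as an empty mapping
    return all(key in tags and tags[key] == value for key, value in tag_filter.items())
```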

evaluator_profile_public_id

Filter by the public ID of the evaluator profile used in the evaluation.

dataset_id

Filter by the dataset ID related to the evaluation results.

limit

Maximum number of results to return.

Default: 1000. Minimum: 1. Maximum: 1000.

order (EvaluateResultSearchOrderOptions)

Ordering option for the search results.

Default: "-created_at". Allowed values: "created_at", "-created_at", "dataset_sample_id", "-dataset_sample_id".

favorite

Filter by whether the evaluation result is marked as a favorite.

evaluation_feedback_status

Filter by evaluation feedback status (for example, "given").
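A sketch of assembling a request body that respects the documented limit bounds (1 to 1000) and the allowed order values, assuming that a filter left out of the body imposes no constraint. Note that pass is a Python keyword, so it has to be supplied through dictionary unpacking, as in the usage below.

```python
from typing import Any

ORDER_OPTIONS = {"created_at", "-created_at", "dataset_sample_id", "-dataset_sample_id"}


def build_search_body(limit: int = 1000, order: str = "-created_at", **filters: Any) -> dict:
    """Validate limit/order against the documented values and drop unset filters."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if order not in ORDER_OPTIONS:
        raise ValueError(f"order must be one of {sorted(ORDER_OPTIONS)}")
    # Assumption: omitting a filter applies no constraint for that field,
    # so None values are dropped instead of being sent as JSON null.
    body = {key: value for key, value in filters.items() if value is not None}
    body.update(limit=limit, order=order)
    return body
```

Usage: build_search_body(limit=50, project_id="405d8375-3514-403b-8c43-83ae74cfe0e9", **{"pass": False}). False values are kept because only None is treated as unset.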

Example request

curl -X POST "https://api.patronus.ai/v1/evaluation-results/search" \
  -H "X-API-KEY: <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "app": "string",
    "project_id": "405d8375-3514-403b-8c43-83ae74cfe0e9",
    "experiment_id": "string",
    "evaluation_run_id": "string",
    "evaluator_id": "string",
    "evaluator_family": "string",
    "profile_name": "string",
    "criteria": "string",
    "after_datetime": "2019-08-24T14:15:22Z",
    "before_datetime": "2019-08-24T14:15:22Z",
    "after_id": 0,
    "before_id": 0,
    "explain": true,
    "explain_strategy": "never",
    "pass": true,
    "score_raw_min": 0,
    "score_raw_max": 0,
    "tags": {
      "property1": "string",
      "property2": "string"
    },
    "evaluator_profile_public_id": "fe6c9202-ffdf-40e1-8f9b-304d0cb5a8db",
    "dataset_id": "string",
    "limit": 1000,
    "order": "created_at",
    "favorite": true,
    "evaluation_feedback_status": "given"
  }'
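The same call can be made from Python with only the standard library. A sketch that mirrors the curl example; the function names are illustrative, and urlopen raises urllib.error.HTTPError for non-2xx responses.

```python
import json
import urllib.request

SEARCH_URL = "https://api.patronus.ai/v1/evaluation-results/search"


def make_search_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def search(api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(make_search_request(api_key, body), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: search("<token>", {"project_id": "405d8375-3514-403b-8c43-83ae74cfe0e9", "limit": 100})["results"]. Splitting request construction from sending keeps the request easy to inspect before any network call.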

Successful Response

{
  "results": [
    {
      "id": "string",
      "log_id": "14b5977f-7a80-40ca-bb79-eca6c2abdb34",
      "app": "string",
      "project_id": "405d8375-3514-403b-8c43-83ae74cfe0e9",
      "experiment_id": "string",
      "created_at": "2019-08-24T14:15:22Z",
      "evaluator_id": "string",
      "profile_name": "string",
      "evaluated_model_system_prompt": "string",
      "evaluated_model_retrieved_context": [
        "string"
      ],
      "evaluated_model_input": "string",
      "evaluated_model_output": "string",
      "evaluated_model_gold_answer": "string",
      "evaluated_model_attachments": [
        {
          "url": "string",
          "media_type": "string",
          "usage_type": "string"
        }
      ],
      "explain_strategy": "never",
      "pass": true,
      "score_raw": 0,
      "text_output": "string",
      "additional_info": {
        "positions": [
          null
        ],
        "extra": {},
        "confidence_interval": {
          "strategy": "string",
          "alpha": 0,
          "lower": 0,
          "median": 0,
          "upper": 0
        }
      },
      "evaluation_metadata": {},
      "explanation": "string",
      "evaluation_duration": "string",
      "explanation_duration": "string",
      "evaluation_run_id": 0,
      "evaluator_family": "string",
      "evaluator_profile_public_id": "fe6c9202-ffdf-40e1-8f9b-304d0cb5a8db",
      "evaluated_model_id": "string",
      "evaluated_model_name": "string",
      "evaluated_model_provider": "string",
      "evaluated_model_params": {},
      "evaluated_model_selected_model": "string",
      "dataset_id": "string",
      "dataset_sample_id": 0,
      "tags": {
        "property1": "string",
        "property2": "string"
      },
      "external": true,
      "favorite": true,
      "evaluation_feedback": true,
      "usage_tokens": 0,
      "metric_name": "string",
      "metric_description": "string",
      "evaluation_type": "string",
      "annotation_criteria_id": "e5d4b10b-c239-4b00-9620-5b9e8428bf29"
    }
  ]
}
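Each entry in results carries pass and score_raw, so a page can be summarized directly. A sketch that counts pass/fail outcomes and averages score_raw, assuming either field may be null for some results (for example, evaluators that do not produce a score); the function name is illustrative.

```python
from statistics import mean
from typing import Iterable


def summarize(results: Iterable[dict]) -> dict:
    """Count passed/failed results and average score_raw, skipping nulls."""
    rows = list(results)
    passed = sum(1 for r in rows if r.get("pass") is True)
    failed = sum(1 for r in rows if r.get("pass") is False)
    scores = [r["score_raw"] for r in rows if r.get("score_raw") is not None]
    return {
        "total": len(rows),
        "passed": passed,
        "failed": failed,
        "mean_score_raw": mean(scores) if scores else None,
    }
```

Results whose pass is null count toward total but toward neither passed nor failed.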