
Log Batched Evals

You can log batched evals with the POST /v1/evaluation-results/batch endpoint. See the API reference guide for full details on the request format.

Use the Logs and Comparisons features to understand your LLM's performance across all evaluations, whether they are run through Patronus evaluators or not.

All the features available for evaluations run through Patronus, such as organizing evaluations and experiments into projects, adding tags and filters, and specifying raw scores and pass/fail results, are available for imported evaluations as well. Below is a sample cURL request showcasing these features.

cURL
curl --location 'https://api.patronus.ai/v1/evaluation-results/batch' \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: ••••••' \
--data '{
    "evaluation_results": [
        {
            "evaluator_id": "my-answer-relevance-evaluator",
            "pass": true,
            "score_raw": 0.92,
            "evaluated_model_input": "Did apples ever inspire a scientist?",
            "evaluated_model_output": "Yes! Sir Isaac Newton observed an apple fall from a tree which sparked his curiosity about why objects fall straight down and not sideways.",
            "tags": {
                "scientist": "Isaac Newton"
            },
            "app": "scientist-knowledge-model"
        },
        {
            "evaluator_id": "my-context-relevance-evaluator",
            "pass": false,
            "score_raw": 0.1,
            "evaluated_model_input": "My cat is my best friend. Do you feel that way about your cat?",
            "evaluated_model_retrieved_context": [
                "Dogs are man'\''s best friend"
            ],
            "tags": {
                "animals": "true"
            }
        }
    ]
}'
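
You can also send the same request from code with any HTTP client. Below is a minimal sketch in Python that logs the first result from the cURL example above; it assumes the requests library and an API key stored in a PATRONUS_API_KEY environment variable, both of which are illustrative choices rather than Patronus requirements.

Python
import os

import requests

payload = {
    "evaluation_results": [
        {
            "evaluator_id": "my-answer-relevance-evaluator",
            "pass": True,
            "score_raw": 0.92,
            "evaluated_model_input": "Did apples ever inspire a scientist?",
            "evaluated_model_output": (
                "Yes! Sir Isaac Newton observed an apple fall from a tree which sparked "
                "his curiosity about why objects fall straight down and not sideways."
            ),
            "tags": {"scientist": "Isaac Newton"},
            "app": "scientist-knowledge-model",
        }
    ]
}

response = requests.post(
    "https://api.patronus.ai/v1/evaluation-results/batch",
    # Assumes the API key has been exported locally, e.g. export PATRONUS_API_KEY=...
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},
    json=payload,  # requests serializes the body and sets Content-Type: application/json
)
response.raise_for_status()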

Imported evaluation results are distinguished from those generated through the Patronus platform in the following ways:

  • The attribute External is set to True
  • The Evaluator Id is prepended with the keyword external: so that evaluator names for imported evaluations do not clash with those run through Patronus. For example, my-answer-relevance-evaluator from the request above appears as external:my-answer-relevance-evaluator.

You can see both of these in Logs.
