Log Evaluation Results (REST API)

Developers who are not using our Python SDK can import their evaluation results via our API. The endpoint to import evaluation results is POST /v1/evaluation-results/batch and you can see its documentation here. It is a batch API, so you can import multiple evaluation results at the same time.

All the features available to you through evaluations run through Patronus - like organizing evaluations into Apps, adding tags and filters, specifying raw scores and pass or fail - are available for imported evaluations as well. Below is a sample cURL request showcasing these features.

curl --location 'https://api.patronus.ai/v1/evaluation-results/batch' \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: ••••••' \
--data '{
    "evaluation_results": [
        {
            "evaluator_id": "my-answer-relevance-evaluator",
            "pass": true,
            "score_raw": 0.92,
            "evaluated_model_input": "Did apples ever inspire a scientist?",
            "evaluated_model_output": "Yes! Sir Isaac Newton observed an apple fall from a tree which sparked his curiosity about why objects fall straight down and not sideways.",
            "tags": {
                "scientist": "Isaac Newton"
            },
            "app": "scientist-knowledge-model"
        },
        {
            "evaluator_id": "my-context-relevance-evaluator",
            "pass": false,
            "score_raw": 0.1,
            "evaluated_model_input": "My cat is my best friend. Do you feel that way about your cat?",
            "evaluated_model_retrieved_context": [
                "Dogs are man'\''s best friend"
            ],
            "tags": {
                "animals": "true"
            }
        }
    ]
}'

Imported evaluation results are distinguished from those generated through the Patronus platform in the following ways:

The attribute External is set to True
The Evaluator Id is pre-pended with the keyword external:. This is so evaluator names for imported evaluations do not clash with those run through Patronus.

You can see both of these in Logs: Alt text

Log Evaluation Results (REST API)

On this page