
Log Evaluation Results (REST API)

Developers who are not using our Python SDK can import their evaluation results through the REST API. The endpoint for importing evaluation results is POST /v1/evaluation-results/batch, which is documented in the API reference. It is a batch endpoint, so a single request can import multiple evaluation results.

All the features available for evaluations run through Patronus, such as organizing evaluations into Apps, adding tags and filters, and specifying raw scores and pass/fail outcomes, are also available for imported evaluations. The sample cURL request below showcases these features.

curl --location 'https://api.patronus.ai/v1/evaluation-results/batch' \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: ••••••' \
--data '{
    "evaluation_results": [
        {
            "evaluator_id": "my-answer-relevance-evaluator",
            "pass": true,
            "score_raw": 0.92,
            "evaluated_model_input": "Did apples ever inspire a scientist?",
            "evaluated_model_output": "Yes! Sir Isaac Newton observed an apple fall from a tree which sparked his curiosity about why objects fall straight down and not sideways.",
            "tags": {
                "scientist": "Isaac Newton"
            },
            "app": "scientist-knowledge-model"
        },
        {
            "evaluator_id": "my-context-relevance-evaluator",
            "pass": false,
            "score_raw": 0.1,
            "evaluated_model_input": "My cat is my best friend. Do you feel that way about your cat?",
            "evaluated_model_retrieved_context": [
                "Dogs are man'\''s best friend"
            ],
            "tags": {
                "animals": "true"
            }
        }
    ]
}'
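
For readers who prefer to script the import, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper name build_batch_request and the placeholder key YOUR_API_KEY are illustrative assumptions, while the URL, headers, and field names are copied from the cURL sample above.

```python
import json
import urllib.request

# Endpoint taken from the cURL sample above.
API_URL = "https://api.patronus.ai/v1/evaluation-results/batch"


def build_batch_request(results, api_key):
    """Build (but do not send) a POST request carrying a batch of results.

    Hypothetical helper for illustration; it is not part of any Patronus SDK.
    """
    body = json.dumps({"evaluation_results": results}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


# One result, using the same fields as the first entry in the cURL sample.
results = [
    {
        "evaluator_id": "my-answer-relevance-evaluator",
        "pass": True,
        "score_raw": 0.92,
        "evaluated_model_input": "Did apples ever inspire a scientist?",
        "evaluated_model_output": (
            "Yes! Sir Isaac Newton observed an apple fall from a tree which "
            "sparked his curiosity about why objects fall straight down and "
            "not sideways."
        ),
        "tags": {"scientist": "Isaac Newton"},
        "app": "scientist-knowledge-model",
    },
]

# "YOUR_API_KEY" is a placeholder; substitute a real key before sending.
request = build_batch_request(results, api_key="YOUR_API_KEY")

# To actually send the request (needs a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the JSON body before any data leaves your machine. Additional results can be appended to the results list to import them in the same call.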

Imported evaluation results are distinguished from those generated through the Patronus platform in the following ways:

  • The attribute External is set to True
  • The Evaluator Id is prepended with the keyword external:, so that evaluator names for imported evaluations do not clash with those of evaluations run through Patronus.
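
The naming rule in the list above can be illustrated with a short Python sketch. It assumes the prefix is joined to the submitted ID with no separator beyond the colon (for example external:my-answer-relevance-evaluator); the exact stored format is not confirmed by this page, and both function names are hypothetical.

```python
# Assumption: the logged ID is the literal prefix followed directly by the
# ID that was submitted in the import request.
EXTERNAL_PREFIX = "external:"


def logged_evaluator_id(submitted_id):
    """Return the ID as it would appear in Logs for an imported result."""
    return EXTERNAL_PREFIX + submitted_id


def is_imported(logged_id):
    """Return True if a logged evaluator ID carries the external prefix."""
    return logged_id.startswith(EXTERNAL_PREFIX)


example_id = logged_evaluator_id("my-answer-relevance-evaluator")
```

A check like is_imported could be used when post-processing exported logs to separate imported results from those generated on the platform.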

You can see both of these in Logs.
