Custom Evaluators

Custom evaluators are one of the most powerful features of Patronus. They let you define your own evaluation criteria for whatever use case you are working with. For instance, you could check that every output from your LLM contains a snippet of code, or that it is always written in simple English that a fifth-grade student could understand. All we need from you is a simple explanation of what you are evaluating for.

To start, Patronus provides a series of simple custom evaluators that many of our customers use. These are good starting points and help you understand how to define the criteria you are evaluating against. Because these profiles are managed by Patronus, you must prefix their names with system: when you reference them (for example, system:is-concise).

The custom evaluator scores text on whether it passes the user-defined requirement. A score of 0 indicates that the output fails the requirement, whereas a score of 1 indicates that the output passes.

You can call one of these pre-existing custom evaluators with the following parameters:

{
    "evaluators": [
        {
            "evaluator": "custom",
            "profile_name": "system:is-concise",
            "explain_strategy": "never"
        }
    ],
    "evaluated_model_input": "Tell me a bit more about Company A.",
    "evaluated_model_output": "Company A builds the leading platform for database management software, called A-DB. Company A has won many awards for their innovative work here, including the '2023 Innovative Company of the year' award for their groundbreaking features. Sign up to get access to A-DB, and accelerate your engineering teams today!",
    "capture": "all",
    "app": "default",
    "tags": {
        "owner": "team-name"
    }
}
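
The request body above can also be assembled in code. The following is a minimal Python sketch under stated assumptions: the helper name build_custom_eval_request is hypothetical and not part of any Patronus SDK, and the sketch only builds and prints the JSON body. It does not send the request, since the endpoint and authentication details are not covered in this section.

```python
import json


def build_custom_eval_request(profile_name, model_input, model_output, tags=None):
    """Assemble the request body for one custom evaluator (hypothetical helper)."""
    return {
        "evaluators": [
            {
                "evaluator": "custom",
                # Patronus-managed profiles carry the "system:" prefix.
                "profile_name": profile_name,
                "explain_strategy": "never",
            }
        ],
        "evaluated_model_input": model_input,
        "evaluated_model_output": model_output,
        "capture": "all",
        "app": "default",
        "tags": tags or {},
    }


body = build_custom_eval_request(
    "system:is-concise",
    "Tell me a bit more about Company A.",
    "Company A builds the leading platform for database management software, called A-DB.",
    tags={"owner": "team-name"},
)

# Serialize to the JSON text that would form the request body.
print(json.dumps(body, indent=4))
```

Keeping the profile name as an explicit argument makes it easy to swap in a different profile without touching the rest of the payload.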

The expected output is:

{
    "results": [
        {
            "evaluation_result": {
                "id": "112824560708980266",
                "app": "default",
                "created_at": "2024-08-08T19:06:49.726901Z",
                "evaluator_id": "custom-large-2024-05-16",
                "profile_name": "system:is-concise",
                "input": "Tell me a bit more about Company A.",
                "evaluated_model_input": "Tell me a bit more about Company A.",
                "evaluated_model_output": "Company A builds the leading platform for database management software, called A-DB. Company A has won many awards for their innovative work here, including the '2023 Innovative Company of the year' award for their groundbreaking features. Sign up to get access to A-DB, and accelerate your engineering teams today!",
                "evaluated_model_gold_answer": null,
                "explain": false,
                "explain_strategy": "never",
                "pass": false,
                "score_raw": 0.0,
                "additional_info": {
                    "score_raw": 0.0,
                    "positions": null,
                    "extra": null,
                    "confidence_interval": null
                },
                "explanation": null,
                "evaluation_duration": "PT2.376S",
                "explanation_duration": null,
                "evaluator_family": "custom",
                "evaluator_profile_public_id": "42b26216-a359-430e-b60a-49176599cbb2",
                "tags": {
                    "owner": "team-name"
                },
                "external": false,
                "criterion_id": "custom-large-2024-05-16"
            }
        }
    ]
}
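
To act on a response like this one, you typically only need the pass and score_raw fields of each result. The sketch below is illustrative, assuming the response has already been parsed into a Python dictionary with the shape shown above; the function name summarize_results is hypothetical, and the sample dictionary is a trimmed copy of the example response.

```python
def summarize_results(response):
    """Return (profile_name, passed, score_raw) for each evaluation result."""
    summaries = []
    for item in response["results"]:
        result = item["evaluation_result"]
        summaries.append(
            (result["profile_name"], result["pass"], result["score_raw"])
        )
    return summaries


# Trimmed copy of the example response, keeping only the fields read above.
sample_response = {
    "results": [
        {
            "evaluation_result": {
                "profile_name": "system:is-concise",
                "pass": False,
                "score_raw": 0.0,
            }
        }
    ]
}

print(summarize_results(sample_response))
# prints [('system:is-concise', False, 0.0)]
```

Here the output failed the is-concise requirement, so pass is false and score_raw is 0.0.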

You can of course create your own profiles. The next section provides information about how to do that.

