Using Evaluators in Logging
The following tutorial describes how to use your own evaluators in logging with client.evaluate(...). See the Patronus Evaluators and evaluation API section for how to use Patronus evaluators in logging.
Register an Evaluator
To use any evaluator in Logging, simply wrap the evaluator definition with the register_local_evaluator decorator. Below is an example that registers a "reverse" evaluator, which reverses the evaluated output and returns pass/fail results from a random number generator, and then invokes it with client.evaluate.
import random
from typing import Optional

import patronus

client = patronus.Client(api_key="YOUR_API_KEY")


@client.register_local_evaluator("reverse")
def my_local_evaluator(
    evaluated_model_system_prompt: Optional[str],
    evaluated_model_input: Optional[str],
    evaluated_model_output: Optional[str],
    **kwargs,
) -> patronus.EvaluationResult:
    # Randomly pick an outcome: undetermined (None) below 0.33,
    # pass between 0.33 and 0.66, fail at 0.66 and above.
    v = random.random()
    pass_ = v < 0.66
    if v < 0.33:
        pass_ = None
    return patronus.EvaluationResult(
        pass_=pass_,
        score_raw=v,
        text_output=evaluated_model_output[::-1],
        metadata={
            "system_prompt": evaluated_model_system_prompt and evaluated_model_system_prompt[::-1],
            "input": evaluated_model_input and evaluated_model_input[::-1],
            "output": evaluated_model_output and evaluated_model_output[::-1],
        },
        explanation="An explanation!",
        evaluation_duration_s=random.random(),
        explanation_duration_s=random.random(),
        tags={"env": "local"},
    )


resp = client.evaluate(
    "reverse",
    evaluated_model_system_prompt="You are a helpful assistant.",
    evaluated_model_input="Say Foo",
    evaluated_model_output="Foo!",
)
print(resp.model_dump(by_alias=True))
Running this script, we see the eval result in the Logs dashboard.
Here we provide inputs and outputs in Metadata, but it can also contain fine-grained scores produced by the evaluation. The external: prefix indicates that the evaluator was registered locally.
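For example, a local evaluator can compute several component scores and record them in metadata alongside the overall result. The sketch below reuses the register_local_evaluator and EvaluationResult API from the example above; the "keyword_coverage" evaluator name and its scoring logic are hypothetical.

from typing import Optional

import patronus

client = patronus.Client(api_key="YOUR_API_KEY")  # or reuse the client from the example above


@client.register_local_evaluator("keyword_coverage")
def keyword_coverage(
    evaluated_model_system_prompt: Optional[str],
    evaluated_model_input: Optional[str],
    evaluated_model_output: Optional[str],
    **kwargs,
) -> patronus.EvaluationResult:
    # Hypothetical scoring: check which expected keywords appear in the output.
    keywords = ["foo", "bar"]
    hits = {kw: kw in (evaluated_model_output or "").lower() for kw in keywords}
    coverage = sum(hits.values()) / len(keywords)
    return patronus.EvaluationResult(
        pass_=coverage >= 0.5,
        score_raw=coverage,
        # Fine-grained, per-keyword results live in metadata next to the overall score.
        metadata={"keyword_hits": hits, "coverage": coverage},
    )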
Metadata contains evaluation result information, whereas tags are used for tracking application configuration. See Logs and Monitoring to learn more about tags, filtering, and visualizing evals.
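Tags typically hold configuration values that stay constant across many evaluations, such as the environment or model version. A minimal sketch, assuming the same EvaluationResult constructor as above; the tag keys and values here are purely illustrative.

import patronus

# Illustrative tags describing the application configuration under which the
# evaluation ran, so results can later be filtered by them in the Logs dashboard.
result = patronus.EvaluationResult(
    pass_=True,
    score_raw=1.0,
    tags={"env": "staging", "app_version": "1.2.0", "model": "gpt-4o"},
)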