Advanced
Retries
Patronus evaluators are automatically retried in case of a failure. If you want to implement retries for tasks or evaluations that may fail due to exceptions, you can use the built-in retry() helper decorator provided by the framework. Note that retry() only supports asynchronous functions. You can also implement your own retry mechanism; a sketch of a custom helper follows the examples below.
import random

from patronus import retry, task, evaluator

# Retry usage for tasks
@task
@retry(max_attempts=3)
async def unreliable_task(evaluated_model_input: str) -> str:
    r = random.random()
    if r < 0.5:
        raise Exception(f"Task random exception; r={r}")
    return f"Hi {evaluated_model_input}"

# Retry usage for evaluators
@evaluator
@retry(max_attempts=3)
async def unreliable_iexact_match(evaluated_model_output: str, evaluated_model_gold_answer: str) -> bool:
    r = random.random()
    if r < 0.5:
        raise Exception(f"Evaluation random exception; r={r}")
    return evaluated_model_output.lower().strip() == evaluated_model_gold_answer.lower().strip()
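If the built-in decorator does not fit your needs (for example, if you want a delay between attempts), a custom retry helper is easy to write. The sketch below is illustrative only and not part of the Patronus SDK; retry_with_backoff is a hypothetical helper that wraps any async callable and retries with exponential backoff.

import asyncio
import functools

# Illustrative only: a hand-rolled async retry decorator with exponential
# backoff. This is NOT part of the Patronus SDK.
def retry_with_backoff(max_attempts: int = 3, base_delay: float = 0.5):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
                    # Wait 0.5s, 1s, 2s, ... before the next attempt.
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

# Used in place of retry() on the task from the example above:
# @task
# @retry_with_backoff(max_attempts=3)
# async def unreliable_task(evaluated_model_input: str) -> str: ...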
Enabling Debug Logging
To get more detailed logs and increase verbosity in the Patronus Experimentation Framework, you can use standard Python logging. By configuring Python's logging module, you can capture and display debug-level logs.
Here's an example of how to configure logging:
import logging

formatter = logging.Formatter('[%(levelname)-5s] [%(name)-10s] %(message)s')

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(formatter)

# Raise verbosity for the "patronus" logger only and route its output
# to the console handler configured above.
plog = logging.getLogger("patronus")
plog.setLevel(logging.DEBUG)
plog.propagate = False
plog.addHandler(console_handler)
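If you do not need per-logger control, a quicker (but noisier) alternative is to configure the root logger with the standard library's basicConfig; note that this raises verbosity for every library that logs, not just Patronus.

import logging

# Broad alternative: configure the root logger. Debug output from all
# libraries, including "patronus", will be printed to the console.
logging.basicConfig(
    format='[%(levelname)-5s] [%(name)-10s] %(message)s',
    level=logging.DEBUG,
)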
Change Concurrency Settings
You can control how many concurrent calls are made to Patronus through the max_concurrency setting when creating an experiment. The default max_concurrency is 10. See below for an example:
from patronus import Client

client = Client()

detect_pii = client.remote_evaluator("pii")

client.experiment(
    "Tutorial",
    data=[
        {
            "evaluated_model_input": "Please provide your contact details.",
            "evaluated_model_output": "My email is john.doe@example.com and my phone number is 123-456-7890.",
        },
        {
            "evaluated_model_input": "Share your personal information.",
            "evaluated_model_output": "My name is Jane Doe and I live at 123 Elm Street.",
        },
    ],
    evaluators=[detect_pii],
    experiment_name="Detect PII",
    max_concurrency=2,  # Run at most 2 evaluation calls at a time
)
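Lowering max_concurrency can help if you run into API rate limits or want to throttle outgoing requests; a higher value can shorten wall-clock time for larger datasets.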
Logging Individual Evaluation Calls to an Experiment
Although this approach is not recommended, there may be situations where you want to log individual evaluation calls to an experiment. To do so, create a project and then create an experiment in that project; this returns an experiment ID. Pass that experiment ID to your evaluator calls to populate their results into the experiment, allowing you to use Experiments through the UI afterwards.
import asyncio

from patronus import Client
from patronus.api_types import CreateProjectRequest, CreateExperimentRequest

client = Client()

samples = [
    {
        "evaluated_model_input": "What is the capital of France?",
        "evaluated_model_output": "Paris",
        "evaluated_model_retrieved_context": ["Paris is the capital of France."],
    },
    {
        "evaluated_model_input": "What is the capital of Spain?",
        "evaluated_model_output": "Madrid",
        "evaluated_model_retrieved_context": ["Madrid is the capital of Spain."],
    },
    {
        "evaluated_model_input": "What is the capital of Germany?",
        "evaluated_model_output": "Berlin",
        "evaluated_model_retrieved_context": ["Madrid is the capital of Spain."],
    },
    {
        "evaluated_model_input": "What is the capital of Italy?",
        "evaluated_model_output": "Milan",
        "evaluated_model_retrieved_context": ["Rome is the capital of Italy."],
    },
]

hallucination_evaluator = client.remote_evaluator(evaluator_id_or_alias="lynx")
context_relevance_evaluator = client.remote_evaluator(
    evaluator_id_or_alias="context-relevance"
)


async def create_experiment_and_log_calls(samples):
    project_name = "My Project"  # TODO: Set your project name
    experiment_name = "my-experiment"  # TODO: Set your experiment name

    # Create the project and the experiment; the experiment ID is what ties
    # individual evaluation calls back to the experiment in the UI.
    project = await client.api.create_project(
        CreateProjectRequest(name=project_name)
    )
    experiment = await client.api.create_experiment(
        CreateExperimentRequest(project_id=project.id, name=experiment_name)
    )
    experiment_id = experiment.id

    for sample in samples:
        # Run both evaluators concurrently for each sample, passing the
        # experiment ID so the results land in the experiment.
        await asyncio.gather(
            hallucination_evaluator.evaluate(
                evaluated_model_input=sample["evaluated_model_input"],
                evaluated_model_output=sample["evaluated_model_output"],
                evaluated_model_retrieved_context=sample[
                    "evaluated_model_retrieved_context"
                ],
                experiment_id=experiment_id,
                tags={"topic": "capital-cities"},
            ),
            context_relevance_evaluator.evaluate(
                evaluated_model_input=sample["evaluated_model_input"],
                evaluated_model_output=sample["evaluated_model_output"],
                evaluated_model_retrieved_context=sample[
                    "evaluated_model_retrieved_context"
                ],
                experiment_id=experiment_id,
                tags={"topic": "capital-cities"},
            ),
        )


asyncio.run(create_experiment_and_log_calls(samples))
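In this example, asyncio.gather runs the two evaluator calls for each sample concurrently, while the samples themselves are processed one after another, so at most two evaluation requests are in flight at a time.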