
Running an Experiment (Python)

How to set up and run experiments using the Patronus Python SDK

This page covers how to set up and run experiments using the Patronus Python SDK.

Setting up an experiment

The run_experiment function

The main entry point for the framework is the run_experiment() function:

from patronus.experiments import run_experiment
 
experiment = run_experiment(
    dataset=my_dataset,               # Required: What to evaluate
    task=my_task_function,            # Optional: How to process inputs
    evaluators=[my_evaluator],        # Required: How to assess outputs
    tags={"dataset-version": "v1.0"}, # Optional: Tags for the experiment
    max_concurrency=10,               # Optional: Control parallel execution
    project_name="My Project",        # Optional: Override the global project name
    experiment_name="Test Run"        # Optional: Name this experiment run
)
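
Only dataset and evaluators are required. If your dataset rows already contain model outputs, you can omit the task entirely and let the evaluators assess the stored values. A minimal sketch of that pattern (the task_output field name and my_evaluator placeholder are assumptions for illustration; check the SDK reference for the exact field names your evaluators expect):

# Minimal call: evaluate pre-computed outputs, no task function.
experiment = run_experiment(
    dataset=[
        # Assumed: evaluators read a task_output field when no task is given
        {"task_input": "2 + 2?", "task_output": "4", "gold_answer": "4"},
    ],
    evaluators=[my_evaluator],  # placeholder evaluator, as above
)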

Creating a simple experiment

Let's walk through a complete example:

from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment
 
dataset = [
    {
        "task_input": "What is the capital of France?",
        "gold_answer": "Paris"
    },
    {
        "task_input": "Who wrote Romeo and Juliet?",
        "gold_answer": "William Shakespeare"
    }
]
 
# Define a task (in a real scenario, this would call an LLM)
def answer_question(row, **kwargs):
    if "France" in row.task_input:
        return "The capital of France is Paris."
    return "I don't know the answer to that question."
 
run_experiment(
    dataset=dataset,
    task=answer_question,
    evaluators=[
        # Use a Patronus-managed evaluator
        RemoteEvaluator("judge", "patronus:fuzzy-match"),
    ],
    tags={"model": "simulated", "version": "v1"}
)

Experiment execution flow

When you call run_experiment(), the framework follows these steps (sketched in code after the list):

  1. Preparation: Initializes the experiment context and prepares the dataset
  2. Processing: For each dataset row:
    • Runs the task function if provided
    • Passes the task output to the evaluators
    • Collects evaluation results
  3. Reporting: Generates a summary of evaluation results
  4. Return: Returns an Experiment object with the complete results
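
The sketch below mirrors this flow in plain Python. It is a conceptual illustration only, not the SDK's actual implementation; the dataset, task, and evaluator are toy stand-ins:

# Conceptual sketch of the execution flow (not the SDK internals).
def run_flow(dataset, task, evaluators):
    all_results = []
    for row in dataset:                        # 2. Processing: each dataset row
        output = task(row) if task else None   #    run the task if provided
        row_results = [ev(row, output) for ev in evaluators]  # pass output to evaluators
        all_results.append(row_results)        #    collect evaluation results
    passed = sum(1 for r in all_results if all(r))  # 3. Reporting: summarize results
    print(f"{passed}/{len(all_results)} rows passed all evaluators")
    return all_results                         # 4. Return: results for inspection

# Toy stand-ins to make the sketch runnable:
toy_dataset = [{"task_input": "2 + 2?", "gold_answer": "4"}]
toy_task = lambda row: "4"
exact_match = lambda row, output: output == row["gold_answer"]
run_flow(toy_dataset, toy_task, [exact_match])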

Synchronous vs. asynchronous execution

The run_experiment() function detects whether it's being called from an async context:

  • In a synchronous context, it blocks until the experiment completes and returns the finished experiment
  • In an async context, it returns an awaitable that resolves to the finished experiment

# Synchronous usage:
experiment = run_experiment(dataset, task, evaluators)
 
# Asynchronous usage:
experiment = await run_experiment(dataset, task, evaluators)
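
For example, inside an asyncio program the call is awaited like any other coroutine (the main function and the bare dataset, task, and evaluators names here are illustrative placeholders, as in the snippet above):

import asyncio

from patronus.experiments import run_experiment

async def main():
    # In an async context, run_experiment returns an awaitable.
    experiment = await run_experiment(
        dataset=dataset,
        task=task,
        evaluators=evaluators,
    )

asyncio.run(main())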
