
Running an Experiment (Python)

How to set up and run experiments using the Patronus Python SDK

This page covers how to set up and run experiments using the Patronus Python SDK.

Setting up an experiment

The run_experiment function

The main entry point for the framework is the run_experiment() function:

from patronus.experiments import run_experiment
 
experiment = run_experiment(
    dataset=my_dataset,               # Required: What to evaluate
    task=my_task_function,            # Optional: How to process inputs
    evaluators=[my_evaluator],        # Required: How to assess outputs
    tags={"dataset-version": "v1.0"}, # Optional: Tags for the experiment
    max_concurrency=10,               # Optional: Control parallel execution
    project_name="My Project",        # Optional: Override the global project name
    experiment_name="Test Run"        # Optional: Name this experiment run
)
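
Only dataset and evaluators are required. If your dataset rows already contain model outputs, you can omit the task entirely and let the evaluators assess the stored values. A minimal sketch of that pattern (the task_output field name and my_evaluator placeholder are assumptions for illustration; check the SDK reference for the exact field names your evaluators expect):

# Minimal call: evaluate pre-computed outputs, no task function.
experiment = run_experiment(
    dataset=[
        # Assumed: evaluators read a task_output field when no task is given
        {"task_input": "2 + 2?", "task_output": "4", "gold_answer": "4"},
    ],
    evaluators=[my_evaluator],  # placeholder evaluator, as above
)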

Creating a simple experiment

Let's walk through a complete example:

from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment
 
dataset = [
    {
        "task_input": "What is the capital of France?",
        "gold_answer": "Paris"
    },
    {
        "task_input": "Who wrote Romeo and Juliet?",
        "gold_answer": "William Shakespeare"
    }
]
 
# Define a task (in a real scenario, this would call an LLM)
def answer_question(row, **kwargs):
    if "France" in row.task_input:
        return "The capital of France is Paris."
    return "I don't know the answer to that question."
 
run_experiment(
    dataset=dataset,
    task=answer_question,
    evaluators=[
        # Use a Patronus-managed evaluator
        RemoteEvaluator("judge", "patronus:fuzzy-match"),
    ],
    tags={"model": "simulated", "version": "v1"}
)

Experiment execution flow

When you call run_experiment(), the framework follows these steps (sketched in code after the list):

  1. Preparation: Initializes the experiment context and prepares the dataset
  2. Processing: For each dataset row:
    • Runs the task function if provided
    • Passes the task output to the evaluators
    • Collects evaluation results
  3. Reporting: Generates a summary of evaluation results
  4. Return: Returns an Experiment object with the complete results
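
The sketch below mirrors this flow in plain Python. It is a conceptual illustration only, not the SDK's actual implementation; the dataset, task, and evaluator are toy stand-ins:

# Conceptual sketch of the execution flow (not the SDK internals).
def run_flow(dataset, task, evaluators):
    all_results = []
    for row in dataset:                        # 2. Processing: each dataset row
        output = task(row) if task else None   #    run the task if provided
        row_results = [ev(row, output) for ev in evaluators]  # pass output to evaluators
        all_results.append(row_results)        #    collect evaluation results
    passed = sum(1 for r in all_results if all(r))  # 3. Reporting: summarize results
    print(f"{passed}/{len(all_results)} rows passed all evaluators")
    return all_results                         # 4. Return: results for inspection

# Toy stand-ins to make the sketch runnable:
toy_dataset = [{"task_input": "2 + 2?", "gold_answer": "4"}]
toy_task = lambda row: "4"
exact_match = lambda row, output: output == row["gold_answer"]
run_flow(toy_dataset, toy_task, [exact_match])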

Synchronous vs. asynchronous execution

The run_experiment() function detects whether it's being called from an async context:

  • In a synchronous context, it blocks until the experiment completes and returns the finished experiment
  • In an async context, it returns an awaitable that resolves to the finished experiment

# Synchronous usage:
experiment = run_experiment(dataset, task, evaluators)
 
# Asynchronous usage:
experiment = await run_experiment(dataset, task, evaluators)
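
For example, inside an asyncio program the call is awaited like any other coroutine (the main function and the bare dataset, task, and evaluators names here are illustrative placeholders, as in the snippet above):

import asyncio

from patronus.experiments import run_experiment

async def main():
    # In an async context, run_experiment returns an awaitable.
    experiment = await run_experiment(
        dataset=dataset,
        task=task,
        evaluators=evaluators,
    )

asyncio.run(main())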
