Run an Experiment (Python)
Learn how to run experiments with the Patronus Python SDK
Note: For comprehensive API documentation and more detailed examples, please refer to the Patronus Python SDK documentation.
Before you can start using the Patronus evaluation framework, you'll need to create an account here.
Additionally, you'll need an API Key. After signing in to the platform, you can generate one here.
Install Patronus SDK
To start using Experiments, you'll need to have Python 3.8 or higher installed on your machine. To install the Patronus library:
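The library is typically installed from PyPI with pip (the package name `patronus` is assumed here; check the SDK documentation if your environment differs):

```bash
pip install patronus
```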
Write Your First Evaluation Script
Below is a simple example of how to use Patronus to evaluate a model using a "Hello World" example.
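The following is a minimal sketch of such a script. The import paths, the `@evaluator` decorator, and the `row`/`task_result` attribute names are assumptions based on the description that follows; check the SDK reference for the exact API of the version you have installed.

```python
from patronus import evaluator
from patronus.experiments import run_experiment, FuncEvaluatorAdapter


def my_task(row, **kwargs):
    # Process one dataset row and return a string result.
    return f"{row.task_input} World"


@evaluator()
def exact_match(row, task_result, **kwargs) -> bool:
    # A simple function-based evaluator: pass when the output matches the expected answer.
    return task_result.output == row.gold_answer


run_experiment(
    dataset=[
        {"task_input": "Hello", "gold_answer": "Hello World"},
    ],
    task=my_task,
    evaluators=[FuncEvaluatorAdapter(exact_match)],  # function evaluators must be adapted
    project_name="Tutorials",                        # hypothetical names, pick your own
    experiment_name="Hello World Experiment",
    # api_key="...",  # only needed if the API key is not set via an environment variable
)
```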
Explanation of the script
- The `my_task` function processes each row in the dataset. It takes a `row` parameter (and additional keyword arguments) and returns a string result.
- The `run_experiment()` function brings everything together:
  - `dataset`: A list of examples to process
  - `task`: The function that processes each example
  - `evaluators`: A list of evaluators to assess the outputs (must be structured evaluators or adapted evaluators)
  - `project_name` and `experiment_name`: Help organize your experiments in the Patronus platform
  - `api_key`: Can be passed directly to the function if not set via an environment variable

Note that unlike other Patronus SDK functions, the experiment framework does not require an explicit `patronus.init()` call.
Running the script
Before you run the script, provide your API key as an environment variable:
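For example (assuming the SDK reads the `PATRONUS_API_KEY` variable; adjust the name if your setup differs):

```bash
export PATRONUS_API_KEY="your-api-key"
```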
Now you can execute the Python file:
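Assuming the script above was saved as `hello_world_experiment.py` (a hypothetical filename):

```bash
python hello_world_experiment.py
```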
When the experiment finishes, the script prints a summary of the evaluation results. You'll also be able to see the results in the Patronus Platform UI through the link included in the output.
A More Comprehensive Example
Let's create a more realistic example that evaluates a RAG (Retrieval-Augmented Generation) system using OpenAI's API.
First, install the required packages:
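For example (the OpenAI client and OpenInference instrumentation package names below are assumptions; use whichever instrumentation integration you prefer):

```bash
pip install patronus openai pandas openinference-instrumentation-openai
```

Next is a hedged sketch of how the pieces could fit together. Everything beyond what this guide names explicitly, in particular `EvaluationResult`, `RemoteEvaluator`, the OpenInference instrumentor setup, the evaluator method signatures, and the `to_dataframe()` export, is an assumption meant to illustrate the shape of the code; confirm the exact API in the SDK reference before running it.

```python
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor

# Assumed import paths; verify against the installed SDK version.
from patronus import evaluator
from patronus.evals import EvaluationResult, RemoteEvaluator, StructuredEvaluator
from patronus.experiments import run_experiment, FuncEvaluatorAdapter

# Instrument the OpenAI client so call details are captured in traces
# (how this plugs into Patronus tracing may vary by SDK version).
OpenAIInstrumentor().instrument()

oai = OpenAI()  # uses OPENAI_API_KEY from the environment


# 1. A structured evaluator: checks that the answer stays close to the retrieved context.
class KeywordOverlapEvaluator(StructuredEvaluator):
    def evaluate(self, row, task_result, **kwargs) -> EvaluationResult:
        context = " ".join(row.task_context).lower()
        answer_terms = set(task_result.output.lower().split())
        overlap = sum(1 for term in answer_terms if term in context)
        score = overlap / max(len(answer_terms), 1)
        return EvaluationResult(score=score, pass_=score >= 0.5)


# 2. A function-based evaluator, adapted for experiments with FuncEvaluatorAdapter.
@evaluator()
def contains_gold_answer(row, task_result, **kwargs) -> bool:
    return row.gold_answer.lower() in task_result.output.lower()


# 3. The RAG task: answer the question using only the retrieved context.
def rag_task(row, **kwargs) -> str:
    context = "\n".join(row.task_context)
    response = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {row.task_input}"},
        ],
    )
    return response.choices[0].message.content


# 4. A small dataset with questions, retrieved context, and expected answers.
dataset = [
    {
        "task_input": "When was the Eiffel Tower completed?",
        "task_context": ["The Eiffel Tower was completed in 1889 for the World's Fair."],
        "gold_answer": "1889",
    },
    {
        "task_input": "Who wrote 'Pride and Prejudice'?",
        "task_context": ["Pride and Prejudice is an 1813 novel by Jane Austen."],
        "gold_answer": "Jane Austen",
    },
]

experiment = run_experiment(
    dataset=dataset,
    task=rag_task,
    evaluators=[
        KeywordOverlapEvaluator(),
        FuncEvaluatorAdapter(contains_gold_answer),
        # A Patronus remote evaluator; the evaluator and criterion names are illustrative.
        RemoteEvaluator("lynx", "patronus:hallucination"),
    ],
    project_name="RAG Tutorials",
    experiment_name="OpenAI RAG Experiment",
)

# 5. Export results for further analysis (method name assumed).
df = experiment.to_dataframe()
print(df.head())
```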
This more comprehensive example demonstrates:
- Creating evaluators in two ways:
  - A proper structured evaluator, by extending the `StructuredEvaluator` class
  - A function-based evaluator, adapted for use in experiments with `FuncEvaluatorAdapter`
- Using a real RAG task that leverages the OpenAI API
- Setting up a realistic dataset with context and expected answers
- Combining custom evaluators with Patronus remote evaluators
- Adding instrumentation to capture details of OpenAI API calls
- Exporting and analyzing results using the DataFrame export
Best Practices
When working with the Patronus experimentation framework:
- Choose the right evaluator type:
  - Use function-based evaluators for simple cases (but remember to adapt them for experiments)
  - Use structured evaluators for more complex evaluation logic
  - Use remote evaluators for sophisticated evaluations without writing code
- Structure your dataset consistently: Use standard field names like `task_input`, `task_context`, and `gold_answer`
- Handle edge cases: Make your evaluators robust to missing or unexpected data
- Add instrumentation: Use integrations like OpenInference to capture detailed traces
- Tag your experiments: Add metadata tags to help organize and filter your experiments (see the sketch after this list)
- Export results for analysis: Use the DataFrame or CSV export for deeper analysis
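To illustrate the dataset-structure, tagging, and export points, here is a short sketch that reuses `rag_task` and `contains_gold_answer` from the comprehensive example above. The `tags` argument and the `to_dataframe()`/`to_csv()` calls are assumptions, so confirm the exact parameter and method names in the SDK reference.

```python
experiment = run_experiment(
    dataset=[
        {
            "task_input": "What is the capital of France?",       # standard field names
            "task_context": ["Paris is the capital of France."],
            "gold_answer": "Paris",
        },
    ],
    task=rag_task,
    evaluators=[FuncEvaluatorAdapter(contains_gold_answer)],
    project_name="RAG Tutorials",
    experiment_name="Tagged Experiment",
    tags={"model": "gpt-4o-mini", "dataset_version": "v1"},  # hypothetical tag values
)

# Export for deeper analysis (method names assumed).
experiment.to_dataframe().to_csv("experiment_results.csv", index=False)
```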
Next Steps
Now that you understand the basics of running experiments with Patronus, you can explore:
- Using different types of evaluators
- Working with larger datasets
- Advanced experiment configurations
- Chaining evaluations
For more detailed API documentation and advanced features, please refer to the Patronus Python SDK documentation.