Getting Started

Estimated Time: 4 minutes

Before you can start using the Patronus evaluation framework, you'll need to create an account on the Patronus platform.

Additionally, you'll need an API key, which you can generate in the platform after signing in.

Install Patronus Experimentation Framework

To start using Experiments, you'll need to have Python 3.11 or higher installed on your machine. To install the Patronus library:

pip install patronus

Write Your First Evaluation Script

Below is a simple example of how to use Patronus to evaluate a model using a "Hello World" example.

import os
from patronus import Client, simple_task, simple_evaluator

client = Client(
    # This is the default and can be omitted
    api_key=os.environ.get("PATRONUSAI_API_KEY"),
)

# The task turns evaluated_model_input into evaluated_model_output
task = simple_task(lambda input: f"{input} World")

# The evaluator passes when the output exactly matches the gold answer
exact_match = simple_evaluator(lambda output, gold_answer: output == gold_answer)

client.experiment(
    "Tutorial Project",
    data=[
        {
            "evaluated_model_input": "Hello",
            "evaluated_model_gold_answer": "Hello World",
        },
    ],
    task=task,
    evaluators=[exact_match],
)


Explanation of the script

  • The Client object is initialized with your API key, which is essential for authenticating your requests to the Patronus service. Your evaluation results will be exported directly to the Patronus platform, where you can manage and analyze them centrally.
    • Additionally, the framework can utilize remote Patronus Evaluators, state-of-the-art models hosted on Patronus infrastructure that can perform complex and difficult evaluations. You can leverage these remote resources, or run evaluations locally, all in a unified workflow.
  • Defining the Experiment:
    • The first argument is the project name, "Tutorial Project". You can replace it with your own project name.
    • data - a list of samples, each with an evaluated_model_input and an evaluated_model_gold_answer. There are other ways to provide datasets, which we will cover later.
    • task is typically a function that takes inputs (like evaluated_model_input in this case) and produces an evaluated_model_output. Here we wrap a simple lambda function with simple_task. Later on we'll explore more complex task definitions.
    • evaluators accepts a list of evaluators used for the experiment. In this case we define a very simple exact_match evaluator that compares evaluated_model_output and evaluated_model_gold_answer; a small variation is sketched just after this list.
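
As a small variation, still using only Client, simple_task, and simple_evaluator from the script above, you could relax the comparison to a case-insensitive match and add a second sample to data. This is an illustrative sketch, not a required part of the tutorial; it assumes PATRONUSAI_API_KEY is set in your environment.

from patronus import Client, simple_task, simple_evaluator

client = Client()  # assumes PATRONUSAI_API_KEY is set in the environment

task = simple_task(lambda input: f"{input} World")

# Case-insensitive variant of the exact-match evaluator
case_insensitive_match = simple_evaluator(
    lambda output, gold_answer: output.lower() == gold_answer.lower()
)

client.experiment(
    "Tutorial Project",
    data=[
        # data can contain more than one sample
        {"evaluated_model_input": "Hello", "evaluated_model_gold_answer": "Hello World"},
        {"evaluated_model_input": "Goodbye", "evaluated_model_gold_answer": "goodbye world"},
    ],
    task=task,
    evaluators=[case_insensitive_match],
)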

Running the script

Before you run the script, don't forget to provide your API key as an environment variable:

export PATRONUSAI_API_KEY="sk-your_api_key_here"
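
Alternatively, since Client accepts the key directly (as in the script above), you can skip the environment variable and pass it explicitly. The key string below is a placeholder.

from patronus import Client

# Minimal sketch: pass the API key explicitly instead of relying on
# the PATRONUSAI_API_KEY environment variable (placeholder key shown)
client = Client(api_key="sk-your_api_key_here")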

Now you can simply execute the Python file:

python hello_world_evaluation.py

This will run the evaluation experiment and print the results to the console. The framework will evaluate whether the output from the task matches the expected answer.
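
To see why this single sample passes, you can reproduce the exact-match logic with plain Python, outside the framework, using the same logic the lambdas implement:

# The same logic as the task and evaluator lambdas, called directly
def produce(input):
    return f"{input} World"

def exact_match(output, gold_answer):
    return output == gold_answer

print(exact_match(produce("Hello"), "Hello World"))  # True -> score 1.0, pass rate 1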

The output of the script will look similar to this:

Preparing dataset... DONE
Preparing evaluators... DONE
============================================
Experiment  Tutorial-Project/root-1725904824: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1305.82sample/s]

Summary: FunctionalEvaluator
----------------------------
Count     : 1
Pass rate : 1
Mean      : 1.0
Min       : 1.0
25%       : 1.0
50%       : 1.0
75%       : 1.0
Max       : 1.0

Score distribution
Score Range          Count      Histogram
0.00 - 0.20          0          
0.20 - 0.40          0          
0.40 - 0.60          0          
0.60 - 0.80          0          
0.80 - 1.00          1          ####################

You'll also be able to see the results of your evaluation in the Platform UI.