Getting Started
Estimated Time: 4 minutes
Before you can start using the Patronus evaluation framework, you'll need to create an account here.
Additionally, you'll need an API Key. After signing in to the platform, you can generate one here.
Install Patronus Experimentation Framework
To start using Experiments, you'll need to have Python 3.9 or higher installed on your machine. To install the Patronus library:
pip install patronus
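If you prefer to keep dependencies isolated, you can install the library inside a virtual environment. This is standard Python tooling, not anything Patronus-specific:

python -m venv .venv
source .venv/bin/activate
pip install patronus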
Write Your First Evaluation Script
Below is a simple example of how to use Patronus to evaluate a model using a "Hello World" example.
import os

from patronus import Client, task, evaluator, Row, TaskResult

client = Client(
    # This is the default and can be omitted
    api_key=os.environ.get("PATRONUS_API_KEY"),
)


@task
def my_task(row: Row):
    return f"{row.evaluated_model_input} World"


@evaluator
def exact_match(row: Row, task_result: TaskResult):
    return task_result.evaluated_model_output == row.evaluated_model_gold_answer


client.experiment(
    "Tutorial Project",
    experiment_name="Hello World Experiment",
    dataset=[
        {
            "evaluated_model_input": "Hello",
            "evaluated_model_gold_answer": "Hello World",
        },
    ],
    task=my_task,
    evaluators=[exact_match],
)
Explanation of the script
- The Client object is initialized with your API key, which is essential for authenticating your requests to the Patronus service. Your evaluation results will be exported directly to the Patronus platform, where you can manage and analyze them centrally.
- Additionally, the framework can utilize remote Patronus Evaluators: state-of-the-art models hosted on Patronus infrastructure that can perform complex and difficult evaluations. You can leverage these remote resources, or run evaluations locally, all in a unified workflow.
- Defining the Experiment:
  - The first argument is the project name, "Tutorial Project". The experiment_name, "Hello World Experiment", labels this particular run. You can replace both with your own names.
  - dataset - a list of samples with evaluated_model_input and evaluated_model_gold_answer. There are other ways to provide datasets, which we will cover later (a minimal file-based sketch follows this list).
  - task is typically a function that takes inputs (like evaluated_model_input in this case) and produces an evaluated_model_output.
  - evaluators accepts a list of evaluators used for the experiment. In this case we define a very simple exact_match evaluator that compares evaluated_model_output and evaluated_model_gold_answer.
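As a sketch of one alternative, the dataset does not have to be defined inline. The snippet below builds the same list-of-dicts structure from a local CSV file; the file name examples.csv and its column names are assumptions for this illustration, and the resulting list can be passed to client.experiment just like the inline dataset above.

import csv

# Assumed file: examples.csv with columns "input" and "gold_answer".
# Each row is converted into the dict shape used by the inline dataset above.
def load_dataset(path: str) -> list[dict]:
    rows = []
    with open(path, newline="") as f:
        for record in csv.DictReader(f):
            rows.append(
                {
                    "evaluated_model_input": record["input"],
                    "evaluated_model_gold_answer": record["gold_answer"],
                }
            )
    return rows

# dataset = load_dataset("examples.csv")
# client.experiment(..., dataset=dataset, task=my_task, evaluators=[exact_match])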
Running the script
Before you run the script, don't forget to provide your API key as an environment variable:
export PATRONUS_API_KEY="sk-your_api_key_here"
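If you'd rather not export the variable in your shell, you can also set it from Python before the Client is created. This is plain use of os.environ and is fine for quick local experiments, but don't commit a real key to source control:

import os

# Placeholder value for illustration only; use your actual key locally.
os.environ["PATRONUS_API_KEY"] = "sk-your_api_key_here"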
Now you can simply execute the Python file:
python hello_world_evaluation.py
This will run the evaluation experiment and print the results to the console. The framework will evaluate whether the output from the task matches the expected answer.
The output of the script will look similar to this:
Preparing dataset... DONE
Preparing evaluators... DONE
============================================
Experiment Tutorial-Project/root-1729600247: 100%|██████████| 1/1 [00:00<00:00, 507.91sample/s]
Summary: exact_match
--------------------
Count : 1
Pass rate : 1
Mean : 1.0
Min : 1.0
25% : 1.0
50% : 1.0
75% : 1.0
Max : 1.0
Score distribution
Score Range Count Histogram
0.00 - 0.20 0
0.20 - 0.40 0
0.40 - 0.60 0
0.60 - 0.80 0
0.80 - 1.00 1 ####################
https://app.patronus.ai/experiments/111247673728424740
You'll also be able to see the results of your evaluation in the Platform UI.