Get Started
Before you can start using the Patronus evaluation framework, you'll need to create an account here.
Additionally, you'll need an API Key. After signing in to the platform, you can generate one here.
Install Patronus Experimentation Framework
To start using Experiments, you'll need to have Python installed on your machine. Follow these steps to install the Patronus library:
- Install Python: Make sure Python 3.11 or higher is installed on your system.
- Install the Patronus Library:
pip install patronus
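To confirm that your interpreter meets the version requirement listed above, a small check like the following can help. This is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical and is not part of the Patronus library.

```python
import sys

# Minimum version taken from the install steps above.
MINIMUM_VERSION = (3, 11)


def meets_minimum(version=None, minimum=MINIMUM_VERSION):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} meets the 3.11+ requirement.")
    else:
        print(f"Python {current} is too old; 3.11 or higher is required.")
```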
Writing Your First Evaluation Script
Below is a simple example of how to use Patronus to evaluate a model using a "Hello World" example.
import os

from patronus import Client, simple_task, simple_evaluator

client = Client(
    # This is the default and can be omitted
    api_key=os.environ.get("PATRONUSAI_API_KEY"),
)

task = simple_task(lambda input: f"{input} World")

exact_match = simple_evaluator(lambda output, gold_answer: output == gold_answer)

client.experiment(
    "Tutorial Project",
    data=[
        {
            "evaluated_model_input": "Hello",
            "evaluated_model_gold_answer": "Hello World",
        },
    ],
    task=task,
    evaluators=[exact_match],
)
Explanation of the script
- The Client object is initialized with your API key, which authenticates your requests to the Patronus service. Your evaluation results are exported directly to the Patronus platform, where you can manage and analyze them centrally.
- The framework can also use remote Patronus Evaluators, state-of-the-art models hosted on Patronus infrastructure that can perform complex and difficult evaluations. You can use these remote resources or run evaluations locally, all in a unified workflow.
- Defining the Experiment:
- The first argument is the project name, "Tutorial Project". You can replace it with your own project name.
- data is a list of samples, each with evaluated_model_input and evaluated_model_gold_answer. There are other ways to provide datasets, which we will cover later.
- task is typically a function that takes inputs (like evaluated_model_input in this case) and produces an evaluated_model_output. Here we wrap a simple lambda function with simple_task. Later on we'll explore more complex task definitions.
- evaluators accepts a list of evaluators used for the experiment. In this case we define a very simple exact_match evaluator that compares evaluated_model_output and evaluated_model_gold_answer.
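To make the data flow described above concrete, here is a plain-Python sketch of what happens to each sample: the task turns the input into an output, and each evaluator compares that output with the gold answer. This is only an illustration under simplifying assumptions. It does not call Patronus, it is not how the library is implemented, and the function run_local_sketch is a hypothetical name.

```python
# Illustrative only: mirrors the task -> evaluator flow without using Patronus.

def run_local_sketch(data, task, evaluators):
    """Apply `task` to each sample's input, then score the output with each evaluator.

    Hypothetical helper; the real framework handles this inside client.experiment().
    """
    results = []
    for row in data:
        output = task(row["evaluated_model_input"])
        gold_answer = row.get("evaluated_model_gold_answer")
        results.append(
            {
                "input": row["evaluated_model_input"],
                "output": output,
                "passed": [bool(evaluate(output, gold_answer)) for evaluate in evaluators],
            }
        )
    return results


# Same sample, task logic, and evaluator logic as the script above,
# but as bare functions rather than simple_task / simple_evaluator wrappers.
data = [
    {
        "evaluated_model_input": "Hello",
        "evaluated_model_gold_answer": "Hello World",
    },
]


def task(input):
    return f"{input} World"


def exact_match(output, gold_answer):
    return output == gold_answer


results = run_local_sketch(data, task, [exact_match])
print(results)
```

For the single sample here, the task produces "Hello World", which equals the gold answer, so the exact-match check passes.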
Running the script
Before you run the script, don't forget to provide your API key as an environment variable:
export PATRONUSAI_API_KEY="sk-your_api_key_here"
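If you want to verify that the variable is visible to Python before running the experiment, a small check like this may help. It is a sketch using only the standard library; the helper name api_key_is_configured is hypothetical and not part of the Patronus library.

```python
import os


def api_key_is_configured(environ=None, name="PATRONUSAI_API_KEY"):
    """Return True if `name` is set to a non-blank value in `environ`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    if environ is None:
        environ = os.environ
    return bool(environ.get(name, "").strip())


if __name__ == "__main__":
    if api_key_is_configured():
        print("PATRONUSAI_API_KEY is set.")
    else:
        print("PATRONUSAI_API_KEY is missing; export it before running the script.")
```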
Now you can simply execute the Python file:
python hello_world_evaluation.py
This will run the evaluation experiment and print the results to the console. The framework will evaluate whether the output from the task matches the expected answer.
The output of the script will look similar to this:
Preparing dataset... DONE
Preparing evaluators... DONE
============================================
Experiment Tutorial-Project/root-1725904824: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1305.82sample/s]
Summary: FunctionalEvaluator
----------------------------
Count : 1
Pass rate : 1
Mean : 1.0
Min : 1.0
25% : 1.0
50% : 1.0
75% : 1.0
Max : 1.0
Score distribution
Score Range   Count   Histogram
0.00 - 0.20   0
0.20 - 0.40   0
0.40 - 0.60   0
0.60 - 0.80   0
0.80 - 1.00   1       ####################
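To help read the summary, the sketch below shows one way such statistics could be computed from a list of scores and pass/fail flags. It is an assumption-laden illustration, not the framework's actual code: the function names, the linear-interpolation percentile, and the rule that a score of exactly 1.0 falls in the last bin are all choices made here, and scores are assumed to lie between 0 and 1.

```python
from statistics import fmean


def percentile(ordered, fraction):
    """Linear-interpolation percentile over an already sorted, non-empty list."""
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def summarize(scores, passes, bins=5):
    """Build a summary similar in shape to the console output above (hypothetical helper)."""
    ordered = sorted(scores)
    histogram = [0] * bins
    for score in scores:
        # Assumes 0 <= score <= 1; a score of exactly 1.0 is placed in the last bin.
        index = min(int(score * bins), bins - 1)
        histogram[index] += 1
    return {
        "count": len(scores),
        "pass_rate": sum(1 for passed in passes if passed) / len(passes),
        "mean": fmean(scores),
        "min": ordered[0],
        "25%": percentile(ordered, 0.25),
        "50%": percentile(ordered, 0.50),
        "75%": percentile(ordered, 0.75),
        "max": ordered[-1],
        "histogram": histogram,
    }


# One passing sample with a score of 1.0, as in the tutorial run.
summary = summarize([1.0], [True])
for key, value in summary.items():
    print(f"{key:10}: {value}")
```

With a single passing sample, every statistic equals 1.0 and the only non-empty bin is 0.80 - 1.00, which matches the shape of the output shown above.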
You'll also be able to see the results of your evaluation in the Platform UI.