Run an Experiment (Python)
Before you can start using the Patronus evaluation framework, you'll need to create a Patronus account.
You'll also need an API key. After signing in to the platform, you can generate one from the platform.
Install Patronus Experimentation Framework
To start using Experiments, you'll need to have Python 3.9 or higher installed on your machine. To install the Patronus library:
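Because the framework requires Python 3.9 or higher, you can confirm that your interpreter meets this requirement by checking sys.version_info. This is a generic Python sketch and is not part of the Patronus library. The helper name python_version_ok is hypothetical.

```python
import sys

# Minimum Python version stated in the requirements above.
MIN_VERSION = (3, 9)


def python_version_ok(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if (major, minor) of version_info is at least `minimum`.

    Hypothetical helper for illustration only.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_version_ok():
        print(f"Python {sys.version.split()[0]} meets the requirement.")
    else:
        sys.exit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or higher is required.")
```

Tuple comparison works element by element, so (3, 10) correctly counts as newer than (3, 9).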
Write Your First Evaluation Script
Below is a simple "Hello World" example of how to use Patronus to evaluate a model.
Explanation of the script
- The Client object is initialized with your API key, which authenticates your requests to the Patronus service. Your evaluation results are exported directly to the Patronus platform, where you can manage and analyze them centrally.
- The framework can also use remote Patronus Evaluators: state-of-the-art models hosted on Patronus infrastructure that can perform complex evaluations. You can use these remote evaluators or run evaluations locally, all in a unified workflow.
- Defining the experiment:
  - The first argument is the project name, 'Hello World'. You can replace it with your own project name.
  - dataset is a list of samples with evaluated_model_input and evaluated_model_gold_answer fields. There are other ways to provide datasets, which we will cover later.
  - task is typically a function that takes inputs (like evaluated_model_input in this case) and produces an evaluated_model_output.
  - evaluators accepts a list of evaluators used in the experiment. In this case, we define a very simple exact_match evaluator that checks whether evaluated_model_output equals evaluated_model_gold_answer.
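To make the roles of dataset, task, and evaluators concrete, here is a minimal plain-Python sketch of the same flow. It does not use the Patronus library: run_experiment, my_task, and the second sample are hypothetical stand-ins, and the real framework additionally handles authentication, remote evaluators, and exporting results to the platform.

```python
from typing import Callable, Dict, List

# Hypothetical sample type: a plain dict using the field names from the example.
Sample = Dict[str, str]


def my_task(evaluated_model_input: str) -> str:
    """Toy task standing in for a model: appends " World" to the input."""
    return f"{evaluated_model_input} World"


def exact_match(evaluated_model_output: str, evaluated_model_gold_answer: str) -> bool:
    """Pass only if the task output equals the gold answer exactly."""
    return evaluated_model_output == evaluated_model_gold_answer


def run_experiment(
    dataset: List[Sample],
    task: Callable[[str], str],
    evaluators: List[Callable[[str, str], bool]],
) -> List[Dict[str, object]]:
    """Run the task on every sample and score the output with every evaluator.

    Hypothetical local loop for illustration only; it does not call the
    Patronus service or export anything to the platform.
    """
    results = []
    for sample in dataset:
        output = task(sample["evaluated_model_input"])
        scores = {
            evaluator.__name__: evaluator(output, sample["evaluated_model_gold_answer"])
            for evaluator in evaluators
        }
        results.append({"evaluated_model_output": output, "scores": scores})
    return results


dataset = [
    {"evaluated_model_input": "Hello", "evaluated_model_gold_answer": "Hello World"},
    {"evaluated_model_input": "Goodbye", "evaluated_model_gold_answer": "Hello World"},
]

if __name__ == "__main__":
    for result in run_experiment(dataset, task=my_task, evaluators=[exact_match]):
        print(result)
```

The first sample passes because "Hello World" matches its gold answer exactly; the second fails because the task produces "Goodbye World". Keeping evaluators as plain functions of output and gold answer means you can add more of them to the list without changing the loop.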
Running the script
Before you run the script, don't forget to provide your API key as an environment variable:
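Reading the key from the environment keeps it out of your source code. The sketch below fails fast with a clear message if the key is missing, before any experiment starts. The variable name PATRONUS_API_KEY and the helper get_api_key are assumptions for illustration; use whichever variable name the Patronus documentation specifies.

```python
import os

# Assumed variable name; confirm the exact name in the Patronus documentation.
API_KEY_ENV_VAR = "PATRONUS_API_KEY"


def get_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key from the environment, or raise if it is unset or empty."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export your Patronus API key before running the script."
        )
    return api_key
```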
Now you can simply execute the Python file:
This will run the evaluation experiment and print the results to the console. The framework will evaluate whether the output from the task matches the expected answer.
The output of the script should look similar to this:
You can also see the results of your evaluation in the Patronus platform UI.