Quickstart Guide
Getting Started
We are excited for you to try out the Patronus AI product!
To get started, you will need to create an account. You can go to app.patronus.ai and sign up by either entering an email/password combination or by using your Google account. If you opt for the first option, you will need to verify your email in order to access Patronus.
Once you've successfully created an account, you will be taken to the "Evaluator Profiles" page. This page lists all the different evaluators that you have access to by default on Patronus. They're a great starting point, but you're not limited to these. You'll soon learn how to create a variety of new evaluators to suit your exact needs!
Now before going any further, it's important to note that the web app is only part of Patronus' offering. It works in conjunction with our API and you should use a combination of both to get the most value from the platform.
Our API gives you the flexibility to perform evaluations anywhere, anytime while the web app allows you to easily manage which evaluators you want to run and view the results from your API calls.
To use our API, you'll need an API key. We'll get to that later, but if you're feeling impatient you can jump there directly and start pinging our API.
Running a Demo Script
Let's take a look at a simple Python script to better understand how to effectively use Patronus. Go ahead and copy the code below to your environment. We'll walk through how to execute it and exactly what it's doing.
import requests

API_KEY = "<YOUR API KEY HERE>"

# Each sample holds the prompt, the retrieved context, and the model's answer.
samples = [
    {
        "evaluated_model_input": "How do I deposit a check at the bank?",
        "evaluated_model_retrieved_context": [
            "To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
            "You can also deposit a check at an ATM or through your bank's mobile app.",
            "Remember to sign the back of the check before depositing it.",
        ],
        "evaluated_model_output": "You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.",
    },
    {
        "evaluated_model_input": "How do I deposit a check at the bank?",
        "evaluated_model_retrieved_context": [
            "To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
            "You can also deposit a check at an ATM or through your bank's mobile app.",
            "Remember to sign the back of the check before depositing it.",
        ],
        "evaluated_model_output": "The only way to deposit your check by going to your bank in person.",
    },
    {
        "evaluated_model_input": "How do I deposit a check at the bank?",
        "evaluated_model_retrieved_context": [
            "To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
            "You can also deposit a check at an ATM or through your bank's mobile app.",
            "Remember to sign the back of the check before depositing it.",
        ],
        "evaluated_model_output": "The Dodo bird was last seen in 1662. It is now extinct and likely not at your local bank.",
    },
]

headers = {
    "Content-Type": "application/json",
    "X-API-KEY": API_KEY,
}

for i, sample in enumerate(samples):
    # Request body: which evaluators to run, plus the sample being evaluated.
    data = {
        "evaluators": [
            {"evaluator": "retrieval-hallucination"},
            {"evaluator": "retrieval-answer-relevance"},
            {
                "evaluator": "custom",
                "profile_name": "system:is-concise",
            },
        ],
        "evaluated_model_input": sample["evaluated_model_input"],
        "evaluated_model_retrieved_context": sample[
            "evaluated_model_retrieved_context"
        ],
        "evaluated_model_output": sample["evaluated_model_output"],
        "app": "demo_banking_chat_assistant",
        "tags": {
            "project": "demo",
            "model_name": "chat_assistant",
            "topic": "banking",
        },
    }

    response = requests.post(
        "https://api.patronus.ai/v1/evaluate", headers=headers, json=data
    )
    if response.status_code != 200:
        # Show the error body before raising so the failure is easy to diagnose.
        print(response.text)
        response.raise_for_status()

    results = response.json().get("results")

    print("------------------------------------")
    print(f"Evaluated Model Input: {sample['evaluated_model_input']}")
    print(
        f"Evaluated Model Retrieved Context: {sample['evaluated_model_retrieved_context']}"
    )
    print(f"Evaluated Model Output: {sample['evaluated_model_output']}")
    print("------------------------------------")

    for result in results:
        evaluation_result = result.get("evaluation_result")
        evaluator_id = evaluation_result.get("evaluator_id")
        profile_name = evaluation_result.get("profile_name") or None
        passed = bool(evaluation_result["pass"])
        # Append the profile name only when the evaluator reports one.
        label = f"{evaluator_id} {profile_name}" if profile_name else evaluator_id
        print(f"{label}: {'PASS' if passed else 'FAIL'}")
        print("------------------------------------")
Script Overview
In its current form, this script is not runnable. We're missing an API key. We'll get to that in a sec but first let's walk through what the script is doing. At a high level, the script is going through the following steps:
- We are looping through a list of samples. There are exactly 3 samples in the list. These are dictionaries with the keys evaluated_model_input, evaluated_model_output, and evaluated_model_retrieved_context. Think of these as the relevant pieces of information that Patronus needs to know about a specific LLM's response to evaluate its correctness across a variety of evaluation dimensions. The input would be the prompt, the retrieved context would be additional information fed into the model's input to ground its answer, and the output is the answer it sends back.
- For each sample, we call Patronus' evaluation endpoint at https://api.patronus.ai/v1/evaluate. This is an HTTPS request and the necessary information is fed in as a JSON data packet. That packet is a dictionary and includes the pieces of the LLM's response discussed above. We also need to tell the API what we want to evaluate for. That comes through the evaluators list. In this situation, we are calling three pre-defined Patronus evaluators: retrieval-hallucination, retrieval-answer-relevance, and custom with the profile name system:is-concise. What do these mean though?
  - retrieval-hallucination is an evaluator that checks if a model's output is grounded in the context that was retrieved to answer the prompt. If it's not, it's called a hallucination. Hallucinations are a common mistake made by LLMs that are used with a retrieval system. You can read up more about those here.
  - retrieval-answer-relevance is an evaluator that checks whether the answer provided by an LLM answers the prompt it was sent. That's pretty straightforward.
  - custom evaluators are a Patronus-specific concept. They allow users to define what they want an evaluator to check for. You give us a description of what an evaluator needs to check and we will spin up an evaluator that checks for that. In this instance, we already have a profile for this custom evaluator: system:is-concise. You can see what the definition is here. Notice that you cannot modify this profile's definition and that it includes system: as a prefix. That's because this profile is actually managed by Patronus and checks for concise outputs from LLMs.
- The last step is to check that we got a valid response from our query and print out those results. We'll talk about the format of responses and how they are returned in a sec.
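The request body described above can also be assembled by a small helper, which keeps the loop short. This is only a minimal sketch: build_payload is a hypothetical name chosen for illustration and is not part of any Patronus library. The field names are the ones the demo script sends.

```python
from typing import Any


def build_payload(
    sample: dict[str, Any],
    evaluators: list[dict[str, str]],
    app: str,
    tags: dict[str, str],
) -> dict[str, Any]:
    """Assemble the JSON body for one evaluation request (hypothetical helper)."""
    return {
        "evaluators": evaluators,
        "evaluated_model_input": sample["evaluated_model_input"],
        "evaluated_model_retrieved_context": sample[
            "evaluated_model_retrieved_context"
        ],
        "evaluated_model_output": sample["evaluated_model_output"],
        "app": app,
        "tags": tags,
    }


# The same three evaluators the demo script requests.
EVALUATORS = [
    {"evaluator": "retrieval-hallucination"},
    {"evaluator": "retrieval-answer-relevance"},
    {"evaluator": "custom", "profile_name": "system:is-concise"},
]

example_sample = {
    "evaluated_model_input": "How do I deposit a check at the bank?",
    "evaluated_model_retrieved_context": [
        "You can also deposit a check at an ATM or through your bank's mobile app.",
    ],
    "evaluated_model_output": "You can deposit a check at an ATM.",
}

payload = build_payload(
    example_sample,
    EVALUATORS,
    app="demo_banking_chat_assistant",
    tags={"project": "demo"},
)
```

The resulting dictionary can be passed as the json argument of requests.post, exactly as the data variable is in the script.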
Getting an API Key
With that, we're now ready to modify the script and actually run it to look at some results. First, we need an API_KEY to authenticate our API request. We can get that from the web app. Navigate to the Patronus web app, click on your account name in the top left, navigate to API Keys and click the button in the top right labeled CREATE KEY to create an API key.
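If you would rather not paste the key into the script itself, one common option is to read it from an environment variable. The sketch below assumes a variable named PATRONUS_API_KEY. That name and the load_api_key helper are made up for this example and are not something the platform requires.

```python
import os


def load_api_key(env_var: str = "PATRONUS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# In the demo script you could then write:
# API_KEY = load_api_key()
```

Failing early with a clear message is easier to debug than an authentication error returned by the API.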
Finally Executing the Script
Now that you've generated a new API key, replace the placeholder string in the script and run the code! You'll need to have the Python requests library installed to run the code. It'll take a few seconds to run and print out the results. Let's look at what you can expect back below.
------------------------------------
Evaluated Model Input: How do I deposit a check at the bank?
Evaluated Model Retrieved Context: ['To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.', "You can also deposit a check at an ATM or through your bank's mobile app.", 'Remember to sign the back of the check before depositing it.']
Evaluated Model Output: You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.
------------------------------------
retrieval-hallucination-large-2024-07-23 system:detect-rag-hallucination: PASS
------------------------------------
retrieval-answer-relevance-large-2024-07-23 system:detect-rag-irrelevant-answers: PASS
------------------------------------
custom-large-2024-08-08 system:is-concise: PASS
------------------------------------
------------------------------------
Evaluated Model Input: How do I deposit a check at the bank?
Evaluated Model Retrieved Context: ['To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.', "You can also deposit a check at an ATM or through your bank's mobile app.", 'Remember to sign the back of the check before depositing it.']
Evaluated Model Output: The only way to deposit your check by going to your bank in person.
------------------------------------
retrieval-hallucination-large-2024-07-23 system:detect-rag-hallucination: FAIL
------------------------------------
retrieval-answer-relevance-large-2024-07-23 system:detect-rag-irrelevant-answers: FAIL
------------------------------------
custom-large-2024-08-08 system:is-concise: PASS
------------------------------------
------------------------------------
Evaluated Model Input: How do I deposit a check at the bank?
Evaluated Model Retrieved Context: ['To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.', "You can also deposit a check at an ATM or through your bank's mobile app.", 'Remember to sign the back of the check before depositing it.']
Evaluated Model Output: The Dodo bird was last seen in 1662. It is now extinct and likely not at your local bank.
------------------------------------
retrieval-hallucination-large-2024-07-23 system:detect-rag-hallucination: FAIL
------------------------------------
retrieval-answer-relevance-large-2024-07-23 system:detect-rag-irrelevant-answers: FAIL
------------------------------------
custom-large-2024-08-08 system:is-concise: FAIL
------------------------------------
We ran 3 evaluators on each of the 3 samples, for a total of 9 evaluation results. The script currently processes samples synchronously, one after the other, although the evaluators themselves run concurrently on the backend. We'll show you how to use asynchronous workers for batch evaluation runs later on. You can dive into the exact results and see why certain results are PASS and others are FAIL. For instance, the Dodo bird answer doesn't have much to do with depositing a check at a bank... A bit of a trivial example, but it gets the point across.
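As a rough illustration of what overlapping the per-sample requests could look like, here is a sketch that uses a thread pool from Python's standard library. This is not the asynchronous worker setup mentioned above. The evaluate_all helper and the evaluate_one callable are hypothetical, and fake_evaluate is a stand-in so the example runs without network access. In practice evaluate_one would wrap the requests.post call from the demo script.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def evaluate_all(
    samples: list[dict[str, Any]],
    evaluate_one: Callable[[dict[str, Any]], list[dict[str, Any]]],
    max_workers: int = 3,
) -> list[list[dict[str, Any]]]:
    """Run evaluate_one on every sample using a pool of threads.

    Results come back in the same order as the samples.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate_one, samples))


def fake_evaluate(sample: dict[str, Any]) -> list[dict[str, Any]]:
    # Stand-in for the real HTTP call so this sketch runs offline.
    return [{"sample_output": sample["evaluated_model_output"]}]


demo_samples = [
    {"evaluated_model_output": "first"},
    {"evaluated_model_output": "second"},
]
all_results = evaluate_all(demo_samples, fake_evaluate)
```

Threads suit this case because each call spends most of its time waiting on the network rather than computing.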
Exact Response Format
Now let's actually dissect the entire response that you get for a single API request. This is what you'll see after calling response.json().get("results") on a specific request response. Note that what comes back from the API will depend on the exact evaluator you end up calling and what you ask from it. In general though, these responses tend to follow the same schema.
[
{
"evaluator_id": "retrieval-hallucination-large-2024-07-23",
"profile_name": "system:detect-rag-hallucination",
"status": "success",
"error_message": None,
...
"evaluation_result": {
...
"id": None,
"app": "demo_banking_chat_assistant",
"created_at": None,
"evaluator_id": "retrieval-hallucination-large-2024-07-23",
"profile_name": "system:detect-rag-hallucination",
"evaluated_model_system_prompt": None,
"evaluated_model_retrieved_context": [
"To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
"You can also deposit a check at an ATM or through your bank's mobile app.",
"Remember to sign the back of the check before depositing it.",
],
"evaluated_model_input": "How do I deposit a check at the bank?",
"evaluated_model_output": "You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.",
"evaluated_model_gold_answer": None,
"explain_strategy": "never",
"pass": True,
"score_raw": 1.0,
"additional_info": {
...
"positions": None,
"extra": None,
"confidence_interval": None,
},
"explanation": None,
"evaluation_duration": "PT3.095S",
"explanation_duration": None,
"evaluation_run_project_id": None,
"evaluation_run_id": None,
"evaluator_family": "retrieval-hallucination",
"evaluator_profile_public_id": "0bcf9064-1a60-4263-9c9d-fdeb2e618671",
"evaluated_model_id": None,
"evaluated_model_name": None,
"evaluated_model_provider": None,
"evaluated_model_params": None,
"evaluated_model_selected_model": None,
"dataset_id": None,
"dataset_sample_id": None,
"tags": {
"project": "demo",
"model_name": "chat_assistant",
"topic": "banking",
},
"external": False,
},
},
{
"evaluator_id": "retrieval-answer-relevance-large-2024-07-23",
"profile_name": "system:detect-rag-irrelevant-answers",
"status": "success",
"error_message": None,
...
"evaluation_result": {
"id": None,
"app": "demo_banking_chat_assistant",
"created_at": None,
"evaluator_id": "retrieval-answer-relevance-large-2024-07-23",
"profile_name": "system:detect-rag-irrelevant-answers",
"evaluated_model_system_prompt": None,
"evaluated_model_retrieved_context": [
"To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
"You can also deposit a check at an ATM or through your bank's mobile app.",
"Remember to sign the back of the check before depositing it.",
],
"evaluated_model_input": "How do I deposit a check at the bank?",
"evaluated_model_output": "You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.",
"evaluated_model_gold_answer": None,
"explain_strategy": "never",
"pass": True,
"score_raw": 1.0,
"additional_info": {
...
"positions": None,
"extra": None,
"confidence_interval": None,
},
"explanation": None,
"evaluation_duration": "PT1.559S",
"explanation_duration": None,
"evaluation_run_project_id": None,
"evaluation_run_id": None,
"evaluator_family": "retrieval-answer-relevance",
"evaluator_profile_public_id": "92069477-3dd8-406c-a834-e87ef6ec1df9",
"evaluated_model_id": None,
"evaluated_model_name": None,
"evaluated_model_provider": None,
"evaluated_model_params": None,
"evaluated_model_selected_model": None,
"dataset_id": None,
"dataset_sample_id": None,
"tags": {
"project": "demo",
"model_name": "chat_assistant",
"topic": "banking",
},
"external": False,
},
},
{
"evaluator_id": "custom-large-2024-08-08",
"profile_name": "system:is-concise",
"status": "success",
"error_message": None,
...
"evaluation_result": {
"id": None,
"app": "demo_banking_chat_assistant",
"created_at": None,
"evaluator_id": "custom-large-2024-08-08",
"profile_name": "system:is-concise",
"system_prompt": None,
"evaluated_model_system_prompt": None,
"evaluated_model_retrieved_context": [
"To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
"You can also deposit a check at an ATM or through your bank's mobile app.",
"Remember to sign the back of the check before depositing it.",
],
"evaluated_model_input": "How do I deposit a check at the bank?",
"evaluated_model_output": "You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.",
"evaluated_model_gold_answer": None,
"explain_strategy": "never",
"pass": True,
"score_raw": 1.0,
"additional_info": {
...
"positions": None,
"extra": None,
"confidence_interval": None,
},
"explanation": None,
"evaluation_duration": "PT3.574S",
"explanation_duration": None,
"evaluation_run_project_id": None,
"evaluation_run_id": None,
"evaluator_family": "custom",
"evaluator_profile_public_id": "42b26216-a359-430e-b60a-49176599cbb2",
"evaluated_model_id": None,
"evaluated_model_name": None,
"evaluated_model_provider": None,
"evaluated_model_params": None,
"evaluated_model_selected_model": None,
"dataset_id": None,
"dataset_sample_id": None,
"tags": {
"project": "demo",
"model_name": "chat_assistant",
"topic": "banking",
},
"external": False,
},
},
]
In its current form, the response is quite massive. Don't worry, we're actively working on cleaning things up! Since we requested evaluations from 3 evaluators for that first sample, we get back a list of 3 evaluation results with relevant metadata. For the sake of simplicity, let's just look at that first result for retrieval-hallucination. You can see what the result on its own looks like below.
{
"evaluator_id": "retrieval-hallucination-large-2024-07-23",
"profile_name": "system:detect-rag-hallucination",
"status": "success",
"error_message": None,
...
"evaluation_result": {
...
"id": None,
"app": "demo_banking_chat_assistant",
"created_at": None,
"evaluator_id": "retrieval-hallucination-large-2024-07-23",
"profile_name": "system:detect-rag-hallucination",
"evaluated_model_system_prompt": None,
"evaluated_model_retrieved_context": [
"To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
"You can also deposit a check at an ATM or through your bank's mobile app.",
"Remember to sign the back of the check before depositing it.",
],
"evaluated_model_input": "How do I deposit a check at the bank?",
"evaluated_model_output": "You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.",
"evaluated_model_gold_answer": None,
"explain_strategy": "never",
"pass": True,
"score_raw": 1.0,
"additional_info": {
...
"positions": None,
"extra": None,
"confidence_interval": None,
},
"explanation": None,
"evaluation_duration": "PT3.095S",
"explanation_duration": None,
"evaluation_run_project_id": None,
"evaluation_run_id": None,
"evaluator_family": "retrieval-hallucination",
"evaluator_profile_public_id": "0bcf9064-1a60-4263-9c9d-fdeb2e618671",
"evaluated_model_id": None,
"evaluated_model_name": None,
"evaluated_model_provider": None,
"evaluated_model_params": None,
"evaluated_model_selected_model": None,
"dataset_id": None,
"dataset_sample_id": None,
"tags": {
"project": "demo",
"model_name": "chat_assistant",
"topic": "banking",
},
"external": False,
},
}
There are a few things to note about what we got back:
- We initially called retrieval-hallucination in our API request, but the evaluator_id we got back actually refers to retrieval-hallucination-large-2024-07-23. That's because retrieval-hallucination is an alias and points to retrieval-hallucination-large-2024-07-23 by default. This will usually be the most recent and most powerful evaluator we offer in this category.
- Whether a request passed a specific evaluator can be found in evaluation_result["pass"] as either True or False, since the evaluator supports a binary pass/fail response in this instance.
- evaluation_result["explanation"] returns None here because we did not explicitly request an explanation from the evaluator.
Logging Results to the Web App
There's a lot more information returned but let's not worry too much about it now. Instead, let's focus on the one big thing we haven't done yet as part of this demo: logging. The results we got back are great and we can use them in code. The issue is that we're currently not logging them to the LLM Monitoring view (Patronus' monitoring dashboard accessible on the web app) in order to easily view them later on. Put simply, we're not capturing the results.
Going back to the original Python script, you just need to add the capture parameter to the request body and set it to all. You could also make it on-fail if you only care about failures. We'd recommend always logging results because it can't hurt, but that's up to you. You can always filter on failure via the web app. The relevant excerpt is below:
"app": "demo_banking_chat_assistant",
"tags": {
    "project": "demo",
    "model_name": "chat_assistant",
    "topic": "banking",
},
"capture": "all",
We've included this snippet to point out two concepts that are helpful when logging:
- Using an app name as part of a call is a great way to separate projects or use cases in the LLM Monitoring view. Doing that helps keep things clean and organized.
- We understand that there's probably a bunch of other things you want to keep track of. That's why you can also add your own tags, which takes in a dictionary with any key/value pairs you want. Then you can filter by those as well on the web app.
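To keep the logging fields in one place, you could attach them to a request body with a small helper like the one sketched here. with_logging is a hypothetical name used only for illustration. The "all" default for capture is taken from the excerpt above, and the helper returns a new dictionary rather than modifying the one it is given.

```python
from typing import Any


def with_logging(
    payload: dict[str, Any],
    app: str,
    tags: dict[str, str],
    capture: str = "all",
) -> dict[str, Any]:
    """Return a copy of payload with app, tags, and capture set."""
    return {**payload, "app": app, "tags": dict(tags), "capture": capture}


base_payload = {
    "evaluators": [{"evaluator": "retrieval-hallucination"}],
    "evaluated_model_input": "How do I deposit a check at the bank?",
}

logged_payload = with_logging(
    base_payload,
    app="demo_banking_chat_assistant",
    tags={"project": "demo", "model_name": "chat_assistant", "topic": "banking"},
)
```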
Now with that all done, you can re-run the script and should be able to see the results logged neatly to the web app!
You can click into results if you want a dedicated view of the response. That's the easiest way to view things and make sense of a result. And that's pretty much it for the tutorial.
That's all folks! Take a look at some of the recommended sections below if you'd like to dive deeper. And most importantly we hope you enjoy using Patronus AI. 😄