Using Datasets

Datasets are a fundamental component of experiments in the Patronus SDK. They provide the inputs and context needed to evaluate your Generative AI applications. The SDK offers flexible ways to work with datasets and supports various data formats and sources.

Dataset Formats

The SDK accepts datasets in several formats:

  • Lists of dictionaries.
  • pandas.DataFrame objects.
  • patronus.Dataset objects, including datasets uploaded to the Patronus platform.
  • Asynchronous functions that return any of the above.

The SDK automatically handles dataset loading and conversion internally, so you can focus on your experiment logic rather than data management.

Dataset Fields

The SDK lets you run experiments on datasets with arbitrary schemas, giving you maximum flexibility. However, we highly recommend mapping your dataset fields to the corresponding fields in Patronus Datasets, since this lets you use evaluators in the Patronus API and other platform features. This is easy to do with our data adaptors:

Python
from patronus import read_csv, read_jsonl
 
# Load CSV
dataset = read_csv(
    "path/to/dataset.csv",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response",
)
 
# Load JSONL
dataset = read_jsonl(
    "path/to/dataset.jsonl",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response"
)

Alternatively, you can perform the mapping yourself, for example with Python's built-in csv module:

Python
import csv
 
dataset = []
 
with open('dataset.csv', mode='r') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        dataset.append({
            "evaluated_model_input": row["YOUR_INPUT_FIELD"],
            "evaluated_model_gold_answer": row["YOUR_GOLD_ANSWER_FIELD"],
        })
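
A similar manual mapping works for JSONL files. The sketch below uses only the standard library; the field names are placeholders for your own column names:

Python
import json
 
dataset = []
 
with open("dataset.jsonl", mode="r") as file:
    for line in file:
        record = json.loads(line)
        dataset.append({
            "evaluated_model_input": record["YOUR_INPUT_FIELD"],
            "evaluated_model_gold_answer": record["YOUR_GOLD_ANSWER_FIELD"],
        })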

Patronus Datasets Fields

The Patronus SDK supports the following fields in datasets:

  • sid (int): A unique identifier for each data sample. This can be used to track and reference specific entries within a dataset. The sid should be a number starting from 1. If not provided, it will be inferred from the position in the dataset.
  • evaluated_model_system_prompt (str): The system prompt provided to the model, setting the context or behavior for the model's response.
  • evaluated_model_retrieved_context (list[str]): A list of context strings retrieved and provided to the model as additional information. This is typically used in a Retrieval-Augmented Generation (RAG) setup, where the model's response depends on external context or supporting information fetched from a knowledge base or similar source.
  • evaluated_model_input (str): Typically a user input provided to the model that it must respond to.
  • evaluated_model_output (str): The output generated by the model.
  • evaluated_model_gold_answer (str): The expected or correct answer that the model output is compared against during evaluation.

We recommend mapping the fields in your datasets to these fields to integrate with our API and platform features. Note that evaluators in the Patronus API follow a structured schema and expect these fields; user-defined evaluators can access arbitrary fields in the dataset.
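
For reference, a record that maps to all of the supported fields might look like the following sketch (the values are purely illustrative):

Python
dataset = [
    {
        "sid": 1,
        "evaluated_model_system_prompt": "You are a helpful assistant.",
        "evaluated_model_retrieved_context": ["Paris is the capital of France."],
        "evaluated_model_input": "What is the capital of France?",
        "evaluated_model_output": "The capital of France is Paris.",
        "evaluated_model_gold_answer": "Paris",
    },
]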

Other Dataset Fields

Your datasets may contain fields outside of our supported schema. In these cases, you can still access those fields in tasks and evaluators through the Row object. For example:

Python
@task
async def my_task(row: Row):
    my_field_1 = row.field_1
    my_field_2 = row.field_2
    ...
    
@evaluator
def exact_match(row: Row, task_result: TaskResult):
    return task_result.evaluated_model_output == row.evaluated_model_gold_answer
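
For reference, a dataset row that feeds the task above might look like the following sketch (field_1 and field_2 are placeholder names for your own columns):

Python
dataset = [
    {
        "evaluated_model_input": "Some user query",
        "evaluated_model_gold_answer": "Expected answer",
        "field_1": "Custom metadata",
        "field_2": "More custom metadata",
    },
]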

Option 1: Lists of Dictionaries

The simplest way to provide data is to pass a list of dictionaries directly to the experiment. This can be defined inline:

Python
dataset = [
    {
        "evaluated_model_system_prompt": "You are a helpful assistant.",
        "evaluated_model_input": "How do I write a Python function?",
    },
    {
        "evaluated_model_system_prompt": "You are a knowledgeable assistant.",
        "evaluated_model_input": "Explain polymorphism in OOP.",
    },
]
 
cli.experiment(
    "Project Name",
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
)

Option 2: Load Files Locally

For datasets stored locally in .csv or .jsonl format, we provide native data adaptors that make it easy to map fields from your dataset to our schema:

Python
from patronus import read_csv, read_jsonl
 
# Load CSV
dataset = read_csv(
    "path/to/dataset.csv",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response",
)
 
# Load JSONL
dataset = read_jsonl(
    "path/to/dataset.jsonl",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response"
)
 
cli.experiment(
    "Project Name",
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
)

Option 3: pandas DataFrames

You can pass pandas DataFrames directly to experiments:

Python
import pandas as pd
 
df = pd.DataFrame([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
])
 
cli.experiment(
    "Project Name",
    dataset=df,
    task=task,
    evaluators=[evaluator],
)
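
If your DataFrame uses its own column names, one way to align it with the Patronus schema is to rename the columns before passing the frame to the experiment. This is a minimal sketch using standard pandas; the original column names are placeholders:

Python
import pandas as pd
 
df = pd.DataFrame([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
])
 
# Rename columns to match the Patronus dataset schema
df = df.rename(columns={
    "user_input": "evaluated_model_input",
    "model_output": "evaluated_model_output",
})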

Option 4: Hosted Datasets

Datasets that follow the schema defined above can be uploaded to the Patronus AI platform. These can then be accessed as remote datasets.

We also support a number of off-the-shelf datasets, such as FinanceBench. To use datasets hosted on the Patronus AI platform:

Python
financebench_dataset = cli.remote_dataset("financebench")
 
# The framework will handle loading automatically when passed to an experiment
cli.experiment(
    "Project Name",
    dataset=financebench_dataset,
    task=task,
    evaluators=[evaluator],
)

Option 5: Patronus Datasets

You can also construct patronus.Dataset objects directly from records or from a pandas DataFrame:

Python
import pandas as pd
from patronus import Dataset
 
# Create from records (list of dicts)
dataset = Dataset.from_records([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
], dataset_id="my-custom-dataset")
 
# Create from DataFrame
df = pd.DataFrame(...)
dataset = Dataset.from_dataframe(df, dataset_id="my-custom-dataset")
 
cli.experiment(
    "Project Name",
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
)

Advanced

Custom Dataset Loaders

You can create custom dataset loaders using async functions:

Python
from patronus import Dataset
 
async def load_random_subset() -> Dataset:
    loader = cli.remote_dataset("pii-questions-1.0.0")
    dataset = await loader.load()
    # Modify the dataset
    subset = dataset.df.sample(n=10)
    return Dataset.from_dataframe(subset, dataset_id="random-subset")
 
# The framework will handle the async loading
cli.experiment(
    "Project Name",
    dataset=load_random_subset,
    task=task,
    evaluators=[evaluator],
)
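
Async loaders can also assemble data from local sources. The sketch below builds a filtered dataset from a local JSONL file; the file path, field names, and filter condition are illustrative assumptions:

Python
import json
from patronus import Dataset
 
async def load_answered_rows() -> Dataset:
    records = []
    with open("path/to/dataset.jsonl", mode="r") as file:
        for line in file:
            record = json.loads(line)
            # Keep only rows that include a gold answer (illustrative filter)
            if record.get("evaluated_model_gold_answer"):
                records.append(record)
    return Dataset.from_records(records, dataset_id="answered-rows")
 
cli.experiment(
    "Project Name",
    dataset=load_answered_rows,
    task=task,
    evaluators=[evaluator],
)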
