Using Datasets

Datasets are a fundamental component of experiments in the Patronus SDK. They provide the inputs and context needed to evaluate your Generative AI applications. The SDK offers flexible ways to work with datasets and supports various data formats and sources.

Dataset Formats

The SDK accepts datasets in several formats:

  • Lists of dictionaries.
  • pandas.DataFrame objects.
  • patronus.Dataset objects, including datasets uploaded to the Patronus platform.
  • Asynchronous functions that return any of the above.

The SDK automatically handles dataset loading and conversion internally, so you can focus on your experiment logic rather than data management.

Dataset Fields

The SDK lets you run experiments on datasets with arbitrary schemas, giving you maximum flexibility. However, we highly recommend mapping your dataset fields to the corresponding fields in Patronus Datasets, since this lets you use evaluators in the Patronus API and other platform features. This is easy to do with our data adaptors:

Python
from patronus import read_csv, read_jsonl
 
# Load CSV
dataset = read_csv(
    "path/to/dataset.csv",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response",
)
 
# Load JSONL
dataset = read_jsonl(
    "path/to/dataset.jsonl",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response"
)

Alternatively, you can perform the mapping yourself, for example with Python's built-in csv module:

Python
import csv
 
dataset = []
 
with open('dataset.csv', mode='r') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        dataset.append({
            "evaluated_model_input": row["YOUR_INPUT_FIELD"],
            "evaluated_model_gold_answer": row["YOUR_GOLD_ANSWER_FIELD"],
        })
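
A similar manual mapping works for JSONL files. The sketch below uses only the standard library; the field names are placeholders for your own column names:

Python
import json
 
dataset = []
 
with open("dataset.jsonl", mode="r") as file:
    for line in file:
        record = json.loads(line)
        dataset.append({
            "evaluated_model_input": record["YOUR_INPUT_FIELD"],
            "evaluated_model_gold_answer": record["YOUR_GOLD_ANSWER_FIELD"],
        })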

Patronus Datasets Fields

The Patronus SDK supports the following fields in datasets:

  • sid (int): A unique identifier for each data sample. This can be used to track and reference specific entries within a dataset. The sid should be a number starting from 1. If not provided, it will be inferred from the position in the dataset.
  • evaluated_model_system_prompt (str): The system prompt provided to the model, setting the context or behavior for the model's response.
  • evaluated_model_retrieved_context (list[str]): A list of context strings retrieved and provided to the model as additional information. This is typically used in a Retrieval-Augmented Generation (RAG) setup, where the model's response depends on external context or supporting information fetched from a knowledge base or similar source.
  • evaluated_model_input (str): Typically a user input provided to the model that it must respond to.
  • evaluated_model_output (str): The output generated by the model.
  • evaluated_model_gold_answer (str): The expected or correct answer that the model output is compared against during evaluation.

We recommend mapping the fields in your datasets to these fields to integrate with our API and platform features. Note that evaluators in the Patronus API follow a structured schema and expect these fields; user-defined evaluators can access arbitrary fields in the dataset.
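
For reference, a record that maps to all of the supported fields might look like the following sketch (the values are purely illustrative):

Python
dataset = [
    {
        "sid": 1,
        "evaluated_model_system_prompt": "You are a helpful assistant.",
        "evaluated_model_retrieved_context": ["Paris is the capital of France."],
        "evaluated_model_input": "What is the capital of France?",
        "evaluated_model_output": "The capital of France is Paris.",
        "evaluated_model_gold_answer": "Paris",
    },
]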

Other Dataset Fields

Your datasets may contain fields outside of our supported schema. In these cases, you can still access those fields in tasks and evaluators through the Row object. For example:

Python
@task
async def my_task(row: Row):
    my_field_1 = row.field_1
    my_field_2 = row.field_2
    ...
    
@evaluator
def exact_match(row: Row, task_result: TaskResult):
    return task_result.evaluated_model_output == row.evaluated_model_gold_answer
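
For reference, a dataset row that feeds the task above might look like the following sketch (field_1 and field_2 are placeholder names for your own columns):

Python
dataset = [
    {
        "evaluated_model_input": "Some user query",
        "evaluated_model_gold_answer": "Expected answer",
        "field_1": "Custom metadata",
        "field_2": "More custom metadata",
    },
]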

Option 1: Lists of Dictionaries

The simplest way to provide data is to pass a list of dictionaries directly to the experiment. This can be defined inline:

Python
dataset = [
    {
        "evaluated_model_system_prompt": "You are a helpful assistant.",
        "evaluated_model_input": "How do I write a Python function?",
    },
    {
        "evaluated_model_system_prompt": "You are a knowledgeable assistant.",
        "evaluated_model_input": "Explain polymorphism in OOP.",
    },
]
 
cli.experiment(
    "Project Name",
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
)

Option 2: Load Files Locally

For datasets stored locally in .csv or .jsonl format, we provide native data adaptors that make it easy to map fields from your dataset to our schema:

Python
from patronus import read_csv, read_jsonl
 
# Load CSV
dataset = read_csv(
    "path/to/dataset.csv",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response",
)
 
# Load JSONL
dataset = read_jsonl(
    "path/to/dataset.jsonl",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response"
)
 
cli.experiment(
    "Project Name",
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
)

Option 3: pandas DataFrames

You can pass pandas DataFrames directly to experiments:

Python
import pandas as pd
 
df = pd.DataFrame([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
])
 
cli.experiment(
    "Project Name",
    dataset=df,
    task=task,
    evaluators=[evaluator],
)
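
If your DataFrame uses its own column names, one way to align it with the Patronus schema is to rename the columns before passing the frame to the experiment. This is a minimal sketch using standard pandas; the original column names are placeholders:

Python
import pandas as pd
 
df = pd.DataFrame([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
])
 
# Rename columns to match the Patronus dataset schema
df = df.rename(columns={
    "user_input": "evaluated_model_input",
    "model_output": "evaluated_model_output",
})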

Option 4: Hosted Datasets

Datasets that follow the schema defined above can be uploaded to the Patronus AI platform. These can then be accessed as remote datasets.

We also support a number of off-the-shelf datasets, such as FinanceBench. To use datasets hosted on the Patronus AI platform:

Python
financebench_dataset = cli.remote_dataset("financebench")
 
# The framework will handle loading automatically when passed to an experiment
cli.experiment(
    "Project Name",
    dataset=financebench_dataset,
    task=task,
    evaluators=[evaluator],
)

Option 5: Patronus Datasets

You can also construct patronus.Dataset objects directly from records or from a pandas DataFrame:

Python
import pandas as pd
from patronus import Dataset
 
# Create from records (list of dicts)
dataset = Dataset.from_records([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
], dataset_id="my-custom-dataset")
 
# Create from DataFrame
df = pd.DataFrame(...)
dataset = Dataset.from_dataframe(df, dataset_id="my-custom-dataset")
 
cli.experiment(
    "Project Name",
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
)

Advanced

Custom Dataset Loaders

You can create custom dataset loaders using async functions:

Python
from patronus import Dataset
 
async def load_random_subset() -> Dataset:
    loader = cli.remote_dataset("pii-questions-1.0.0")
    dataset = await loader.load()
    # Modify the dataset
    subset = dataset.df.sample(n=10)
    return Dataset.from_dataframe(subset, dataset_id="random-subset")
 
# The framework will handle the async loading
cli.experiment(
    "Project Name",
    dataset=load_random_subset,
    task=task,
    evaluators=[evaluator],
)
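
Async loaders can also assemble data from local sources. The sketch below builds a filtered dataset from a local JSONL file; the file path, field names, and filter condition are illustrative assumptions:

Python
import json
from patronus import Dataset
 
async def load_answered_rows() -> Dataset:
    records = []
    with open("path/to/dataset.jsonl", mode="r") as file:
        for line in file:
            record = json.loads(line)
            # Keep only rows that include a gold answer (illustrative filter)
            if record.get("evaluated_model_gold_answer"):
                records.append(record)
    return Dataset.from_records(records, dataset_id="answered-rows")
 
cli.experiment(
    "Project Name",
    dataset=load_answered_rows,
    task=task,
    evaluators=[evaluator],
)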
