
Using Datasets

Datasets are a fundamental component of experiments in the Patronus SDK. They provide the inputs and context needed to evaluate your Generative AI applications. The SDK offers flexible ways to work with datasets and supports various data formats and sources.

Dataset Formats

The SDK accepts datasets in several formats:

  • List of dictionaries
  • pandas.DataFrame objects
  • patronus.Dataset objects created from various sources
  • Remote datasets loaded from the Patronus platform
  • Asynchronous functions that return any of the above

The SDK automatically handles dataset loading and conversion internally, so you can focus on your experiment logic rather than data management.

Dataset Fields

For maximum flexibility, the SDK lets you run experiments on datasets with arbitrary schemas. However, we highly recommend mapping your dataset fields to the corresponding fields in Patronus Datasets, as this allows you to use evaluators in the Patronus API and other platform features. This is easy to do with our data adaptors:

from patronus.datasets import read_csv, read_jsonl
 
# Load CSV
dataset = read_csv(
    "path/to/dataset.csv",
    task_input_field="input_text",
    task_output_field="model_response",
)
 
# Load JSONL
dataset = read_jsonl(
    "path/to/dataset.jsonl",
    task_input_field="input_text",
    task_output_field="model_response"
)

Alternatively, you can perform this mapping yourself with a CSV loader:

import csv
 
dataset = []
 
with open('dataset.csv', mode='r') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        dataset.append({
            "task_input": row["YOUR_INPUT_FIELD"],
            "gold_answer": row["YOUR_GOLD_ANSWER_FIELD"],
        })

Patronus Datasets Fields

The Patronus SDK supports the following standard fields in datasets:

  • sid (str/int): A unique identifier for each data sample. This can be used to track and reference specific entries within a dataset. If not provided, it will be automatically generated.
  • system_prompt (str): The system prompt provided to the model, setting the context or behavior for the model's response.
  • task_context (str/list[str]): Additional context or information provided to the model. This can be a string or list of strings, typically used in RAG setups where the model's response depends on external information.
  • task_input (str): The primary input to the model or task, typically a user query or instruction.
  • task_output (str): The output generated by the model or task being evaluated.
  • gold_answer (str): The expected or correct answer that the model output is compared against during evaluation.
  • tags (dict): Key-value pairs for categorizing and filtering samples.
  • task_metadata (dict): Additional structured information about the task.

We recommend mapping the fields in your datasets to these standard fields to integrate with our API and platform features. Note that evaluators in the Patronus API follow a structured schema and expect these fields; user-defined evaluators can access arbitrary fields in the dataset.
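
For reference, a single record that uses all of the standard fields might look like the following. This is an illustrative sketch; the values are placeholders, and only the field names matter for the schema:

```python
# An illustrative record using the standard Patronus dataset fields.
# Values are placeholders; only the keys matter for the schema.
record = {
    "sid": "sample-001",                       # unique identifier (auto-generated if omitted)
    "system_prompt": "You are a helpful assistant.",
    "task_context": ["Paris is the capital of France."],  # str or list[str], e.g. RAG passages
    "task_input": "What is the capital of France?",
    "task_output": "The capital of France is Paris.",
    "gold_answer": "Paris",
    "tags": {"topic": "geography"},            # key-value pairs for filtering
    "task_metadata": {"source": "manual"},     # arbitrary structured info
}
```

A list of such records can be passed directly to an experiment, as shown in the options below.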

Other Dataset Fields

Developers may have fields outside of our supported schema. In these cases, you can still access those fields in tasks and evaluators through the Row object. For example:

def my_task(row, **kwargs):
    # Access standard fields
    question = row.task_input
    reference = row.gold_answer
    
    # Access custom fields directly
    my_field_1 = row.field_1
    my_field_2 = row.field_2
    
    return f"Processing question: {question}"
    
@evaluator()
def exact_match(row, task_result, **kwargs):
    return task_result.output == row.gold_answer
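
Setting aside the SDK plumbing, the exact_match evaluator above reduces to a plain string comparison between the task output and the row's gold answer. A minimal stand-in, using simple namespace objects in place of the SDK's row and task-result types, behaves like this:

```python
from types import SimpleNamespace

# Stand-ins for the SDK's row and task-result objects (illustrative only).
row = SimpleNamespace(task_input="2 + 2 = ?", gold_answer="4")
task_result = SimpleNamespace(output="4")

def exact_match(row, task_result, **kwargs):
    # Pass only when the task output matches the expected answer exactly.
    return task_result.output == row.gold_answer

print(exact_match(row, task_result))  # True
```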

Option 1: Lists of Dictionaries

The simplest way to provide data is to pass a list of dictionaries directly to the experiment. The dataset can be defined inline:

from patronus.experiments import run_experiment
 
dataset = [
    {
        "system_prompt": "You are a helpful assistant.",
        "task_input": "How do I write a Python function?",
    },
    {
        "system_prompt": "You are a knowledgeable assistant.",
        "task_input": "Explain polymorphism in OOP.",
    },
]
 
experiment = run_experiment(
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
    experiment_name="Project Name"
)

Option 2: Load Files Locally

For datasets stored locally in .csv or .jsonl format, we provide native data adaptors that make it easy to map fields from your dataset to our schema:

from patronus.datasets import read_csv, read_jsonl
from patronus.experiments import run_experiment
 
# Load CSV
dataset = read_csv(
    "path/to/dataset.csv",
    task_input_field="input_text",
    task_output_field="model_response",
)
 
# Load JSONL
dataset = read_jsonl(
    "path/to/dataset.jsonl",
    task_input_field="input_text",
    task_output_field="model_response"
)
 
experiment = run_experiment(
    dataset=dataset,
    task=task,
    evaluators=[evaluator],
    experiment_name="Project Name"
)

Option 3: pandas DataFrames

You can pass pandas DataFrames directly to experiments:

import pandas as pd
from patronus.experiments import run_experiment
 
df = pd.DataFrame([
    {"user_input": "Query 1", "model_output": "Response 1"},
    {"user_input": "Query 2", "model_output": "Response 2"},
])
 
experiment = run_experiment(
    dataset=df,
    task=task,
    evaluators=[evaluator],
    experiment_name="Project Name"
)

Option 4: Hosted Datasets

Datasets that follow the schema defined above can be uploaded to the Patronus AI platform. These can then be accessed as remote datasets.
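
If you are preparing a file for upload, one straightforward approach is to write a JSONL file where each line is a JSON object keyed by the standard field names. A minimal sketch using only the standard library (the upload step itself happens through the platform):

```python
import json

# Records keyed by the standard Patronus field names (illustrative values).
records = [
    {"task_input": "What is 2 + 2?", "gold_answer": "4"},
    {"task_input": "Name the largest planet.", "gold_answer": "Jupiter"},
]

# Write one JSON object per line (JSONL).
with open("dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```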

We also support a number of off-the-shelf datasets, such as FinanceBench. To use datasets hosted on the Patronus AI platform:

from patronus.datasets import RemoteDatasetLoader
from patronus.experiments import run_experiment
 
# Load a dataset from the Patronus platform using its name
financebench_dataset = RemoteDatasetLoader("financebench")
 
# Load a dataset from the Patronus platform using its ID
# financebench_dataset = RemoteDatasetLoader(by_id="d-jxrisvlp1hgf786h")
 
# The framework will handle loading automatically when passed to an experiment
experiment = run_experiment(
    dataset=financebench_dataset,
    task=task,
    evaluators=[evaluator],
    experiment_name="Project Name"
)

Advanced

Custom Dataset Loaders

You can create custom dataset loaders using async functions:

import random
from patronus.datasets import Dataset, RemoteDatasetLoader
from patronus.experiments import run_experiment
 
async def load_random_subset():
    loader = RemoteDatasetLoader("pii-questions-1.0.0")
    dataset = await loader.load()
    # Modify the dataset
    subset = dataset.df.sample(n=10)
    return Dataset.from_dataframe(subset, dataset_id="random-subset")
 
# The framework will handle the async loading
experiment = run_experiment(
    dataset=load_random_subset,
    task=task,
    evaluators=[evaluator],
    experiment_name="Project Name"
)
