Working with Datasets

When conducting experiments with the Patronus AI platform, datasets play a crucial role. They provide the inputs, context, and expected outputs needed to evaluate your models. There are several ways to define and use datasets in your experiments.

Passing Data Directly

As demonstrated in previous pages, you can pass your dataset directly to the experiment. This is useful when you have a small, predefined set of data and you want to quickly evaluate it.

data = [
    {
        "evaluated_model_system_prompt": "You are a helpful assistant.",
        "evaluated_model_input": "How do I write a Python function?",
    },
    {
        "evaluated_model_system_prompt": "You are a knowledgeable assistant.",
        "evaluated_model_input": "Explain the concept of polymorphism in OOP.",
    },
]

cli.experiment(
    "Project Name",
    data=data,
    task=task,
    evaluators=[evaluator],
)

Using Remote Datasets

The Patronus AI platform provides datasets that you can use out of the box, as well as datasets uploaded by customers. These datasets are hosted on the platform and can be referenced in your experiments by specifying the dataset ID.

pii_dataset = cli.remote_dataset("pii-questions-1.0.0")

cli.experiment(
    "Project Name",
    data=pii_dataset,
    task=task,
    evaluators=[evaluator],
)

In this example, "pii-questions-1.0.0" is the dataset_id for a specific dataset available on the platform.

Loading Datasets from Local Storage

If you have datasets stored locally in CSV or JSONL format, you can load them into your experiment using the read_csv and read_jsonl functions.

local_dataset = read_csv("path/to/your/dataset.csv")

local_dataset = read_jsonl("path/to/your/dataset.jsonl")

When loading a dataset, you also have the option to provide a dataset_id as a parameter. This dataset_id will be included as metadata in the evaluation results, helping you track and reference the dataset used in your experiments. If you don't provide a dataset_id, the framework will automatically generate a name based on the file path.

local_dataset = read_csv("path/to/your/dataset.csv", dataset_id="my-custom-dataset")

local_dataset = read_jsonl("path/to/your/dataset.jsonl", dataset_id="my-custom-dataset")

Expected Fields in Datasets

When working with datasets in the Patronus AI platform, whether you're passing data directly, using remote datasets, or loading data from local files, it's important to understand the expected fields that the framework uses. These fields must be correctly defined in your dataset to ensure that tasks and evaluations can be executed properly.

Here’s a breakdown of the expected fields, which you may encounter or need to define in your datasets:

  • sid: A unique identifier for each data sample. This can be used to track and reference specific entries within a dataset. The sid should be a number starting from 1. If not provided, the number will be inferred from the position in the dataset.
  • evaluated_model_system_prompt: The system prompt provided to the model, setting the context or behavior for the model's response.
  • evaluated_model_retrieved_context: A list of context strings (list[str]) retrieved and provided to the model as additional information. This field is typically used in a Retrieval-Augmented Generation (RAG) setup, where the model's response depends on external context or supporting information that has been fetched from a knowledge base or similar source.
  • evaluated_model_input: Typically the user input provided to the model, which the model must respond to.
  • evaluated_model_output: The output generated by the model.
  • evaluated_model_gold_answer: The expected or correct answer that the model output is compared against during evaluation. This field is used to assess the accuracy and quality of the model's response.
  • evaluated_model_name: A metadata field that is typically the name of the AI agent (like "my-assistant") used to generate evaluated_model_output. This field should be included only if the dataset already includes evaluated_model_output.
  • evaluated_model_provider: A metadata field with the provider of the base model (like "openai"). This field should be included only if the dataset already includes evaluated_model_output.
  • evaluated_model_selected_model: A metadata field with the version of the base model (like "gpt-4o"). This field should be included only if the dataset already includes evaluated_model_output.
  • evaluated_model_params: A dictionary containing parameters used by the model during output generation, such as {"temperature": 0, "max_tokens": 256}. This field should be included only if the dataset already includes evaluated_model_output.

All these fields are optional in a dataset, but they may be required depending on the specific experiment setup.
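
To make these fields concrete, here is a hypothetical, fully populated sample; the field names come from the list above, and all values are purely illustrative:

sample = {
    "sid": 1,
    "evaluated_model_system_prompt": "You are a helpful assistant.",
    "evaluated_model_retrieved_context": [
        "Python functions are defined with the def keyword.",
    ],
    "evaluated_model_input": "How do I write a Python function?",
    "evaluated_model_output": "Use the def keyword, for example: def greet(): ...",
    "evaluated_model_gold_answer": "Define it with the def keyword.",
    # The metadata fields below are meaningful only when the dataset
    # already includes evaluated_model_output.
    "evaluated_model_name": "my-assistant",
    "evaluated_model_provider": "openai",
    "evaluated_model_selected_model": "gpt-4o",
    "evaluated_model_params": {"temperature": 0, "max_tokens": 256},
}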

Customizing Field Mappings

When loading datasets from local CSV or JSONL files, you can customize how fields in your file are mapped to the expected fields in the Patronus framework using the *_field parameters (for example, evaluated_model_input_field). This allows you to adjust for differences in column names or keys in your files.

local_dataset = read_csv(
    "path/to/your/dataset.csv",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response"
)

local_dataset = read_jsonl(
    "path/to/your/dataset.jsonl",
    evaluated_model_input_field="input_text",
    evaluated_model_output_field="model_response"
)

In this example, the fields input_text and model_response in your files are mapped to evaluated_model_input and evaluated_model_output, respectively, as defined by the framework.

The exact field names that can be mapped are listed in the "Expected Fields in Datasets" section above. This flexibility ensures that you can easily integrate your existing datasets into the Patronus AI platform, even if they use different naming conventions.
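
For instance, a CSV file with the following header and rows (both hypothetical) would be compatible with the read_csv call shown above:

input_text,model_response
How do I write a Python function?,Use the def keyword to define one.
Explain the concept of polymorphism in OOP.,Polymorphism lets one interface work with many types.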

Custom Dataset Loaders

In addition to using predefined methods like read_csv and read_jsonl for loading datasets, the Patronus framework allows you to define custom dataset loaders. These loaders can be any function—either synchronous or asynchronous—that returns a Dataset object. This provides great flexibility for dynamically generating or modifying datasets before they are used in an experiment.

Example: Creating a Random Subset from a Remote Dataset

Here’s an example of a custom dataset loader that retrieves a random subset of data from a remote Patronus dataset:

import random
from patronus import Dataset


async def load_dataset() -> Dataset:
    dataset = await cli.remote_dataset("pii-questions-1.0.0")
    # Keep a random 10-item subset; random.sample draws without
    # replacement, so entries are not repeated.
    dataset.data = random.sample(dataset.data, k=10)
    return dataset


cli.experiment(
    "PII",
    data=load_dataset,
    task=task,
    evaluators=[evaluator],
)
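
Custom loaders do not have to be asynchronous. Below is a minimal synchronous sketch that trims a local dataset before the experiment runs. It assumes, consistent with the local-loading examples above, that read_jsonl returns a Dataset whose data attribute is a list of samples; the function name and dataset_id here are hypothetical:

def load_dataset_head() -> Dataset:
    # Assumes read_jsonl returns a Dataset with a list-like data attribute,
    # mirroring how the async example above manipulates dataset.data.
    dataset = read_jsonl("path/to/your/dataset.jsonl", dataset_id="local-head")
    dataset.data = dataset.data[:25]  # keep only the first 25 samples
    return dataset


cli.experiment(
    "Project Name",
    data=load_dataset_head,
    task=task,
    evaluators=[evaluator],
)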