Datasets

Understanding datasets in Patronus AI

What are datasets?

A dataset is a collection of test examples used to evaluate your LLM application. Each example typically includes inputs (like questions or prompts) and optionally expected outputs, context, or metadata.

Datasets let you systematically test your AI system against real-world scenarios to measure quality, safety, and performance.

Dataset schema

Datasets in Patronus use a standard schema with these fields:

  • task_input: The main input to your LLM (required)
  • task_output: The expected or actual output from your LLM
  • task_context: Additional context like retrieved documents
  • gold_answer: Reference answer for comparison
  • system_prompt: System message or instructions
  • tags: Labels for organizing examples
  • sid: Sample ID for tracking specific examples
  • task_metadata: Additional custom fields

You can include any of these fields in your dataset, and you can also add custom fields that your tasks or evaluators can access.
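For example, a small dataset can be expressed as a plain Python list of dictionaries whose keys follow the schema above. The `difficulty` key below is a hypothetical custom field, and the shape shown for `tags` is illustrative:

```python
# A small dataset as a Python list of dicts.
# Keys follow the Patronus schema; "difficulty" is a hypothetical
# custom field that your tasks and evaluators could read.
dataset = [
    {
        "sid": "example-1",
        "system_prompt": "You are a helpful support assistant.",
        "task_input": "How do I reset my password?",
        "task_context": ["Password resets are handled at account.example.com/reset."],
        "gold_answer": "Visit account.example.com/reset and follow the emailed link.",
        "tags": {"topic": "account"},  # illustrative shape
        "difficulty": "easy",          # custom field
    },
    {
        "sid": "example-2",
        "task_input": "Summarize our refund policy.",
        "task_context": ["Refunds are available within 30 days of purchase."],
        "gold_answer": "Purchases can be refunded within 30 days.",
        "difficulty": "medium",
    },
]
```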

How to get datasets

There are two main ways to create or obtain datasets:

Upload existing data

Upload your own datasets via the UI or load them programmatically:

  • Supported formats: CSV, JSONL, pandas DataFrames, or Python lists
  • Size limit: 30,000 rows via UI upload
  • Field mapping: Map your column names to Patronus schema fields

See uploading datasets for details.
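As a sketch of programmatic loading with pandas, you can rename your own columns onto the schema fields before uploading or running an evaluation. The file name and original column names below are illustrative:

```python
import pandas as pd

# Load an existing CSV and map its columns onto the Patronus schema.
# "support_questions.csv" and its column names are hypothetical.
df = pd.read_csv("support_questions.csv")
df = df.rename(
    columns={
        "question": "task_input",
        "answer": "gold_answer",
        "docs": "task_context",
    }
)
print(df[["task_input", "gold_answer", "task_context"]].head())
```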

Use off-the-shelf datasets

Patronus provides pre-built datasets for common use cases:

  • Safety datasets: PII detection, toxic content, OWASP security tests
  • Benchmarks: FinanceBench, HaluBench
  • Domain-specific: Financial, legal, and more

Each dataset contains curated examples you can download and use immediately.

See off-the-shelf datasets for the complete list.
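As a minimal sketch of pulling one of these datasets with the Python SDK: the loader class name, the "financebench" identifier, and the load call below are assumptions to verify against the current SDK reference:

```python
from patronus.datasets import RemoteDatasetLoader

# Fetch an off-the-shelf dataset by name. The class name and the
# "financebench" identifier are assumptions based on the SDK docs;
# check the Python SDK reference for the exact names available.
dataset = RemoteDatasetLoader("financebench").load()
```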
