Datasets

Understanding datasets in Patronus AI

What are datasets?

A dataset is a collection of test examples used to evaluate your LLM application. Each example typically includes inputs (like questions or prompts) and optionally expected outputs, context, or metadata.

Datasets let you systematically test your AI system against real-world scenarios to measure quality, safety, and performance.

Dataset schema

Datasets in Patronus use a standard schema with these fields:

  • task_input: The main input to your LLM (required)
  • task_output: The expected or actual output from your LLM
  • task_context: Additional context like retrieved documents
  • gold_answer: Reference answer for comparison
  • system_prompt: System message or instructions
  • tags: Labels for organizing examples
  • sid: Sample ID for tracking specific examples
  • task_metadata: Additional custom fields

You can include any of these fields in your dataset, and you can also add custom fields that your tasks or evaluators can access.
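For example, a small dataset can be expressed as a plain Python list of dictionaries whose keys follow the schema above. The `difficulty` key below is a hypothetical custom field, and the shape shown for `tags` is illustrative:

```python
# A small dataset as a Python list of dicts.
# Keys follow the Patronus schema; "difficulty" is a hypothetical
# custom field that your tasks and evaluators could read.
dataset = [
    {
        "sid": "example-1",
        "system_prompt": "You are a helpful support assistant.",
        "task_input": "How do I reset my password?",
        "task_context": ["Password resets are handled at account.example.com/reset."],
        "gold_answer": "Visit account.example.com/reset and follow the emailed link.",
        "tags": {"topic": "account"},  # illustrative shape
        "difficulty": "easy",          # custom field
    },
    {
        "sid": "example-2",
        "task_input": "Summarize our refund policy.",
        "task_context": ["Refunds are available within 30 days of purchase."],
        "gold_answer": "Purchases can be refunded within 30 days.",
        "difficulty": "medium",
    },
]
```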

How to get datasets

There are two main ways to create or obtain datasets:

Upload existing data

Upload your own datasets via the UI or load them programmatically:

  • Supported formats: CSV, JSONL, pandas DataFrames, or Python lists
  • Size limit: 30,000 rows via UI upload
  • Field mapping: Map your column names to Patronus schema fields

See uploading datasets for details.
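As a sketch of programmatic loading with pandas, you can rename your own columns onto the schema fields before uploading or running an evaluation. The file name and original column names below are illustrative:

```python
import pandas as pd

# Load an existing CSV and map its columns onto the Patronus schema.
# "support_questions.csv" and its column names are hypothetical.
df = pd.read_csv("support_questions.csv")
df = df.rename(
    columns={
        "question": "task_input",
        "answer": "gold_answer",
        "docs": "task_context",
    }
)
print(df[["task_input", "gold_answer", "task_context"]].head())
```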

Use off-the-shelf datasets

Patronus provides pre-built datasets for common use cases:

  • Safety datasets: PII detection, toxic content, OWASP security tests
  • Benchmarks: FinanceBench, HaluBench
  • Domain-specific: Financial, legal, and more

Each dataset contains curated examples you can download and use immediately.

See off-the-shelf datasets for the complete list.
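As a minimal sketch of pulling one of these datasets with the Python SDK: the loader class name, the "financebench" identifier, and the load call below are assumptions to verify against the current SDK reference:

```python
from patronus.datasets import RemoteDatasetLoader

# Fetch an off-the-shelf dataset by name. The class name and the
# "financebench" identifier are assumptions based on the SDK docs;
# check the Python SDK reference for the exact names available.
dataset = RemoteDatasetLoader("financebench").load()
```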
