Experiments

At a high level, an Experiment is a collection of logs. Experiments let you run batched evals that compare performance across different configurations, models, and datasets, so you can make informed decisions about how to optimize your AI applications.

Patronus provides an intuitive Experimentation Framework to help you continuously improve your AI applications. Whether you're developing RAG apps, finetuning your own models, or prompt engineering, this framework provides the tools you need to set up, execute, and analyze experiments efficiently.

Overview

Experiments consist of several components (a short sketch of how they fit together follows the list):

  • dataset: a list of samples, each with an evaluated_model_input and an evaluated_model_output. There are other ways to provide datasets, which we will cover later.
  • task: typically a function that takes inputs (such as the evaluated_model_input here) and produces an evaluated_model_output.
  • evaluators: a list of evaluators used for the experiment.

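To make these pieces concrete, here is a minimal plain-Python sketch of how a dataset, a task, and an evaluator relate to one another. It deliberately does not use the Patronus SDK itself (the Quick Start covers the actual API), and the evaluated_model_gold_answer field and exact_match evaluator are illustrative assumptions rather than part of the framework.

```python
# Illustrative sketch only -- not the Patronus SDK API.
# dataset: a list of samples. Here each sample carries an input plus a
# hypothetical gold answer; the task fills in evaluated_model_output.
dataset = [
    {"evaluated_model_input": "What is 2 + 2?", "evaluated_model_gold_answer": "4"},
    {"evaluated_model_input": "Name the capital of France.", "evaluated_model_gold_answer": "Paris"},
]

# task: a function that turns an input into an output. In a real experiment
# this would call your model, RAG pipeline, or prompt chain.
def task(evaluated_model_input: str) -> str:
    return "4" if "2 + 2" in evaluated_model_input else "Paris"

# evaluators: functions that score an output. This one (an assumption for the
# sketch) checks exact match against the gold answer.
def exact_match(evaluated_model_output: str, evaluated_model_gold_answer: str) -> bool:
    return evaluated_model_output.strip() == evaluated_model_gold_answer.strip()

# Running the loop by hand: apply the task to every sample, then every evaluator.
for sample in dataset:
    output = task(sample["evaluated_model_input"])
    passed = exact_match(output, sample["evaluated_model_gold_answer"])
    print(sample["evaluated_model_input"], "->", output, "| exact_match:", passed)
```

In practice you hand these same three pieces to the Experimentation Framework, which runs this loop for you and records each result as a log in the experiment.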
Ready to dive in? Head over to the Quick Start page for a step-by-step guide on setting up your first experiment. You'll learn how to define evaluators, create tasks, work with datasets, and run evaluations, all within the Patronus AI framework.
