Experiments
Note: For comprehensive API documentation and more detailed examples, please refer to the Patronus Python SDK documentation.
At a high level, an Experiment is a structured evaluation of your AI application's performance across multiple samples. Experiments allow you to run batched evaluations to compare performance across different configurations, models, and datasets, so that you can make informed decisions to optimize your AI applications.
Patronus provides an intuitive Experimentation Framework to help you continuously improve your AI applications. Whether you're developing RAG apps, fine-tuning your own models, or iterating on prompts, this framework provides the tools you need to set up, execute, and analyze experiments efficiently.
Key Components
An experiment in Patronus consists of several components:
- Dataset: A collection of examples to evaluate, which can be:
  - A list of dictionaries in your code
  - A CSV or JSON file
  - A Pandas DataFrame
  - Data from a Patronus dataset
- Task (Optional): A function that processes each example, typically:
  - Takes input from the dataset
  - Calls an LLM or other AI system
  - Returns an output for evaluation
  - If your dataset already contains outputs, you can skip defining a task
- Evaluators: One or more evaluators that assess the quality of outputs (a short sketch follows this list):
  - Class-based: Extend StructuredEvaluator for more complex logic
  - Function-based: Simple functions wrapped with FuncEvaluatorAdapter
  - Remote: Patronus-hosted evaluators accessible via RemoteEvaluator
- Configuration: Additional options to customize experiment behavior:
  - Project and experiment names for organization
  - Tags for filtering and categorization
  - Concurrency settings for performance
  - Instrumentation options for tracing
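For example, a class-based evaluator and a remote evaluator might be declared roughly as follows. This is a minimal sketch, not verified API usage: the keyword-matching logic, the `evaluate` keyword arguments, the `EvaluationResult` fields, and the `"judge"` / `"patronus:is-concise"` identifiers are illustrative assumptions, so check the SDK documentation for the exact signatures.

```python
# Illustrative sketch only -- signatures and identifiers are assumptions, not verified API.
from patronus.evals import StructuredEvaluator, RemoteEvaluator, EvaluationResult


class KeywordEvaluator(StructuredEvaluator):
    """Class-based evaluator: passes when a required keyword appears in the output."""

    keyword = "paris"  # hypothetical check used for illustration

    def evaluate(self, *, task_output: str = "", **kwargs) -> EvaluationResult:
        found = self.keyword in (task_output or "").lower()
        return EvaluationResult(pass_=found, score=1.0 if found else 0.0)


# Remote evaluator: a Patronus-hosted evaluator referenced by name and criteria
# (the identifiers below are placeholders).
conciseness = RemoteEvaluator("judge", "patronus:is-concise")
```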
A Simple Experiment
Here's a basic example of running an experiment:
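The sketch below assumes the `run_experiment` entry point and `FuncEvaluatorAdapter` from the Patronus Python SDK; the dataset fields (`task_input`, `gold_answer`), the placeholder task, and the function signatures are illustrative rather than authoritative, so consult the SDK documentation for exact details.

```python
# Minimal sketch of an experiment (field names and signatures are assumptions).
from patronus import init, evaluator
from patronus.experiments import run_experiment, FuncEvaluatorAdapter

init()  # reads your Patronus API key from the environment

# 1. Dataset: a list of dictionaries defined inline.
dataset = [
    {"task_input": "What is the capital of France?", "gold_answer": "Paris"},
    {"task_input": "What is 2 + 2?", "gold_answer": "4"},
]

# 2. Task: produce an output for each example (an LLM call would normally go here).
def my_task(row, **kwargs):
    return f"Answer to: {row.task_input}"

# 3. Evaluator: a simple function-based check wrapped for use in the experiment.
@evaluator()
def contains_answer(row, task_result, **kwargs) -> bool:
    return row.gold_answer.lower() in task_result.output.lower()

# 4. Run the experiment across the whole dataset.
run_experiment(
    dataset=dataset,
    task=my_task,
    evaluators=[FuncEvaluatorAdapter(contains_answer)],
    experiment_name="my-first-experiment",
)
```

Once the run completes, the per-sample results appear in the Patronus platform under the experiment name you provided, where you can compare them against other runs.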
Benefits of the Experimentation Framework
Using the Patronus Experimentation Framework offers several advantages:
- Standardization: Consistent evaluation methodology across different models and datasets
- Reproducibility: Easily rerun experiments with the same configuration
- Efficiency: Parallel execution for faster evaluation of large datasets
- Visibility: Detailed metrics and visualizations in the Patronus platform
- Integration: Seamless connection with logging and tracing for end-to-end visibility
Getting Started with Experiments
Ready to dive in? Check out these detailed guides:
- Running your first experiment with Python
- Working with different types of evaluators
- Managing and creating datasets
- Advanced experiment configurations
For comprehensive API references and additional examples, please refer to the Patronus Python SDK documentation.