Evaluating RAG Agents
This cookbook shows a minimal RAG evaluation setup using a mocked in-memory knowledge base and the current Patronus Python SDK.
Setup
Install dependencies:
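The examples below use the Patronus SDK together with the OpenAI client for generation (an assumption; any chat-completion client works):

```bash
pip install patronus openai
```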
Set environment variables:
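The SDK reads PATRONUS_API_KEY from the environment; the OpenAI key is only needed because the example task below calls an OpenAI model:

```bash
export PATRONUS_API_KEY="<your-patronus-api-key>"
export OPENAI_API_KEY="<your-openai-api-key>"
```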
1. Create a Dataset
Use Patronus dataset fields (task_input, gold_answer) so evaluators work out of the box.
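A minimal sketch of such a dataset as a plain list of dicts; the questions and gold answers are invented placeholders:

```python
dataset = [
    {
        "task_input": "What is the maximum file size for uploads?",
        "gold_answer": "The maximum upload size is 50 MB.",
    },
    {
        "task_input": "Which regions is the service available in?",
        "gold_answer": "The service is available in the US and EU regions.",
    },
    {
        "task_input": "How long are logs retained?",
        "gold_answer": "Logs are retained for 30 days.",
    },
]
```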
2. Mock a Knowledge Base
Instead of using a vector database, create a simple list of documents and a tiny retriever.
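One way to sketch this: a few hard-coded documents (written to match the placeholder dataset above) and a keyword-overlap retriever, with no embeddings or vector store needed:

```python
KNOWLEDGE_BASE = [
    "Uploads: the maximum file size for a single upload is 50 MB.",
    "Availability: the service currently runs in the US and EU regions.",
    "Retention: application logs are kept for 30 days, then deleted.",
    "Billing: invoices are issued on the first day of each month.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents sharing the most keywords with the query."""
    query_terms = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(query_terms & set(doc.lower().split()))

    return sorted(KNOWLEDGE_BASE, key=overlap, reverse=True)[:k]
```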
3. Define the RAG Task
Use Row + TaskResult and pass retrieved context into the model prompt.
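A sketch of the task, assuming an OpenAI chat model (gpt-4o-mini is just an example name). It is wrapped in a small factory so step 5 can swap in other model variants; the retrieved passages go into the prompt and are also stored in metadata for later inspection:

```python
from openai import OpenAI
from patronus.experiments import Row, TaskResult

oai = OpenAI()

def make_rag_task(model: str):
    """Build a task function bound to a specific model variant."""

    def rag_task(row: Row, **kwargs) -> TaskResult:
        # Retrieve context with the mock retriever and stuff it into the prompt.
        context = retrieve(row.task_input)
        prompt = (
            "Answer the question using only the context below.\n\n"
            "Context:\n" + "\n".join(f"- {doc}" for doc in context) + "\n\n"
            f"Question: {row.task_input}"
        )
        response = oai.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        return TaskResult(output=answer, metadata={"retrieved_context": context})

    return rag_task
```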
4. Run an Experiment
Run one evaluator for answer correctness and one for grounding quality.
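A sketch of the run. The evaluator aliases and criteria used here (judge with patronus:fuzzy-match for correctness, lynx with patronus:hallucination for grounding) are assumptions; substitute whichever evaluators are enabled in your Patronus account. Because hosted evaluators typically read retrieved context from each row's task_context field, the sketch attaches the same passages the task will retrieve onto the dataset rows first:

```python
from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment

def with_context(rows: list[dict]) -> list[dict]:
    # Assumption: remote evaluators pick up retrieved context from task_context,
    # so attach the same passages the task retrieves at generation time.
    return [{**row, "task_context": retrieve(row["task_input"])} for row in rows]

run_experiment(
    dataset=with_context(dataset),
    task=make_rag_task("gpt-4o-mini"),
    evaluators=[
        RemoteEvaluator("judge", "patronus:fuzzy-match"),   # answer vs. gold_answer
        RemoteEvaluator("lynx", "patronus:hallucination"),  # answer vs. retrieved context
    ],
    tags={"model": "gpt-4o-mini", "retriever": "keyword-overlap"},
)
```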
5. Compare Model Variants
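One way to do this is to repeat the run from step 4 with a different model name (gpt-4o here is just an example variant) and a distinguishing tag:

```python
run_experiment(
    dataset=with_context(dataset),
    task=make_rag_task("gpt-4o"),  # second variant to compare against gpt-4o-mini
    evaluators=[
        RemoteEvaluator("judge", "patronus:fuzzy-match"),
        RemoteEvaluator("lynx", "patronus:hallucination"),
    ],
    tags={"model": "gpt-4o", "retriever": "keyword-overlap"},
)
```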
You can compare both runs in the Experiments UI and inspect row-level failures to improve retrieval logic or prompts.
