What is Patronus AI?
Patronus AI is the leading platform for scoring and optimizing generative AI applications.
Patronus provides an end-to-end system to evaluate, monitor, and improve the performance of LLM systems, enabling developers to ship AI products safely and confidently.
Experimentation Framework: A/B test and optimize LLM system performance with experiments on different prompt, model, and data configurations
Real Time Monitoring: Monitor LLM and agent interactions in production through tracing and logging, and receive real-time alerts.
Visualizations and Analytics: Visualize performance of your AI applications, compare outputs side-by-side, and obtain insights to improve system performance over time.
Powerful Evaluation Models: Automatically catch hallucinations and unsafe outputs with our suite of in-house evaluators, including Lynx and Glider, through the Evaluation API, or define your own evaluators in our SDK (see the sketch after this list).
Dataset Generation: Construct high-quality custom datasets with our proprietary dataset generation algorithms for RAG, Agents, and other architectures. Automatically expose weaknesses in your AI systems with our red-teaming algorithms.
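For instance, a hallucination check against retrieved context through the Evaluation API could look like the minimal sketch below. The endpoint path, header name, and payload fields are assumptions based on typical REST conventions; check the Evaluation API reference for the exact schema.

```python
import os
import requests

# Minimal sketch of an Evaluation API call. The endpoint, header, and
# payload fields below are assumptions; see the API reference for the
# authoritative schema.
response = requests.post(
    "https://api.patronus.ai/v1/evaluate",
    headers={"X-API-KEY": os.environ["PATRONUS_API_KEY"]},
    json={
        "evaluators": [{"evaluator": "lynx"}],
        "evaluated_model_input": "What is the refund window?",
        "evaluated_model_output": "You can get a refund within 90 days.",
        "evaluated_model_retrieved_context": [
            "Refunds are available within 30 days of purchase."
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())  # per-evaluator pass/fail results and scores
```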
Getting started
Core LLM evaluation concepts
Learn the taxonomy of LLM evaluation
Run a Patronus evaluation
Insert the Patronus API or SDK into existing workflows
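For example, a single evaluation call from the Python SDK might look like this sketch; the client and field names are modeled on the patronus-py package, so verify exact signatures against the SDK reference.

```python
from patronus import Client  # pip install patronus

# The client and field names below follow the patronus-py SDK as an
# assumption; confirm exact signatures in the SDK reference.
client = Client()  # reads PATRONUS_API_KEY from the environment

result = client.evaluate(
    evaluator="lynx",  # hosted hallucination evaluator
    evaluated_model_input="Who wrote Hamlet?",
    evaluated_model_output="Hamlet was written by Charles Dickens.",
    evaluated_model_retrieved_context=[
        "Hamlet is a tragedy written by William Shakespeare."
    ],
)
print(result)  # fails: the output contradicts the retrieved context
```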
Patronus Evaluators
Plug in turnkey metrics for RAG, Agents, NLP, OWASP, and more
Patronus experiments
Iterate on LLM system performance
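A sketch of what an experiment run can look like, assuming an SDK experiment runner that takes a dataset, a task, and evaluators; the runner name and arguments are illustrative, so see the experiments guide for the exact interface.

```python
from patronus import Client

client = Client()

def task_prompt_a(row):
    # Call your LLM with prompt variant A here; stubbed for illustration.
    return f"Variant A answer to: {row['question']}"

# Runner name and arguments are assumptions modeled on the SDK's
# experiment API; swap in the real signature from the experiments guide.
client.experiment(
    "prompt-ab-test",
    dataset=[{"question": "What is retrieval-augmented generation?"}],
    task=task_prompt_a,
    evaluators=["judge"],  # re-run with a variant B task to compare
)
```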
Logging and alerting
View logs and receive alerts
Production LLM monitoring
Trace an evaluation, experiment, or other workflow
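Tracing a workflow can be as light as decorating the functions you want spans for; the `init` and `traced` names below are assumptions modeled on the SDK's tracing utilities, so see the tracing docs for the exact setup.

```python
import patronus

# `init` and `traced` are assumed tracing hooks modeled on the SDK;
# see the tracing docs for the exact setup.
patronus.init()  # one-time setup; exports traces to Patronus

@patronus.traced()
def answer(question: str) -> str:
    # Your LLM call goes here; stubbed for illustration.
    return f"echo: {question}"

answer("How do refunds work?")  # the span shows up in production logs
```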
Read a guide
Start building
Evaluate agentic outputs
Evaluate tool selection and tool outputs
Evaluations with task chaining
Evaluate a multi-step workflow by chaining tasks
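The shape of a chained evaluation, with a hypothetical `evaluate_step` helper standing in for a Patronus evaluation call at each stage:

```python
# Minimal sketch of a chained workflow: each step's output feeds the next
# step, and each step is scored independently. `evaluate_step` is a
# hypothetical stand-in for a Patronus evaluation call.

def retrieve(query: str) -> list[str]:
    return ["Refunds are available within 30 days."]  # stubbed retriever

def generate(query: str, context: list[str]) -> str:
    return f"Per our policy: {context[0]}"  # stubbed generator

def evaluate_step(step: str, output: object) -> None:
    print(f"[{step}] send to a Patronus evaluator: {output!r}")

query = "What is the refund window?"
context = retrieve(query)
evaluate_step("retrieval", context)   # e.g. context relevance
answer = generate(query, context)
evaluate_step("generation", answer)   # e.g. hallucination vs. context
```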
LLM-as-judges
Configure custom criteria for your use case
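Calling the hosted judge with a custom criterion might look like this sketch; the criterion identifier is hypothetical and assumes a custom criterion created in the platform first.

```python
from patronus import Client

client = Client()

# "judge" is the LLM-as-judge evaluator; the criteria identifier below
# is hypothetical and assumes a custom criterion you defined in the
# platform beforehand.
result = client.evaluate(
    evaluator="judge",
    criteria="my-project:is-polite-and-concise",
    evaluated_model_input="Summarize our refund policy.",
    evaluated_model_output="Refunds: 30 days, receipt required.",
)
print(result)
```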
Test datasets for evaluations
Enterprise-grade dataset generation
Local evaluators
Bring your own evaluators to the Patronus platform
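A local evaluator can be an ordinary scoring function you own; how it is registered with the SDK is specific to the local evaluators guide, so the hook is left out of this sketch.

```python
# A local evaluator is just a scoring function you control. The
# registration hook into Patronus experiments is SDK-specific; see the
# local evaluators guide for how to plug this in.

def exact_match(expected: str, actual: str) -> bool:
    """Pass when the model output matches the gold answer, ignoring case."""
    return expected.strip().lower() == actual.strip().lower()

print(exact_match("Paris", " paris "))  # True
```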
Human-in-the-loop evaluations
Augment evaluations with annotations
Compare prompt and model performance
Data visualizations and diffs on model outputs
Hallucination detection
Lynx: Patronus's finetuned hallucination detection model
Likert-based scoring
Glider: Patronus's finetuned rubric-based scoring model
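Scoring against a rubric with Glider could look like this sketch; the rubric text and the field it is passed through are assumptions, so check the Glider docs for the real request shape and score range.

```python
from patronus import Client

client = Client()

# "glider" names the hosted rubric-scoring model; passing the rubric via
# `criteria` is an assumption, as is the score range. See the Glider
# docs for the exact request shape.
result = client.evaluate(
    evaluator="glider",
    criteria=(
        "Score 1-5: does the answer explain the concept accurately "
        "and at a beginner-friendly level?"
    ),
    evaluated_model_input="Explain vector databases to a beginner.",
    evaluated_model_output=(
        "A vector database stores embeddings so that similar items "
        "can be found quickly."
    ),
)
print(result)  # Likert-style score with the model's reasoning
```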
Benchmark results
Model performance benchmarks