What is Patronus AI?

Patronus AI is the leading automated LLM testing and evaluation platform.

LLMs are powerful, and their use cases are only just beginning to be unlocked. However, it's hard to catch their mistakes at scale: LLMs exhibit a wide space of behaviors, and there are few well-established, comprehensive benchmarks for LLM performance. Ensuring that LLMs consistently produce safe and helpful outputs is critical to unlocking their full value.

Patronus AI helps you do just that by bringing innovations in LLM evaluation research to your fingertips. You can detect LLM mistakes in one line of code using our powerful suite of evaluators, monitor the performance of your GenAI system over time, and understand critical failure modes so you can remediate them.
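The "one line of code" here is a single call to the Evaluate API. Below is a minimal sketch in Python, assuming a REST endpoint at `https://api.patronus.ai/v1/evaluate`, an `X-API-KEY` auth header, and an illustrative evaluator name; these details are assumptions for illustration, so consult the API reference for the exact schema.

```python
import requests

# Minimal sketch of calling the Evaluate API. The endpoint path, auth
# header, field names, and the "toxicity" evaluator are assumptions for
# illustration; check the API reference for the exact schema.
API_KEY = "your-api-key"

response = requests.post(
    "https://api.patronus.ai/v1/evaluate",  # assumed endpoint path
    headers={"X-API-KEY": API_KEY},         # assumed auth header
    json={
        "evaluators": [{"evaluator": "toxicity"}],  # assumed evaluator name
        "evaluated_model_input": "How do I reset my password?",
        "evaluated_model_output": "Click 'Forgot password' on the login page.",
    },
)
response.raise_for_status()
print(response.json())  # per-evaluator results for this input/output pair
```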

Patronus features are split across the Web platform, the API, and Services:

  1. Web platform:
    1. Evaluator Playground: Find and test the best evaluators for your use case
    2. Custom Evaluators: Create new evaluators tailored exactly to your needs
    3. LLM Monitoring: View long-term trends in your LLM's performance and dive into specific evaluation failures
    4. Evaluation Runs (Enterprise): Register your LLM so Patronus can call and test it on your behalf
    5. Datasets (Enterprise): Upload and access adversarial datasets and benchmarks
  2. API:
    1. Evaluate API: Call our proprietary and research-leading evaluators directly in code to get immediate insight into LLM failures
    2. Evaluator Profiles API: Create new evaluator configurations (in Patronus lingo, we call these profiles) on the fly to adapt evaluator behavior to incoming requests; see the sketch after this list.
  3. Services:
    1. Test Dataset Generation (Enterprise): Get custom test datasets for your use case, based on your product goals and any relevant documents.
    2. Adversarial Red-Teaming (Enterprise): Download a report outlining the areas where your LLM fails, discovered through a proprietary automated LLM jailbreaking procedure.
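To make the Evaluator Profiles API concrete, here is a sketch of creating a profile on the fly. Everything below (the endpoint path, the payload fields, and the `no-financial-advice` profile name) is a hypothetical illustration of the workflow, not the exact schema; see the Evaluator Profiles API reference for the real one.

```python
import requests

# Hypothetical sketch of creating an evaluator profile on the fly.
# The endpoint path and every payload field are illustrative assumptions;
# see the Evaluator Profiles API reference for the real schema.
API_KEY = "your-api-key"

profile = requests.post(
    "https://api.patronus.ai/v1/evaluator-profiles",  # assumed path
    headers={"X-API-KEY": API_KEY},                   # assumed auth header
    json={
        "evaluator_family": "custom",   # assumed: the base evaluator to configure
        "name": "no-financial-advice",  # hypothetical profile name
        "config": {
            # assumed field: the criteria this profile enforces
            "pass_criteria": "The output does not offer financial advice.",
        },
    },
)
profile.raise_for_status()
print(profile.json())
```

The new profile can then be referenced by name in subsequent Evaluate API calls, letting you adapt evaluator behavior per request without redeploying anything.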