Quick Start - Log your first eval

Learn how to log your first evaluation result with Patronus AI. Follow these steps and you'll have your first result logged within minutes!

1. Create an API Key

If you do not have an account yet, sign up at app.patronus.ai.

To create an API key, click API Keys in the navigation bar. Store the key securely, as you will not be able to view it again.

2. Installation & Initialization

Install the Patronus SDK with pip:

pip install patronus

Initialize the SDK with your API key:

import patronus
 
# Initialize with your API key
patronus.init(api_key="YOUR_API_KEY")
 
# Alternatively, set PATRONUS_API_KEY environment variable and initialize without arguments
# patronus.init()
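
If you prefer not to hard-code the key, one option is to read it from the environment yourself and pass it to init explicitly. This is a minimal sketch that assumes PATRONUS_API_KEY is already exported in your shell:

import os
import patronus
 
# Read the key from the environment and pass it explicitly;
# patronus.init() with no arguments would also pick it up from PATRONUS_API_KEY.
api_key = os.environ["PATRONUS_API_KEY"]
patronus.init(api_key=api_key)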

You can also use a configuration file:

# patronus.yaml
api_key: "YOUR_API_KEY"
project_name: "Global"
app: "default"
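
With a patronus.yaml in place, initialization should not need any arguments. A minimal sketch, assuming the SDK discovers the file in your working directory:

import patronus
 
# No arguments needed: configuration should be read from patronus.yaml
# (assumption: the file is discovered in the current working directory)
patronus.init()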

3. Log an evaluation

An evaluation consists of the following pieces:

  • Inputs to your LLM application, e.g., "What is Patronus AI?"
  • Outputs of your LLM application, e.g., "Patronus AI is an LLM evaluation and testing platform."
  • Evaluation criteria, e.g., hallucination, conciseness, toxicity, and more!

Putting these together, here is how to run and log your first evaluation:

from patronus import init
from patronus.evals import RemoteEvaluator
 
# Initialize with your API key
init(api_key="YOUR_API_KEY")
 
# Create a hallucination evaluator
hallucination_check = RemoteEvaluator("lynx", "patronus:hallucination")
 
# Run the evaluation
result = hallucination_check.evaluate(
    task_input="What is the largest animal in the world?",
    task_output="The giant sandworm.",
    task_context="The blue whale is the largest known animal."
)
 
result.pretty_print()

In this example, we evaluate whether the output contains a hallucination using Lynx. The evaluation result is automatically logged to the Logs dashboard.
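
You can also act on the result programmatically instead of only printing it. Continuing from the example above, here is a minimal sketch; the attribute names (pass_, score, explanation) are assumptions based on the fields described in step 5, so verify them against the SDK reference or the pretty_print() output:

# Assumed attribute names; check your SDK version
if not result.pass_:
    print(f"Hallucination detected (score={result.score})")
    print(result.explanation)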

4. Running Multiple Evaluations

For more comprehensive evaluation, you can run multiple evaluators in a single request:

from patronus import init
from patronus.pat_client import Patronus
from patronus.evals import RemoteEvaluator
 
init(api_key="YOUR_API_KEY")
 
with Patronus() as client:
    # Run multiple evaluators in parallel
    results = client.evaluate(
        evaluators=[
            RemoteEvaluator("lynx", "patronus:hallucination"),
            RemoteEvaluator("judge", "patronus:is-helpful")
        ],
        task_input="What is the largest animal in the world?",
        task_output="The giant sandworm is the largest animal from the Dune universe.",
        task_context="The blue whale is the largest known animal."
    )
 
    results.pretty_print()
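
After the batch runs, you will often want to branch on the outcome. Continuing inside the with block above, this sketch uses container methods (all_succeeded(), failed_evaluations()) that are assumptions about the client API; verify them against the SDK reference:

    # Inside the `with Patronus() as client:` block, after client.evaluate(...).
    # all_succeeded() and failed_evaluations() are assumed helper methods; check your SDK version.
    if results.all_succeeded():
        print("All evaluators passed")
    else:
        for failed in results.failed_evaluations():
            print(failed)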

5. View Evaluation Logs in UI

Now head to app.patronus.ai/logs to view results for your most recent evaluations!

Evaluation Results consist of the following fields:

  • Result: PASS or FAIL, indicating whether the LLM output passed the evaluation
  • Score: A value between 0 and 1 measuring confidence in the result
  • Explanation: A natural language explanation of why the result and score were assigned

In this case, Lynx scored the evaluation as FAIL because the context states that the largest animal is the blue whale, not the giant sandworm. We just flagged our first hallucination!

Now that you've logged your first evaluation, you can explore additional API fields, define your own evaluator, or run a batched evaluation experiment.
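
As a taste of custom evaluators, here is a minimal sketch of a function-based evaluator. It assumes an @evaluator decorator is importable from patronus.evals alongside RemoteEvaluator; check the SDK reference for the exact import path and supported return types:

import patronus
from patronus.evals import evaluator
 
patronus.init(api_key="YOUR_API_KEY")
 
# Assumed decorator and import path; a True/False return is treated as PASS/FAIL
@evaluator()
def is_concise(task_output: str) -> bool:
    return len(task_output.split()) <= 50
 
print(is_concise(task_output="Patronus AI is an LLM evaluation and testing platform."))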
