
Patronus Evaluators

Using pre-built evaluators from the Patronus platform

Note: For comprehensive API documentation and detailed examples, please refer to the Patronus SDK Evaluators Documentation.

Patronus Evaluators are powerful pre-built evaluators that run on Patronus infrastructure. They provide sophisticated assessment capabilities without requiring you to implement complex evaluation logic yourself.

Using Patronus Remote Evaluators

The easiest way to use Patronus evaluators is through the RemoteEvaluator class:

from patronus import init
from patronus.evals import RemoteEvaluator
 
init()
 
# Create a hallucination detector
hallucination_checker = RemoteEvaluator(
    "lynx", 
    "patronus:hallucination",
    explain_strategy="always"  # Control when explanations are generated
)
 
result = hallucination_checker.evaluate(
    task_input="What is the largest animal in the world?",
    task_output="The giant sandworm is the largest animal.",
    task_context="The blue whale is the largest known animal."
)
result.pretty_print()
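
Beyond pretty_print(), the result's fields can also be read directly. Here is a minimal sketch using the pass_, score, and explanation attributes referenced later in this guide:

# Inspect the evaluation result programmatically
if not result.pass_:
    print(f"Hallucination detected (score: {result.score})")
    print(result.explanation)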

Explanations in Evaluations

Explanations are justifications attached to evaluation results, typically generated by an LLM. Patronus evaluators support explanations by default, with options to control when they're generated.

Controlling Explanation Generation

The explain_strategy parameter controls when explanations are generated:

  • "never": No explanations are generated for any evaluation results
  • "on-fail": Only generates explanations for failed evaluations
  • "on-success": Only generates explanations for passed evaluations
  • "always" (default): Generates explanations for all evaluations

# Only generate explanations for failed evaluations
factual_checker = RemoteEvaluator(
    "lynx", 
    "patronus:factual-accuracy",
    explain_strategy="on-fail"  # Only explain failures
)
 
# Never generate explanations (fastest option)
conciseness_checker = RemoteEvaluator(
    "judge", 
    "patronus:conciseness",
    explain_strategy="never"  # No explanations
)

Performance Note: To reduce latency in production environments, use explain_strategy="never" or explain_strategy="on-fail" so that fewer explanations are generated.

Using with Tracing

Remote evaluators integrate seamlessly with Patronus tracing:

from patronus import init, traced
from patronus.evals import RemoteEvaluator
 
init()
 
@traced()
def generate_response(query: str) -> str:
    """Generate a response to a user query."""
    # In a real application, this would call an LLM
    return "The blue whale can grow up to 100 feet long and weigh 200 tons."
 
@traced()
def process_query(query: str):
    """Process a user query with evaluation."""
    response = generate_response(query)
    
    # Use a remote evaluator
    fact_checker = RemoteEvaluator(
        "lynx", 
        "patronus:factual-accuracy",
        explain_strategy="on-fail"  # Only explain failures
    )
    
    # Evaluate the response
    result = fact_checker.evaluate(
        task_input=query,
        task_output=response
    )
    
    return {
        "query": query,
        "response": response,
        "factually_accurate": result.pass_,
        "accuracy_score": result.score,
        "explanation": result.explanation if not result.pass_ else "Passed"
    }
 
# Process a query
response = process_query("How big is a blue whale?")

Using in Experiments

Remote evaluators are particularly valuable in experiments for systematic evaluation:

from patronus import init
from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment
 
fuzzy_match = RemoteEvaluator("judge-small", "patronus:fuzzy-match")
exact_match = RemoteEvaluator("exact-match", "patronus:exact-match")
 
# Run an experiment with remote evaluators
experiment = run_experiment(
    dataset=dataset,
    task=my_task,
    evaluators=[fuzzy_match, exact_match],
)
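
The dataset and my_task arguments above are assumed to be defined elsewhere. As a rough illustration only (the dataset schema and task signature shown here are assumptions; see the experiments documentation for the actual interface), they might look something like this:

# Hypothetical dataset and task for illustration; the field names and task
# signature are assumptions and may differ from the actual SDK interface.
dataset = [
    {
        "task_input": "What is the largest animal in the world?",
        "gold_answer": "The blue whale",
    },
]
 
def my_task(row, **kwargs):
    # In a real application, this would call an LLM with the row's input
    return "The blue whale is the largest known animal."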

For more details on using Patronus evaluators, see the Evaluation API documentation.
