Patronus Evaluators are powerful pre-built evaluators that run on Patronus infrastructure. They provide sophisticated assessment capabilities without requiring you to implement complex evaluation logic yourself.
The easiest way to use Patronus evaluators is through the `RemoteEvaluator` class:
```python
from patronus import init
from patronus.evals import RemoteEvaluator

init()

# Create a hallucination detector
hallucination_checker = RemoteEvaluator(
    "lynx",
    "patronus:hallucination",
    explain_strategy="always"  # Control when explanations are generated
)

result = hallucination_checker.evaluate(
    task_input="What is the largest animal in the world?",
    task_output="The giant sandworm is the largest animal.",
    task_context="The blue whale is the largest known animal.",
)

result.pretty_print()
```
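Besides `pretty_print()`, you can work with the result programmatically. The short sketch below only uses the `pass_`, `score`, and `explanation` fields that also appear in the tracing example later in this section; treat any other attribute names as assumptions.

```python
# Minimal sketch: inspect the evaluation result directly,
# using the pass_, score, and explanation fields shown in
# the tracing example below.
if result.pass_:
    print(f"Passed with score {result.score}")
else:
    print(f"Failed with score {result.score}")
    print(f"Explanation: {result.explanation}")
```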
Explanations are justifications attached to evaluation results, typically generated by an LLM. Patronus evaluators support explanations by default, with options to control when they're generated.
The `explain_strategy` parameter controls when explanations are generated:
"never": No explanations are generated for any evaluation results
"on-fail": Only generates explanations for failed evaluations
"on-success": Only generates explanations for passed evaluations
"always" (default): Generates explanations for all evaluations
```python
# Only generate explanations for failed evaluations
factual_checker = RemoteEvaluator(
    "lynx",
    "patronus:factual-accuracy",
    explain_strategy="on-fail"  # Only explain failures
)

# Never generate explanations (fastest option)
conciseness_checker = RemoteEvaluator(
    "judge",
    "patronus:conciseness",
    explain_strategy="never"  # No explanations
)
```
Performance Note: To reduce latency in production environments, prefer `explain_strategy="never"` or `explain_strategy="on-fail"` so explanations are generated only when needed, or not at all.
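For example, you might pick the strategy based on an environment flag so local debugging keeps full explanations while production runs stay fast. This is a hypothetical sketch: the `build_factual_checker` helper and the `APP_ENV` variable are illustrative assumptions, not part of the Patronus SDK; only the evaluator names and strategy values come from the examples above.

```python
import os

from patronus.evals import RemoteEvaluator


def build_factual_checker() -> RemoteEvaluator:
    # Hypothetical convention: APP_ENV distinguishes production from
    # local development. Only explain failures in production; always
    # explain while debugging locally.
    in_production = os.getenv("APP_ENV") == "production"
    return RemoteEvaluator(
        "lynx",
        "patronus:factual-accuracy",
        explain_strategy="on-fail" if in_production else "always",
    )
```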
Remote evaluators integrate seamlessly with Patronus tracing:
```python
from patronus import init, traced
from patronus.evals import RemoteEvaluator

init()


@traced()
def generate_response(query: str) -> str:
    """Generate a response to a user query."""
    # In a real application, this would call an LLM
    return "The blue whale can grow up to 100 feet long and weigh 200 tons."


@traced()
def process_query(query: str):
    """Process a user query with evaluation."""
    response = generate_response(query)

    # Use a remote evaluator
    fact_checker = RemoteEvaluator(
        "lynx",
        "patronus:factual-accuracy",
        explain_strategy="on-fail"  # Only explain failures
    )

    # Evaluate the response
    result = fact_checker.evaluate(
        task_input=query,
        task_output=response
    )

    return {
        "query": query,
        "response": response,
        "factually_accurate": result.pass_,
        "accuracy_score": result.score,
        "explanation": result.explanation if not result.pass_ else "Passed",
    }


# Process a query
response = process_query("How big is a blue whale?")
```
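The dictionary returned by `process_query` can then be consumed directly; a trivial usage sketch, using only the keys defined above:

```python
# Inspect the evaluation outcome returned by process_query
print(response["factually_accurate"], response["accuracy_score"])
print(response["explanation"])
```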