Evaluators Overview

Evaluators use data from logs to produce evaluation results, i.e. measurable assessments of an AI system's performance. For example, an evaluator scoring the relevance of retrieved chunks can be used to assess retriever quality in a RAG pipeline, and an evaluator detecting prompt injections can be used to protect chatbot developers from malicious users.

Patronus supports many different types of evaluators.

  • Function-based: Locally executed functions defined in the Python SDK (see the sketch after this list).
  • Class-based: Evaluator classes that developers instantiate and invoke through an .evaluate() method. Once instantiated, class-based evaluators run evaluations locally, and their results are logged to the platform when used in experiments.
  • Patronus API: A suite of powerful LLM judges benchmarked for human alignment and quality. Patronus API evaluators can be fine-tuned for specific use cases, and evals are executed remotely on Patronus infrastructure.

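For illustration, here is a minimal sketch of the two local kinds. It assumes the SDK exposes an `@evaluator` decorator, a `StructuredEvaluator` base class, an `EvaluationResult` type, and a `patronus.init()` call; the names, signatures, and result fields shown are approximations, so consult the SDK reference for the exact interfaces.

```python
import patronus
from patronus import evaluator
from patronus.evals import EvaluationResult, StructuredEvaluator

patronus.init()  # assumed setup call that connects local evaluators to the platform

# Function-based: a plain Python function marked as an evaluator.
# Returning a bool is treated as a pass/fail result.
@evaluator()
def exact_match(task_output: str, gold_answer: str) -> bool:
    return task_output.strip() == gold_answer.strip()

# Class-based: configure once at instantiation, then call .evaluate()
# wherever it is needed; the eval itself runs locally.
class KeywordPresence(StructuredEvaluator):
    def __init__(self, keywords: list[str]):
        super().__init__()
        self.keywords = keywords

    def evaluate(self, *, task_output: str = "", **kwargs) -> EvaluationResult:
        hits = [kw for kw in self.keywords if kw.lower() in task_output.lower()]
        return EvaluationResult(
            pass_=len(hits) == len(self.keywords),
            score=len(hits) / len(self.keywords),
            explanation=f"Matched {len(hits)} of {len(self.keywords)} keywords",
        )

keyword_check = KeywordPresence(keywords=["refund", "policy"])
```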
You can use evaluators in Logs and Experiments. Read on to see how to define each kind of evaluator and reuse it across different parts of your workflow.
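
As a rough usage sketch, the snippet below runs a Patronus API evaluator inside an experiment. It assumes the SDK exposes `RemoteEvaluator` and `run_experiment`; the evaluator family and criteria names are placeholders rather than guaranteed identifiers, and the dataset/task shapes shown are illustrative.

```python
from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment

# Remote LLM judge executed on Patronus infrastructure. The family/criteria
# names below are placeholders; substitute evaluators available in your account.
hallucination_judge = RemoteEvaluator("lynx", "patronus:hallucination")

# A toy experiment: the task returns an answer and the remote judge scores it.
# Locally defined function- or class-based evaluators can be reused in the
# same `evaluators` list.
run_experiment(
    dataset=[
        {
            "task_input": "How many moons does Earth have?",
            "gold_answer": "One",
        }
    ],
    task=lambda row, **kwargs: "Earth has one moon.",
    evaluators=[hallucination_judge],
)
```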