Evaluators 🖊️

Evaluators are code that executes evaluations and produces evaluation results, i.e. measurable assessments of an AI system's performance. For example, an evaluator that scores the relevance of retrieved chunks can be used to assess retriever quality in a RAG pipeline, and an evaluator that detects prompt injections can be used to protect chatbot developers from malicious users.

Patronus supports many different types of evaluators.

  • Function-based: These are locally executed functions defined with the Python SDK (see the first sketch after this list).
  • Class-based: Developers can define evaluator classes with an .evaluate() method. Once instantiated, class-based evaluators execute evals locally, and their results are logged to the platform in experiments (see the second sketch after this list).
  • Patronus API: The Patronus API supports a suite of powerful LLM judges that are benchmarked for human alignment and quality. Patronus API evaluators can be fine-tuned for various use cases, and evals are executed remotely on Patronus infrastructure (see the third sketch after this list).
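
As a starting point, here is a minimal sketch of a function-based evaluator. It assumes the `patronus` Python SDK exposes an `evaluator` decorator and that field names such as `task_output` and `gold_answer` are passed through to your function; check your SDK version's documentation for the exact import path and signature.

```python
# Illustrative function-based evaluator; the `evaluator` decorator and
# the field names below are assumptions to verify against your SDK version.
from patronus import evaluator


@evaluator()
def exact_match(task_output: str, gold_answer: str) -> bool:
    # Simple pass/fail check: does the model output match the expected
    # answer after normalizing whitespace and case?
    return task_output.strip().lower() == gold_answer.strip().lower()
```

Because the function runs locally, it can implement any logic expressible in Python, from string comparisons to calls into your own models.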
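Class-based evaluators follow the same idea but carry state (thresholds, configuration, loaded models) in the instance. The sketch below assumes a base class such as `patronus.evals.StructuredEvaluator` and an `EvaluationResult` return type with `score` and `pass_` fields; these names are assumptions to confirm against your SDK version.

```python
# Illustrative class-based evaluator; the base class and result fields
# are assumptions and may differ across SDK versions.
from patronus.evals import EvaluationResult, StructuredEvaluator


class KeywordCoverage(StructuredEvaluator):
    """Scores what fraction of required keywords appear in the output."""

    def __init__(self, keywords: list[str], threshold: float = 0.8):
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.threshold = threshold

    def evaluate(self, *, task_output: str, **kwargs) -> EvaluationResult:
        text = task_output.lower()
        hits = sum(1 for keyword in self.keywords if keyword in text)
        score = hits / len(self.keywords) if self.keywords else 0.0
        return EvaluationResult(score=score, pass_=score >= self.threshold)
```

Once instantiated, e.g. KeywordCoverage(keywords=["refund", "policy"]), the object can be passed to an experiment like any other evaluator, and its results are logged to the platform.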
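Patronus API evaluators are referenced by name and run on Patronus infrastructure. The sketch below assumes the SDK's `RemoteEvaluator` wrapper and uses "judge" / "patronus:is-concise" purely as example evaluator and criterion names; substitute the evaluator family and criteria available in your account.

```python
# Illustrative remote evaluation; the wrapper class, init call, and the
# evaluator/criterion names are assumptions to check against your setup.
import patronus
from patronus.evals import RemoteEvaluator

patronus.init()  # typically reads PATRONUS_API_KEY from the environment

is_concise = RemoteEvaluator("judge", "patronus:is-concise")

result = is_concise.evaluate(
    task_input="What is the refund policy?",
    task_output="Refunds are available within 30 days of purchase.",
)
print(result.pass_, result.score)
```

Because the evaluation runs server-side, the same call can be used from an experiment, a guardrail in production, or a one-off script.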

Read on to learn how to create different evaluators for your AI applications ➡️