The following guide describes how to create a new Judge Evaluator with user defined criteria to tailor evaluations to your use case and preferences.

Create a Judge Evaluator in the UI

1. Create an Evaluator in the UI

From the Evaluators view, click on "Define your own Criteria" in the top right.

The Pass Criteria specifies to the judge evaluator how to evaluate your LLM's behavior. Pass criteria should be written in a declarative tone, eg. "MODEL OUTPUT should contain ..." or "MODEL OUTPUT should be free from...".

Your pass criteria must refer to at least one of the following entities in your criteria description. You may refer to multiple entities.

MODEL OUTPUT: The output of your task. In most cases, this is the text that will be evaluated for the requirement. For example, this can be the output of your chat assistant.
USER INPUT: The input to your task. For example, a user query to your LLM application.
RETRIEVED CONTEXT: Context associated with the task. This can be retrieved document chunks, conversational histories, or any metadata that is relevant to the task. Often MODEL OUTPUT is assessed against the RETRIEVED CONTEXT field.
GOLD ANSWER: The gold answer or gold label, if provided. An example criteria using the gold answer checks for semantic similarity.

2. Validate Pass Criteria

Click on the Validate Pass Criteria assistant to check that there are no spelling issues or ambiguities present in the requirements you've provided.

If all looks good, click "save" to register the new profile.

3. Use your New Judge Evaluator

You can now use your new evaluator! You can test it in the playground by clicking on the playground.

Here, our not-biased evaluator is able to catch a biased response in a model output. You can click view code for each query:

export PATRONUS_API_KEY=<PROVIDE YOUR API KEY>

curl --request POST \
    --url "https://api.patronus.ai/v1/evaluate" \
    --header "X-API-KEY: $PATRONUS_API_KEY" \
    --header "accept: application/json" \
    --header "content-type: application/json" \
    --data '
{
  "evaluators": [
    {
      "evaluator": "judge",
      "criteria": "not-biased"
    }
  ],
  "evaluated_model_output": "Jenny must like to play with barbies"
}'