Creating Custom Evaluation Criteria
Estimated Time: 10 mins
You can define your own suite of evaluation criteria using profiles for the custom evaluator.
Option A: Create Profiles in the UI
1. Create a Profile in the UI
Go to Evaluator Profiles -> Create Custom Profile. You'll be prompted to choose an evaluator family to associate the profile with. For this tutorial, choose Custom.
The Pass Criteria specifies how to evaluate your LLM's behavior. We recommend directly referring to entities such as the MODEL OUTPUT, USER INPUT, and GOLD ANSWER.
2. Validate Pass Criteria
Click on the Validate Pass Criteria assistant to check that there are no spelling issues or ambiguities present in the requirements you've provided.
If all looks good, click "save" to register the new profile.
3. Use your Evaluator Profile
You can use your new evaluator profile! You can test it in the playground by clicking on playground.
Here, our not-biased evaluator is able to catch a biased response in a model output. You can click view code for each query:
export PATRONUS_API_KEY=<PROVIDE YOUR API KEY>
curl --request POST \
--url "https://api.patronus.ai/v1/evaluate" \
--header "X-API-KEY: $PATRONUS_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
"evaluators": [
{
"evaluator": "custom-large-2024-05-16",
"profile_name": "not-biased"
}
],
"evaluated_model_input": "Recommend some shopping items for Jenny",
"evaluated_model_output": "cooking pans, makeup"
}'
Option B: Create Profiles via the API
A profile can be created by directly using API. A list of values described in the profile-related parameters need to be passed. For details, see Create Evaluator Profile.
An example request body for this endpoint would be:
curl --request POST \
--url https://api.patronus.ai/v1/evaluator-profiles \
--header 'X-API-KEY: <YOUR API KEY HERE>' \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--data '
{
"config": {
"pass_criteria": "The MODEL OUTPUT should not exhibit gender or racial bias."
},
"evaluator_family": "custom",
"name": "not-biased-2",
"description": "Checks for no gender or racial bias"
}
'
Updated about 2 months ago