NLP Metrics Evaluators
Currently we support `bleu` and `rouge` as NLP metrics in the `metrics-v1` evaluator. These are common metrics in NLP. For more background on these metrics, you can read this blog.
To specify an NLP metric, pass `system:compute-bleu` or `system:compute-rouge` in the `profile_name` field.
Here's an example API request:
```shell
curl --location 'https://api.patronus.ai/v1/evaluate' \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--header 'X-API-KEY: ••••••' \
--data '{
    "evaluators": [
        {
            "evaluator": "metrics",
            "profile_name": "system:compute-bleu"
        }
    ],
    "output": "hello there general kenobi",
    "label": "hello there general kenobi I am doing great today!"
}'
```
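If you'd rather work from Python, the same request can be sketched like this. The payload and headers mirror the curl example above; actually sending it requires your real API key (shown here as a placeholder):

```python
import json

# Same body as the curl example above
payload = {
    "evaluators": [
        {"evaluator": "metrics", "profile_name": "system:compute-bleu"}
    ],
    "output": "hello there general kenobi",
    "label": "hello there general kenobi I am doing great today!",
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "X-API-KEY": "<your API key>",  # placeholder
}

# Send with any HTTP client, e.g.:
#   requests.post("https://api.patronus.ai/v1/evaluate",
#                 headers=headers, json=payload)
print(json.dumps(payload, indent=2))
```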
And the example response:
```json
{
    "results": [
        {
            "evaluator_id": "metrics-2024-05-16",
            "profile_name": "system:compute-bleu",
            "status": "success",
            ...
            "evaluation_result": {
                ...
                "id": "112825104711650870",
                "app": "default",
                "created_at": "2024-08-08T21:25:10.549203Z",
                "evaluator_id": "metrics-2024-05-16",
                "profile_name": "system:compute-bleu",
                "evaluated_model_system_prompt": null,
                "evaluated_model_retrieved_context": null,
                "evaluated_model_input": null,
                "evaluated_model_output": "hello there general kenobi",
                "evaluated_model_gold_answer": "hello there general kenobi I am doing great today!",
                "explain": false,
                "explain_strategy": "never",
                "pass": true,
                "score_raw": 0.22,
                "score_normalized": -1.0,
                "additional_info": {
                    "score_raw": 0.22,
                    "positions": null,
                    "extra": null,
                    "confidence_interval": null
                },
                "evaluation_duration": "PT0.216S",
                "evaluator_family": "metrics",
                "evaluator_profile_public_id": "99c73df3-a3b7-4599-a201-0442c4815778",
                "external": false
            }
        }
    ]
}
```
The metric score is returned in `score_raw`. In this case, we have a BLEU score of 0.22.
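For example, given a heavily trimmed-down response shaped like the one above (fields elided with `...` in the docs are omitted here), the score can be read out of the nested `evaluation_result` object:

```python
# A trimmed version of the example response above
response = {
    "results": [
        {
            "evaluator_id": "metrics-2024-05-16",
            "profile_name": "system:compute-bleu",
            "status": "success",
            "evaluation_result": {"score_raw": 0.22, "pass": True},
        }
    ]
}

score = response["results"][0]["evaluation_result"]["score_raw"]
print(score)  # → 0.22
```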
Note that we return `pass=true` by default for this evaluator's system profiles (e.g. `system:compute-bleu` or `system:compute-rouge`). You can create your own evaluator profiles for the `metrics` family and specify the pass threshold.
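Conceptually, a custom profile's pass threshold turns the raw score into a pass/fail decision. Here's a hypothetical sketch of that rule (the real threshold is configured on the evaluator profile in the platform, not in client code):

```python
def passes(score_raw: float, threshold: float) -> bool:
    """Hypothetical pass rule: pass when the raw metric score
    meets or exceeds a profile's configured threshold."""
    return score_raw >= threshold

# The example BLEU score of 0.22 against a hypothetical 0.5 threshold:
print(passes(0.22, 0.5))  # → False
```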