Log Batched Evals
You can log batched evals with the `POST /v1/evaluation-results/batch`
endpoint. See the API reference guide for the full request and response schema.
Use the Logs and Comparisons features to understand your LLM's performance across all evaluations, whether they are run through Patronus evaluators or not.
All the features available for evaluations run through Patronus - such as organizing evaluations and experiments into projects, adding tags and filters, and specifying raw scores and pass/fail results - are available for imported evaluations as well. Below is a sample cURL request showcasing these features.
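The following is a minimal sketch of such a request. The endpoint path comes from above; the `X-API-KEY` header and the payload field names (`evaluation_results`, `evaluator_id`, `evaluated_model_input`, `evaluated_model_output`, `pass`, `score_raw`, `tags`) are illustrative assumptions, so confirm them against the API reference before use.

```bash
# A minimal sketch of a batch import request.
# Payload field names are illustrative assumptions; check the API reference.
curl -X POST "https://api.patronus.ai/v1/evaluation-results/batch" \
  -H "X-API-KEY: $PATRONUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "evaluation_results": [
      {
        "evaluator_id": "my-offline-evaluator",
        "evaluated_model_input": "What is the capital of France?",
        "evaluated_model_output": "Paris.",
        "pass": true,
        "score_raw": 0.95,
        "tags": {"environment": "staging"}
      }
    ]
  }'
```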
Imported evaluation results are distinguished from those generated through the Patronus platform in the following ways:
- The attribute `External` is set to `True`.
- The `Evaluator Id` is prepended with the keyword `external:`, so that evaluator names for imported evaluations do not clash with those run through Patronus.
You can see both of these in Logs.
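For illustration, a hypothetical sketch of how an imported result might be distinguished in Logs is shown below; the surrounding field names and layout are assumptions, but the `external:` prefix and the `External` flag follow the rules above.

```json
{
  "evaluator_id": "external:my-offline-evaluator",
  "external": true,
  "pass": true,
  "score_raw": 0.95
}
```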