LLM Monitoring
We provide the LLM Monitoring dashboard as the central view of the web app. As long as you toggle on logging for your Patronus API calls to the /v1/evaluate endpoint, your results will appear here. We strongly recommend you set the capture parameter to "all", or at least "on-fail", once you're done testing your API calls, since it's always helpful to be able to look back through past evaluation results.
A simple API call with a data payload like the following is all you need for results to start appearing:
import requests

# Replace with your Patronus API key (the header name shown here is an assumption;
# use whatever your Patronus credentials require).
headers = {
    "X-API-KEY": "<YOUR_PATRONUS_API_KEY>",
    "Content-Type": "application/json",
}

data = {
    "evaluators": [
        {"evaluator": "retrieval-hallucination"},
        {"evaluator": "retrieval-answer-relevance"},
    ],
    "evaluated_model_input": "How do I deposit a check at the bank?",
    "evaluated_model_retrieved_context": [
        "To deposit a check at the bank, you should first go to your nearest bank branch. You should find an attendant upon arrival and hand them your check. They will then process the check and deposit the funds into your account.",
        "You can also deposit a check at an ATM or through your bank's mobile app.",
        "Remember to sign the back of the check before depositing it.",
    ],
    "evaluated_model_output": "You can deposit a check by either going to your nearest bank branch, an ATM, or through your bank's mobile app.",
    "app": "demo_retrieval_banking",
    "capture": "all",
    "tags": {"model": "gpt-4"},
}

response = requests.post(
    "https://api.patronus.ai/v1/evaluate", headers=headers, json=data
)
response.raise_for_status()
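If you also want to inspect results directly in your code, you can read back the JSON response. The exact response fields depend on the API version, so the sketch below simply pretty-prints whatever is returned rather than assuming specific field names:

import json

# Print the raw evaluation results returned by the API.
# (No field names are assumed here; they may vary by API version.)
results = response.json()
print(json.dumps(results, indent=2))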
Filtering Results
As you log more calls to the API, your LLM Monitoring view will start to get crowded. You may be calling the Patronus API to verify LLM outputs across several different projects or use cases, comparing different models within each, or tweaking parameters to see how that affects performance. That is why we provide multiple ways to segment your data so you can keep track of exactly what you want. You can do the following:
- Set your app name: This is useful for differentiating between projects or use cases. In the LLM Monitoring view, select the app in the top left to only view results from that app. If you do not provide an app name in your API call, it is set to "default".
- Choose a time range: You can choose a time range for the queries you are interested in, which lets you focus on whatever dates are most relevant to you.
- Search by criteria: This allows you to filter results using the search bar on the top right. Selecting the specific criteria you are interested in lets you focus on what you are evaluating for. You can select multiple criteria at a time.
- Search by result: All of our evaluators return a pass/fail flag and, in some cases, a score between 0 and 1. Filtering by result lets you focus only on failures, for instance, if that is what you are interested in.
- Search by tags: Tags let you customize the LLM Monitoring view however you would like. If the options above don't cover your needs, you can attach additional tags that will be logged to the backend and then search for them as key-value pairs. This is done simply by providing a dictionary in the API call (as shown above, and in the sketch after this list).
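As a sketch of how app names and tags work together, the payload below logs two custom tags alongside an app name so the results can later be filtered by either. The tag keys and values ("model", "experiment", "prompt-v2") are purely illustrative, not required names:

# Hypothetical example: segment results by app, model, and an experiment label.
data = {
    "evaluators": [{"evaluator": "retrieval-hallucination"}],
    "evaluated_model_input": "How do I deposit a check at the bank?",
    "evaluated_model_retrieved_context": [
        "You can deposit a check at a branch, an ATM, or through your bank's mobile app.",
    ],
    "evaluated_model_output": "Visit a branch, an ATM, or use your bank's mobile app.",
    "app": "demo_retrieval_banking",
    "capture": "all",
    "tags": {"model": "gpt-4", "experiment": "prompt-v2"},
}

response = requests.post(
    "https://api.patronus.ai/v1/evaluate", headers=headers, json=data
)
response.raise_for_status()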
Export CSV
Once you've filtered the view down to what you are interested in, you can click into a detailed view or export a CSV. We discuss the detailed view below. To export a CSV, all you need to do is click the Export CSV button in the UI and voilà!
This can be helpful if you want to save your records locally or perform analysis in some other environment. You could also filter for failed evaluation outputs and either construct a new benchmark to see how your model is improving, or use those downloaded samples as a fine-tuning dataset in case you're interested in fine-tuning on failed examples. A sketch of that workflow follows below.
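As a minimal sketch of that workflow, assuming the exported file is named patronus_export.csv and contains a pass/fail column (the filename and column names below are hypothetical; adjust them to match your actual export), you could pull out the failures like this:

import pandas as pd

# Load the exported CSV (filename is an assumption).
df = pd.read_csv("patronus_export.csv")

# Keep only failed evaluations; adjust "evaluation_result" and "FAIL"
# to the actual column name and value in your export.
failures = df[df["evaluation_result"] == "FAIL"]

# Save the failed samples as a starting point for a new benchmark
# or a fine-tuning dataset.
failures.to_csv("failed_evaluations.csv", index=False)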
Detailed View
Each evaluation instance (one LLM output evaluated by a specific evaluator) gets its own row in the dashboard. You can view the results and all additional details by clicking on the row in question. You can also expand that row into a detailed informational page if you want an easier view to inspect, as shown below.
And that's pretty much it for the LLM Monitoring view. We keep it simple so you can use it effectively, while it still captures potentially every evaluation API call you make in case you need to refer to historical data.