Comparisons
With Patronus Evaluators, you can understand how your LLM is behaving. In the Comparisons tab, we take aggregate evaluation data and display statistics on how your LLM is performing. We call these Performance Snapshots.
Filter your Performance Snapshots by time range, evaluation criteria, and tags to get more detailed insight into your LLM's performance and identify specific failure modes that require your attention.
Add multiple Performance Snapshots to the Comparisons page to get side-by-side analytics. Use the side-by-side view to determine the best LLM for your GenAI application and track changes in LLM performance over time.
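Performance Snapshots aggregate evaluation results that your application has already sent to Patronus. As a minimal sketch of how such an evaluation might be logged with an App name and tags (so it later appears in the Comparisons view), consider the following; the endpoint path, header, field names, and evaluator identifier are illustrative assumptions, so check the Patronus API reference for the exact request schema:

```python
# A minimal sketch of logging an evaluation, assuming a REST endpoint similar
# to the Patronus evaluation API. The endpoint path, header, field names, and
# evaluator name are illustrative assumptions -- consult the API reference.
import os
import requests

PATRONUS_API_KEY = os.environ["PATRONUS_API_KEY"]

payload = {
    # App name: evaluations logged under the same App are aggregated
    # into one Performance Snapshot on the Comparisons page.
    "app": "default",
    # Assumed evaluator identifier; replace with one configured in your account.
    "evaluators": [{"evaluator": "hallucination"}],
    "evaluated_model_input": "What is the capital of France?",
    "evaluated_model_output": "The capital of France is Paris.",
    # Tags let you filter Performance Snapshots later (e.g. by model version).
    "tags": {"model": "gpt-4o", "release": "2024-08"},
}

response = requests.post(
    "https://api.patronus.ai/v1/evaluate",  # assumed endpoint path
    headers={"X-API-KEY": PATRONUS_API_KEY},  # assumed auth header
    json=payload,
)
response.raise_for_status()
print(response.json())
```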
Currently, you can only compare Apps. Support for comparing Evaluation Runs is coming soon.
Start using the Comparisons feature by navigating to the Comparisons tab.
Pick an App. In the screenshot above, the "default" App is selected. Click "Search" to filter by evaluators and tags.
To compare with another Performance Snapshot, click on "Add snapshot". In the example below, we're comparing evaluations in the "default" App across different time ranges - June and July for Snapshot A and August for Snapshot B:
You can see the total pass and fail percentages across evaluations, along with a breakdown of exactly which evaluators performed poorly.
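The pass/fail summary is an aggregation over individual evaluation results, so if you export or fetch the raw results yourself you can reproduce the same numbers locally. Here is a small, self-contained sketch; the result records are assumed for illustration and do not reflect the exact shape of a Patronus export:

```python
# A self-contained sketch of the aggregation shown in a Performance Snapshot.
# The result records below are illustrative, not the exact Patronus export format.
from collections import defaultdict

results = [
    {"evaluator": "hallucination", "passed": True},
    {"evaluator": "hallucination", "passed": False},
    {"evaluator": "toxicity", "passed": True},
    {"evaluator": "toxicity", "passed": True},
]

# Overall pass/fail percentages, as shown at the top of a snapshot.
total = len(results)
passed = sum(r["passed"] for r in results)
print(f"pass: {passed / total:.0%}  fail: {(total - passed) / total:.0%}")

# Per-evaluator breakdown, to spot which evaluators performed poorly.
by_evaluator = defaultdict(lambda: [0, 0])  # evaluator -> [passed, total]
for r in results:
    by_evaluator[r["evaluator"]][0] += r["passed"]
    by_evaluator[r["evaluator"]][1] += 1

for name, (p, t) in by_evaluator.items():
    print(f"{name}: {p}/{t} passed ({p / t:.0%})")
```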