Visualizing Experiments
You can visualize experiments in several ways. By default, the Experiments view plots accuracy, mean score, and other evaluation metrics for each experiment over time.
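If you want to reproduce a similar plot outside the UI, a minimal sketch is shown below. The experiment records and metric names are hypothetical placeholders; substitute your own exported results.

```python
# Rough equivalent of the default view: each evaluation metric plotted
# over time across experiments. The records below are illustrative.
import matplotlib.pyplot as plt

experiments = [
    {"name": "exp-1", "date": "2024-05-01", "accuracy": 0.71, "mean_score": 0.64},
    {"name": "exp-2", "date": "2024-05-08", "accuracy": 0.78, "mean_score": 0.69},
    {"name": "exp-3", "date": "2024-05-15", "accuracy": 0.83, "mean_score": 0.75},
]

dates = [e["date"] for e in experiments]
for metric in ("accuracy", "mean_score"):
    plt.plot(dates, [e[metric] for e in experiments], marker="o", label=metric)

plt.xlabel("experiment date")
plt.ylabel("score")
plt.legend()
plt.title("Evaluation metrics over time")
plt.show()
```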
Per-metric Performance
When you select a project, experiment performance is displayed for each evaluation metric separately. See the figure below, where answer-relevance and fuzzy-match are plotted for different experiments.
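A sketch of the same per-metric layout, with one subplot per metric, is shown below. The scores for answer-relevance and fuzzy-match are made-up placeholders.

```python
# One subplot per evaluation metric, mirroring the per-metric view.
import matplotlib.pyplot as plt

experiment_names = ["exp-1", "exp-2", "exp-3"]
scores = {
    "answer-relevance": [0.62, 0.70, 0.74],  # hypothetical scores
    "fuzzy-match": [0.55, 0.61, 0.68],       # hypothetical scores
}

fig, axes = plt.subplots(1, len(scores), figsize=(10, 4), sharey=True)
for ax, (metric, values) in zip(axes, scores.items()):
    ax.plot(experiment_names, values, marker="o")
    ax.set_title(metric)
    ax.set_xlabel("experiment")
axes[0].set_ylabel("score")
plt.tight_layout()
plt.show()
```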
Per-model Performance
In the default Experiments view, the performance plots for all projects appear in the same chart. You can configure a separate project for each model to compare them side by side. For example, the figure below compares GPT-4o and GPT-4o-mini across experiments in an agent evaluation setting.
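The sketch below overlays two models on one chart in the same spirit. The per-experiment scores for GPT-4o and GPT-4o-mini are illustrative, not real results.

```python
# Two models overlaid on one chart, as in the per-model comparison.
import matplotlib.pyplot as plt

experiment_names = ["exp-1", "exp-2", "exp-3", "exp-4"]
model_scores = {
    "GPT-4o": [0.72, 0.76, 0.81, 0.84],       # hypothetical scores
    "GPT-4o-mini": [0.64, 0.67, 0.71, 0.73],  # hypothetical scores
}

for model, values in model_scores.items():
    plt.plot(experiment_names, values, marker="o", label=model)

plt.xlabel("experiment")
plt.ylabel("mean score")
plt.legend()
plt.title("Agent evaluation: GPT-4o vs. GPT-4o-mini")
plt.show()
```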