The following sections walk you through setting up evals in your AI application, from optimizing agent and LLM performance in development and testing to monitoring performance in production.

Evals: Understand the components of an eval, and set up evaluators that you can reuse for logging and experimentation throughout your AI workflow.

Logs: Logs capture data from AI application executions. Each log is a single data sample containing the inputs and outputs of one execution.
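
For illustration, a single log record might look like the Python dictionary below; the field names are assumptions made for this sketch, not the platform's actual log schema.

```python
# Hypothetical shape of a single log record; field names are illustrative,
# not the platform's required schema.
log_record = {
    "input": "What is the capital of France?",    # prompt or user query sent to the LLM/agent
    "output": "The capital of France is Paris.",  # model or agent response
    "model": "gpt-4o",                            # which LLM served the request
    "latency_ms": 812,                            # end-to-end execution time
    "metadata": {"session_id": "abc-123"},        # any extra context you attach
}
```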

Datasets: Manage and upload datasets in different formats (CSV, JSON), and learn more about our dataset generation services.
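
As a rough example, a small JSON dataset of input/expected-output pairs could be written out like this; the field names are illustrative and should be adapted to whatever schema your upload expects.

```python
import json

# Illustrative evaluation dataset: a list of input/expected-output pairs.
# The field names are assumptions, not a required schema.
dataset = [
    {
        "input": "Summarize: The meeting was moved to Friday.",
        "expected_output": "The meeting was rescheduled to Friday.",
    },
    {
        "input": "Translate to French: Good morning.",
        "expected_output": "Bonjour.",
    },
]

# Write the dataset to a JSON file ready for upload.
with open("eval_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```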

Experiments: A/B test and optimize LLM and agent performance by running experiments across different prompt, LLM, and data configurations.
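
A minimal sketch of an A/B experiment over two prompt configurations in plain Python: `run_app` and `score_output` are placeholder functions standing in for your application call and an evaluator, not SDK APIs.

```python
# Two prompt variants to compare; the templates are illustrative.
prompts = {
    "A": "Answer concisely: {question}",
    "B": "Answer step by step, then give a final answer: {question}",
}

def run_experiment(dataset, run_app, score_output):
    """Run every dataset sample through each prompt variant and report the mean score."""
    results = {}
    for variant, template in prompts.items():
        scores = []
        for sample in dataset:
            output = run_app(template.format(question=sample["input"]))
            scores.append(score_output(output, sample["expected_output"]))
        results[variant] = sum(scores) / len(scores)  # mean score per variant
    return results
```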

Annotations: Use our annotations interface for human review of AI application outputs, whether in development or on live traffic.

Comparisons: Visualize the performance of your AI applications, compare outputs side by side, and gain insights to improve system performance over time.

Monitoring: Monitor LLM and agent interactions in production and receive real-time alerts.

Evaluation API: Automatically catch hallucinations and unsafe outputs through our Evaluation API, powered by our suite of in-house evaluators, including Lynx. Or define your own evaluators in our SDK.
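
As a sketch of what a custom evaluator can look like in plain Python: a function that scores one output against its retrieved context and returns a pass/fail verdict with an explanation. The function name, arguments, and return shape below are assumptions for illustration; how evaluators are registered with the SDK is covered in the SDK documentation.

```python
# Plain-Python sketch of a custom evaluator. The naive groundedness check below
# flags output sentences that never appear in the retrieved context; the return
# shape (pass/score/explanation) is an illustrative assumption.
def no_unsupported_claims(output: str, retrieved_context: str) -> dict:
    sentences = [s.strip() for s in output.split(".") if s.strip()]
    unsupported = [s for s in sentences if s.lower() not in retrieved_context.lower()]
    return {
        "pass": not unsupported,
        "score": (1.0 - len(unsupported) / len(sentences)) if sentences else 1.0,
        "explanation": (
            f"Unsupported sentences: {unsupported}" if unsupported else "All sentences grounded."
        ),
    }
```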


What’s Next