Logs
Logs contain the data associated with your AI application, including the inputs to and outputs from your LLM systems. Logs do not include telemetry and metadata about function executions; that information is captured in spans. Evals are performed on log data, which includes:
- User inputs to LLMs and agents
- LLM and agent outputs
- Documents returned by retrieval systems
- Intermediate outputs in chained calls
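To make the list above concrete, here is a minimal sketch of what a single log record might hold. The class name `LogRecord`, its field names, and the optional `experiment_id` field are hypothetical and chosen for illustration only; they are not the platform's actual schema or SDK. Consistent with the description above, the sketch has no timing or telemetry fields, since that information belongs in spans.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogRecord:
    """Hypothetical shape of one log entry; field names are illustrative only."""

    user_input: str  # user input to the LLM or agent
    output: str      # LLM or agent output
    # Documents returned by a retrieval system, if any.
    retrieved_documents: List[str] = field(default_factory=list)
    # Outputs of earlier steps in a chained call, if any.
    intermediate_outputs: List[str] = field(default_factory=list)
    # Hypothetical link to an experiment; None means a standalone execution
    # (for example, one captured by live monitoring).
    experiment_id: Optional[str] = None
    # Note: no latency or telemetry fields here; that data lives in spans.

    def is_standalone(self) -> bool:
        """Return True if this log is not part of an experiment."""
        return self.experiment_id is None


log = LogRecord(
    user_input="What is the refund window?",
    output="Refunds are accepted within 30 days.",
    retrieved_documents=["Policy: refunds within 30 days of purchase."],
)
print(log.is_standalone())  # True
```

Keeping retrieval results and intermediate outputs alongside the input and output means an evaluator can inspect the whole execution from one record, for example to check whether an answer is grounded in the retrieved documents.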
A log can belong to an experiment or represent a single standalone execution (for example, in a live monitoring configuration). In the following sections, we describe how to log a single AI execution and obtain evaluation results.
See the Experiments section to learn how to run experiments over batches of logs to optimize performance on a test set.
See the Evals section to read more about how evaluators use logs to produce evaluation results, and how to use evaluation results to drive improvements in your AI application.