API Performance

We test our API endpoints regularly to ensure they return responses reliably. For reference, the response performance metrics below are current as of March 25, 2024. We are actively working to increase throughput and reduce latency. Note that your results may vary relative to our testing setup.

| Evaluator Name | Latency* (s) | QPS** | Max Input Size | Average Input Tokens (for benchmarking) |
| --- | --- | --- | --- | --- |
| toxicity-2024-05-16 | 0.44 | 15.53 | 20 KB | |
| phi-2024-05-31 | 0.47 | 15.88 | 10k chars | |
| pii-2024-05-31 | 0.47 | 16.69 | 10k chars | |
| retrieval-answer-relevance-large-2024-05-31 | 4.59 | 1.28 | 10k chars | Input: 17 tokens; Output: 48 tokens |
| retrieval-hallucination-large-2024-05-31 | 6.11 | 1.00 | Retrieved context: 50k tokens, 10 items; input: 10k chars; output: 10k chars | Input: 17 tokens; Output: 48 tokens; Retrieved context: 60 tokens |
| retrieval-context-relevance-large-2024-05-31 | 5.22 | 1.17 | Retrieved context: 50k tokens, 10 items; input: 10k chars; output: 10k chars | Input: 17 tokens; Retrieved context: 60 tokens |
| retrieval-context-sufficiency-large-2024-05-31 | 7.52 | 0.12 | Retrieved context: 50k tokens, 10 items; input: 10k chars; output: 10k chars | Input: 17 tokens; Label: 48 tokens; Retrieved context: 60 tokens |
| custom-large-2024-05-16 | 2.51 | 2.76 | 10k chars | Input: 17 tokens; Output: 48 tokens |

*Latency numbers are calculated as the average response time across 50 requests made sequentially to a specific evaluator by a single worker.
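
As an illustration, a latency measurement like the one described above could be reproduced with the minimal sketch below, assuming Python with the `requests` library. The endpoint URL and payload are hypothetical placeholders, not our actual API.

```python
import statistics
import time

import requests  # assumed HTTP client

# Hypothetical endpoint and payload, for illustration only.
ENDPOINT = "https://api.example.com/v1/evaluators/toxicity-2024-05-16"
PAYLOAD = {"input": "sample text to evaluate"}

def measure_latency(n_requests: int = 50) -> float:
    """Average response time over n_requests sequential calls from one worker."""
    timings = []
    for _ in range(n_requests):
        start = time.perf_counter()
        requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

print(f"Average latency: {measure_latency():.2f}s")
```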

**QPS numbers are calculated as the maximum queries per second observed across runs with varying numbers of concurrent workers. The total number of queries made in each configuration is 50.
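
Similarly, a maximum-QPS measurement of this shape could look like the following sketch, again with a hypothetical endpoint and payload: each worker-count configuration sends 50 requests through a thread pool, and the best observed rate is kept.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # assumed HTTP client

# Hypothetical endpoint and payload, for illustration only.
ENDPOINT = "https://api.example.com/v1/evaluators/toxicity-2024-05-16"
PAYLOAD = {"input": "sample text to evaluate"}

def measure_max_qps(worker_counts=(1, 2, 4, 8, 16), n_requests: int = 50) -> float:
    """Max queries per second observed across varying worker counts;
    each configuration issues n_requests total."""
    best_qps = 0.0
    for workers in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(requests.post, ENDPOINT, json=PAYLOAD, timeout=30)
                for _ in range(n_requests)
            ]
            for f in futures:
                f.result()  # surface any request errors
        elapsed = time.perf_counter() - start
        best_qps = max(best_qps, n_requests / elapsed)
    return best_qps

print(f"Max QPS: {measure_max_qps():.2f}")
```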