API Performance

We test our API endpoints regularly to ensure they return responses reliably. For reference, the response performance metrics below are current as of March 25, 2024. We are actively working to increase throughput and reduce latency. Note that your results may vary relative to our testing setup.

| Evaluator Name | Latency* (s) | QPS** | Max Input Size | Average Input Tokens (for benchmarking) |
| --- | --- | --- | --- | --- |
| toxicity-2024-05-16 | 0.44 | 15.53 | 20 KB | |
| phi-2024-05-31 | 0.47 | 15.88 | 10k chars | |
| pii-2024-05-31 | 0.47 | 16.69 | 10k chars | |
| retrieval-answer-relevance-large-2024-05-31 | 4.59 | 1.28 | 10k chars | Input: 17 tokens; Output: 48 tokens |
| retrieval-hallucination-large-2024-05-31 | 6.11 | 1.00 | Retrieved context: 50k tokens, 10 items; input: 10k chars; output: 10k chars | Input: 17 tokens; Output: 48 tokens; Retrieved context: 60 tokens |
| retrieval-context-relevance-large-2024-05-31 | 5.22 | 1.17 | Retrieved context: 50k tokens, 10 items; input: 10k chars; output: 10k chars | Input: 17 tokens; Retrieved context: 60 tokens |
| retrieval-context-sufficiency-large-2024-05-31 | 7.52 | 0.12 | Retrieved context: 50k tokens, 10 items; input: 10k chars; output: 10k chars | Input: 17 tokens; Label: 48 tokens; Retrieved context: 60 tokens |
| custom-large-2024-05-16 | 2.51 | 2.76 | 10k chars | Input: 17 tokens; Output: 48 tokens |

*Latency numbers are calculated as the average response time across 50 requests made sequentially to a specific evaluator by a single worker.
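
As an illustration, a latency measurement like the one described above could be reproduced with the minimal sketch below, assuming Python with the `requests` library. The endpoint URL and payload are hypothetical placeholders, not our actual API.

```python
import statistics
import time

import requests  # assumed HTTP client

# Hypothetical endpoint and payload, for illustration only.
ENDPOINT = "https://api.example.com/v1/evaluators/toxicity-2024-05-16"
PAYLOAD = {"input": "sample text to evaluate"}

def measure_latency(n_requests: int = 50) -> float:
    """Average response time over n_requests sequential calls from one worker."""
    timings = []
    for _ in range(n_requests):
        start = time.perf_counter()
        requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

print(f"Average latency: {measure_latency():.2f}s")
```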

**QPS numbers are calculated as the maximum queries per second observed across runs with varying numbers of concurrent workers. The total number of queries made in each configuration is 50.
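
Similarly, a maximum-QPS measurement of this shape could look like the following sketch, again with a hypothetical endpoint and payload: each worker-count configuration sends 50 requests through a thread pool, and the best observed rate is kept.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # assumed HTTP client

# Hypothetical endpoint and payload, for illustration only.
ENDPOINT = "https://api.example.com/v1/evaluators/toxicity-2024-05-16"
PAYLOAD = {"input": "sample text to evaluate"}

def measure_max_qps(worker_counts=(1, 2, 4, 8, 16), n_requests: int = 50) -> float:
    """Max queries per second observed across varying worker counts;
    each configuration issues n_requests total."""
    best_qps = 0.0
    for workers in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(requests.post, ENDPOINT, json=PAYLOAD, timeout=30)
                for _ in range(n_requests)
            ]
            for f in futures:
                f.result()  # surface any request errors
        elapsed = time.perf_counter() - start
        best_qps = max(best_qps, n_requests / elapsed)
    return best_qps

print(f"Max QPS: {measure_max_qps():.2f}")
```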