API Performance
We regularly test our API endpoints to make sure they respond reliably. The response performance metrics below were measured as of March 25th, 2024. We are actively working to increase throughput and reduce latency. Your results may differ depending on how your setup compares to our testing setup.
Evaluator Name | Latency* (s) | QPS** | Max Input Size | Average Input Tokens (for benchmarking) |
---|---|---|---|---|
toxicity-2024-05-16 | 0.44 | 15.53 | 20 KB | |
phi-2024-05-31 | 0.47 | 15.88 | 10k chars | |
pii-2024-05-31 | 0.47 | 16.69 | 10k chars | |
retrieval-answer-relevance-large-2024-05-31 | 4.59 | 1.28 | 10k chars | Input: 17 tokens; Output: 48 tokens |
retrieval-hallucination-large-2024-05-31 | 6.11 | 1.00 | Retrieved context: 50k tokens (10 items); Input: 10k chars; Output: 10k chars | Input: 17 tokens; Output: 48 tokens; Retrieved context: 60 tokens |
retrieval-context-relevance-large-2024-05-31 | 5.22 | 1.17 | Retrieved context: 50k tokens (10 items); Input: 10k chars; Output: 10k chars | Input: 17 tokens; Retrieved context: 60 tokens |
retrieval-context-sufficiency-large-2024-05-31 | 7.52 | 0.12 | Retrieved context: 50k tokens (10 items); Input: 10k chars; Output: 10k chars | Input: 17 tokens; Label: 48 tokens; Retrieved context: 60 tokens |
custom-large-2024-05-16 | 2.51 | 2.76 | 10k chars | Input: 17 tokens; Output: 48 tokens |
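Before sending a request, a client can check its inputs against the limits in the table. The following is a minimal, hypothetical client-side helper in Python, not part of any official SDK. It assumes "10k chars" means 10,000 characters as counted by Python's `len()`, and it covers only the character and item-count limits. The 20 KB limit for the toxicity evaluator and the 50k-token limit on retrieved context are not checked, because the exact byte unit and the tokenizer are not specified on this page.

```python
from typing import List, Sequence

# Assumed limits transcribed from the table; "10k" is read as 10,000.
# The 50k-token retrieved-context limit is NOT checked here because it
# depends on the service's tokenizer, which this page does not specify.
MAX_CHARS = 10_000
MAX_CONTEXT_ITEMS = 10


def check_input_sizes(
    input_text: str,
    output_text: str = "",
    retrieved_context: Sequence[str] = (),
) -> List[str]:
    """Return human-readable problems; an empty list means the checked limits are met."""
    problems: List[str] = []
    # Character limits apply separately to the input and output fields.
    for name, text in (("input", input_text), ("output", output_text)):
        if len(text) > MAX_CHARS:
            problems.append(f"{name} has {len(text)} chars (limit {MAX_CHARS})")
    # Retrieval evaluators accept at most 10 retrieved-context items.
    if len(retrieved_context) > MAX_CONTEXT_ITEMS:
        problems.append(
            f"retrieved context has {len(retrieved_context)} items (limit {MAX_CONTEXT_ITEMS})"
        )
    return problems


if __name__ == "__main__":
    print(check_input_sizes("What is the refund policy?", retrieved_context=["doc"] * 12))
    # ['retrieved context has 12 items (limit 10)']
```

Returning a list of problems rather than raising lets a caller report every violated limit at once before deciding whether to truncate or split the request.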
*Latency is the average response time over 50 requests sent sequentially to a single evaluator by a single worker.
**QPS is the maximum number of queries per second observed across test runs with different numbers of concurrent workers. Each configuration sends 50 queries in total.
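The two footnotes describe how latency and QPS were measured. A minimal Python sketch of that methodology follows, for reproducing comparable numbers against your own setup. It assumes a hypothetical `call` function that sends one request to an evaluator and blocks until the response arrives (`fake_evaluator` below only simulates one with a sleep). It also assumes that QPS for each worker configuration is total queries divided by wall-clock time; the exact calculation behind the table is not specified.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def measure_latency(call: Callable[[], object], n_requests: int = 50) -> float:
    """Average response time (seconds) over n_requests sequential calls from one worker."""
    total = 0.0
    for _ in range(n_requests):
        start = time.perf_counter()
        call()  # one blocking request to the evaluator
        total += time.perf_counter() - start
    return total / n_requests


def measure_qps(
    call: Callable[[], object], worker_counts: Iterable[int], n_requests: int = 50
) -> float:
    """Highest throughput (queries/second) across the given worker counts.

    Assumption: each configuration sends n_requests calls in total, shared
    among `workers` threads, and throughput = n_requests / wall-clock time.
    """
    best = 0.0
    for workers in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Consuming the iterator waits for every call and re-raises any error.
            list(pool.map(lambda _: call(), range(n_requests)))
        elapsed = time.perf_counter() - start
        best = max(best, n_requests / elapsed)
    return best


def fake_evaluator() -> None:
    """Hypothetical stand-in for an HTTP request to an evaluator endpoint."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"latency: {measure_latency(fake_evaluator):.3f} s")
    print(f"QPS: {measure_qps(fake_evaluator, worker_counts=[1, 2, 4, 8]):.1f}")
```

Threads suit this kind of test because each worker spends most of its time waiting on the network, so the GIL does not limit concurrency. With a single worker, QPS is roughly the inverse of latency; adding workers raises it until the service itself becomes the bottleneck.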