- Per model. Each model resolves to a model-specific override if one exists, otherwise a default based on the model’s size class and type.
- Text models enforce both limits. A requests-per-minute (RPM) and a tokens-per-minute (TPM) limit apply; whichever you hit first returns a 429.
- Video and music models are not rate-limited. They’re priced by usage/credits instead.
The `/api_keys/rate_limits` endpoint is the canonical way to fetch your current limits:
- View Your Limits: interactive playground
- Rate Limit Logs: see which requests hit limits
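A minimal sketch of calling this endpoint from Python using only the standard library. The base URL `https://api.venice.ai/api/v1`, the Bearer-token authorization scheme, and the `VENICE_API_KEY` environment variable name are assumptions not stated on this page, so check them against the API reference. The response body is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request

# Assumed base URL; it is not stated in this section.
BASE_URL = "https://api.venice.ai/api/v1"


def build_rate_limits_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the /api_keys/rate_limits endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api_keys/rate_limits",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


def fetch_rate_limits(api_key: str):
    """Send the request and return the decoded JSON body unchanged."""
    request = build_rate_limits_request(api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example usage (requires network access and a valid key):
#   import os
#   limits = fetch_rate_limits(os.environ["VENICE_API_KEY"])  # hypothetical variable name
#   print(json.dumps(limits, indent=2))
```

Building the request separately from sending it keeps the URL and header construction easy to inspect without making a network call.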
Default Limits
Text & Embedding Models
Text, reasoning, and embedding models are grouped into size classes. Each model card on the Models page displays its size badge.

| Size class | Requests/min | Tokens/min |
|---|---|---|
| X-Small | 500 | 5,000,000 |
| Small | 150 | 3,000,000 |
| Medium | 100 | 2,000,000 |
| Large | 100 | 2,000,000 |

| Provider class | Requests/min | Tokens/min |
|---|---|---|
| DeepInfra | 150 | 10,000,000 |
| Anthropic (direct) | 500 | 5,000,000 |
| xAI (direct) | 500 | 10,000,000 |
| OpenRouter | 1,000 | None |
| Parasail | 1,000 | None |
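Because a text request has to fit under both the RPM and the TPM limit, a client can track both budgets locally and hold back before the API returns a 429. The sketch below is illustrative only: it uses a simple fixed one-minute window, which is an assumption (this page does not say how the server measures its windows), and `DualLimiter` is a hypothetical helper, not part of any SDK. The example limits are the Small size-class defaults from the first table above.

```python
import time


class DualLimiter:
    """Client-side guard tracking request and token budgets per minute."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock  # injectable so the window can be tested
        self.window_start = clock()
        self.requests = 0
        self.tokens = 0

    def try_acquire(self, tokens: int) -> bool:
        """Return True and record usage if both budgets allow the request."""
        now = self.clock()
        if now - self.window_start >= 60:
            # Start a fresh one-minute window (assumed fixed-window model).
            self.window_start = now
            self.requests = 0
            self.tokens = 0
        # Whichever limit would be exceeded first blocks the request.
        if self.requests + 1 > self.rpm or self.tokens + tokens > self.tpm:
            return False
        self.requests += 1
        self.tokens += tokens
        return True


# Small size class: 150 requests/min and 3,000,000 tokens/min.
small_limiter = DualLimiter(rpm=150, tpm=3_000_000)
```

A caller would estimate the token count of a request, call `try_acquire`, and delay the request when it returns `False`. The server-side limit remains authoritative, so a 429 can still occur.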
Image Models
Covers image generation, upscaling, and inpainting.

| Class | Requests/min |
|---|---|
| Default | 20 |
| Bytedance | 50 |
| Fal | 50 |
| xAI grok-imagine | 120 |
| xAI grok-imagine (pro) | 20 |
Audio Models
| Type | Requests/min |
|---|---|
| Text-to-speech (TTS) | 60 |
| Speech-to-text (ASR) | 60 |
Video & Music Models
Not rate-limited; these are billed by usage/credits.
Handling Errors
Failed requests (500, 503, 429) should be retried with exponential backoff. For 429 errors specifically, check the `x-ratelimit-reset-requests` header for the exact Unix timestamp when you can retry. Many HTTP libraries have built-in retry mechanisms that can handle this automatically.
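A minimal sketch of the retry policy described above: exponential backoff with jitter for 500, 503, and 429, preferring the `x-ratelimit-reset-requests` timestamp on a 429. The function name `retry_delay`, the 1-second base delay, and the 60-second cap are illustrative assumptions rather than values documented by the API.

```python
import random
import time
from typing import Mapping, Optional

RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    now: Optional[float] = None,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 60.0,   # assumed upper bound on the backoff
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` starts at 0. `headers` is assumed to use lowercase keys.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429:
        reset = headers.get("x-ratelimit-reset-requests")
        if reset is not None:
            try:
                current = time.time() if now is None else now
                # The header is a Unix timestamp; wait until that moment.
                return max(0.0, float(reset) - current)
            except ValueError:
                pass  # unparsable value: fall back to plain backoff
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A caller would loop over attempts, sleep for the returned delay, and stop when the function returns `None` or a maximum attempt count is reached. The jitter spreads out retries from many clients that failed at the same time.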
Abuse Protection
If you generate more than 20 failed requests in 30 seconds, the API will block further requests for 30 seconds.
Response Headers
Every response includes these headers:

| Header | Description |
|---|---|
| `x-ratelimit-limit-requests` | Max requests allowed in current window |
| `x-ratelimit-remaining-requests` | Requests remaining in current window |
| `x-ratelimit-reset-requests` | Unix timestamp when window resets |
| `x-ratelimit-limit-tokens` | Max tokens allowed per minute |
| `x-ratelimit-remaining-tokens` | Tokens remaining in current minute |
| `x-ratelimit-reset-tokens` | Seconds until token limit resets |
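To illustrate reading these headers, the sketch below parses them into a small record. Per the table, `x-ratelimit-reset-requests` is a Unix timestamp while `x-ratelimit-reset-tokens` is a number of seconds, so the two are kept in separate, explicitly named fields. `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, and the code assumes header keys have already been lowercased and that values are plain numbers.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RateLimitInfo:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # Unix timestamp
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]    # seconds from now


def _num(headers: Mapping[str, str], name: str, cast: Callable):
    """Return the header converted with `cast`, or None if absent or invalid."""
    raw = headers.get(name)
    if raw is None:
        return None
    try:
        return cast(raw)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Collect the six x-ratelimit-* headers into one record."""
    return RateLimitInfo(
        limit_requests=_num(headers, "x-ratelimit-limit-requests", int),
        remaining_requests=_num(headers, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_num(headers, "x-ratelimit-reset-requests", float),
        limit_tokens=_num(headers, "x-ratelimit-limit-tokens", int),
        remaining_tokens=_num(headers, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_num(headers, "x-ratelimit-reset-tokens", float),
    )
```

Missing or malformed headers become `None` instead of raising, so the same parser can be used on any response without extra checks.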
Partner Tier
Partners (partner-tier-1) sit above the standard tier with significantly higher limits, tuned to their specific usage.
If you’re consistently hitting your rate limits and your usage patterns show sustained demand over time, reach out to discuss partner access: api@venice.ai.