The limits on this page apply to the standard paid tier: any funded Venice account using the API is placed in this tier. There is no separate lower API tier; partners sit above this tier with higher limits. How limits are applied:
  • Per model. Each model resolves to a model-specific override if one exists, otherwise a default based on the model’s size class and type.
  • Text models enforce two limits. A requests-per-minute (RPM) limit and a tokens-per-minute (TPM) limit both apply, and whichever you hit first returns a 429.
  • Video and music models are not rate-limited. They’re priced by usage/credits instead.
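Because both limits apply at once, which one you hit first depends on how large your requests are. The following is a minimal sketch with hypothetical helper names (`effective_requests_per_minute`, `binding_limit`) that estimates which limit a steady workload reaches first. It assumes `avg_tokens_per_request` is whatever this API counts toward TPM; this page does not say whether that means prompt tokens, completion tokens, or both. `tpm=None` models a class with no token limit.

```python
from typing import Optional


def effective_requests_per_minute(
    rpm: int, tpm: Optional[int], avg_tokens_per_request: float
) -> float:
    """Requests per minute a steady workload can sustain under both limits."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    if tpm is None:
        # No token limit: only the request limit applies.
        return float(rpm)
    # The token budget allows tpm / avg requests; the lower of the two wins.
    return min(float(rpm), tpm / avg_tokens_per_request)


def binding_limit(rpm: int, tpm: Optional[int], avg_tokens_per_request: float) -> str:
    """Return "RPM" or "TPM": the limit this workload reaches first."""
    if tpm is None or effective_requests_per_minute(rpm, tpm, avg_tokens_per_request) >= rpm:
        return "RPM"
    return "TPM"
```

For example, with the Medium defaults (100 RPM, 2,000,000 TPM), requests averaging 30,000 tokens are capped by TPM at about 66 requests per minute, while requests under 20,000 tokens reach the 100 RPM limit first.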
The default limits below are a useful reference, but the /api_keys/rate_limits endpoint is the canonical way to fetch your current limits:

```shell
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"
```
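If you prefer to query the rate-limits endpoint from code, here is a minimal Python sketch of the same request as the curl command, using only the standard library. The function names are hypothetical, and since this page does not document the response schema, the decoded JSON is returned as-is rather than mapped onto a typed structure.

```python
import json
import urllib.request

RATE_LIMITS_URL = "https://api.venice.ai/api/v1/api_keys/rate_limits"


def build_rate_limits_request(api_key: str) -> urllib.request.Request:
    """Build the GET request equivalent to the curl command."""
    return urllib.request.Request(
        RATE_LIMITS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_rate_limits(api_key: str, timeout: float = 10.0) -> object:
    """Send the request and return the decoded JSON body, unmodified."""
    request = build_rate_limits_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Call it as `fetch_rate_limits(os.environ["VENICE_API_KEY"])` (after `import os`) and inspect the result to see the limits that apply to your key.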

Default Limits

Text & Embedding Models

Text, reasoning, and embedding models are grouped into size classes. Each model card on the Models page displays its size badge.
| Size class | Requests/min | Tokens/min |
|---|---|---|
| X-Small | 500 | 5,000,000 |
| Small | 150 | 3,000,000 |
| Medium | 100 | 2,000,000 |
| Large | 100 | 2,000,000 |
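To avoid 429s rather than only recover from them, a client can track its own usage against both budgets. This is a minimal sketch of a client-side sliding-window limiter; the class name is hypothetical, and it assumes a 60-second sliding window and that you can estimate a request's token count before sending it. The server's actual window accounting is not described on this page, so treat this as a local approximation.

```python
import collections
import time
from typing import Callable, Deque, Optional, Tuple


class SlidingWindowLimiter:
    """Client-side guard for an RPM limit plus an optional TPM limit.

    Remembers (timestamp, tokens) for each request admitted in the last
    60 seconds. This approximates, and may differ from, server accounting.
    """

    WINDOW_SECONDS = 60.0

    def __init__(
        self,
        rpm: int,
        tpm: Optional[int] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rpm = rpm
        self.tpm = tpm  # None means no token limit is tracked
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = collections.deque()
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Forget requests older than the window and release their tokens.
        while self.events and now - self.events[0][0] >= self.WINDOW_SECONDS:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int = 0) -> bool:
        """Record and admit a request if it fits both budgets, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.rpm:
            return False
        if self.tpm is not None and self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

`SlidingWindowLimiter(rpm=100, tpm=2_000_000)` matches the Medium defaults. For image or audio models, which only have a requests-per-minute limit, pass `rpm` alone. When `try_acquire` returns `False`, wait briefly and try again.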
Models served through some upstream providers use provider-specific classes instead of the size-class defaults:
| Provider class | Requests/min | Tokens/min |
|---|---|---|
| DeepInfra | 150 | 10,000,000 |
| Anthropic (direct) | 500 | 5,000,000 |
| xAI (direct) | 500 | 10,000,000 |
| OpenRouter | 1,000 | None |
| Parasail | 1,000 | None |
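The resolution rule for text models can be expressed as a lookup over these two tables. This sketch hard-codes the default values (the dictionary and function names are hypothetical) and assumes that a model-specific override takes precedence over a provider class, which in turn replaces the size-class default; this page does not state the precedence between overrides and provider classes explicitly.

```python
from typing import Dict, NamedTuple, Optional


class TextLimits(NamedTuple):
    rpm: int
    tpm: Optional[int]  # None: no tokens-per-minute limit


# Default values copied from the size-class table.
SIZE_CLASS_LIMITS: Dict[str, TextLimits] = {
    "X-Small": TextLimits(500, 5_000_000),
    "Small": TextLimits(150, 3_000_000),
    "Medium": TextLimits(100, 2_000_000),
    "Large": TextLimits(100, 2_000_000),
}

# Default values copied from the provider-class table.
PROVIDER_CLASS_LIMITS: Dict[str, TextLimits] = {
    "DeepInfra": TextLimits(150, 10_000_000),
    "Anthropic (direct)": TextLimits(500, 5_000_000),
    "xAI (direct)": TextLimits(500, 10_000_000),
    "OpenRouter": TextLimits(1_000, None),
    "Parasail": TextLimits(1_000, None),
}


def resolve_text_limits(
    size_class: str,
    provider_class: Optional[str] = None,
    model_override: Optional[TextLimits] = None,
) -> TextLimits:
    """Pick limits for one text model (assumed precedence: override, provider, size)."""
    if model_override is not None:
        return model_override
    if provider_class is not None and provider_class in PROVIDER_CLASS_LIMITS:
        return PROVIDER_CLASS_LIMITS[provider_class]
    return SIZE_CLASS_LIMITS[size_class]
```

Hard-coded values can drift from what the API enforces, so the /api_keys/rate_limits endpoint remains the authoritative source.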

Image Models

These limits cover image generation, upscaling, and inpainting.
| Class | Requests/min |
|---|---|
| Default | 20 |
| Bytedance | 50 |
| Fal | 50 |
| xAI grok-imagine | 120 |
| xAI grok-imagine (pro) | 20 |

Audio Models

| Type | Requests/min |
|---|---|
| Text-to-speech (TTS) | 60 |
| Speech-to-text (ASR) | 60 |

Video & Music Models

Not rate-limited; these models are billed by usage/credits.

Handling Errors

Retry requests that fail with 429, 500, or 503 using exponential backoff. For 429 errors specifically, read the x-ratelimit-reset-requests header, which gives the exact Unix timestamp at which you can retry. Many HTTP client libraries include configurable retry support for these status codes.
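A minimal sketch of this retry policy, with hypothetical names: on a 429 it waits until the Unix timestamp in x-ratelimit-reset-requests, and otherwise uses exponential backoff with full jitter (a random delay up to `base * 2**attempt`, capped). The base delay, cap, and attempt count are illustrative choices, not values from the API. `send` stands in for your own HTTP call and is assumed to return the status code plus a header mapping with lowercase keys (or a case-insensitive mapping).

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Status codes this page says should be retried.
RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(
    attempt: int,
    status: int,
    headers: Mapping[str, str],
    now: float,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Seconds to wait after 0-based `attempt`, or None if the status is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429:
        reset = headers.get("x-ratelimit-reset-requests")
        if reset is not None:
            try:
                # Documented as a Unix timestamp: wait until that moment.
                return max(0.0, float(reset) - now)
            except ValueError:
                pass  # Unparsable header: fall back to backoff below.
    # Exponential backoff with full jitter over [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retries(
    send: Callable[[], Tuple[int, Mapping[str, str]]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,  # wall clock, to compare with Unix timestamps
) -> Tuple[int, Mapping[str, str]]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers = send()
        delay = retry_delay(attempt, status, headers, clock())
        attempt += 1
        if delay is None or attempt >= max_attempts:
            return status, headers
        sleep(delay)
```

Retries that fail still count as failed requests, so keep `max_attempts` small when many requests run in parallel.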

Abuse Protection

If you generate more than 20 failed requests (responses with a non-success status code) within 30 seconds, the API blocks further requests for 30 seconds and returns this error:
Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again.
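One way to stay under this threshold is to count your own failures and pause before a 21st can occur. This is a hypothetical sketch that treats any non-2xx status as a failure and uses a sliding 30-second window; the server's exact counting method is not documented here, so the guard is an approximation.

```python
import collections
import time
from typing import Callable, Deque


class FailureGuard:
    """Tracks non-2xx responses in a sliding window and reports when to pause.

    Defaults mirror the documented rule (a block after more than 20 failures
    in 30 s): pausing once 20 failures are in the window avoids a 21st.
    """

    def __init__(
        self,
        max_failures: int = 20,
        window: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        self.failures: Deque[float] = collections.deque()

    def _evict(self, now: float) -> None:
        # Drop failures older than the window.
        while self.failures and now - self.failures[0] >= self.window:
            self.failures.popleft()

    def record(self, status: int) -> None:
        """Call after every response; only non-2xx statuses are counted."""
        if not 200 <= status < 300:
            self.failures.append(self.clock())

    def seconds_to_wait(self) -> float:
        """0.0 if the local count allows another request, else seconds to pause."""
        now = self.clock()
        self._evict(now)
        excess = len(self.failures) - self.max_failures
        if excess < 0:
            return 0.0
        # The failure at index `excess` must age out to drop below the cap.
        return self.window - (now - self.failures[excess])
```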

Response Headers

Every response includes these headers:
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Max requests allowed in current window |
| x-ratelimit-remaining-requests | Requests remaining in current window |
| x-ratelimit-reset-requests | Unix timestamp when window resets |
| x-ratelimit-limit-tokens | Max tokens allowed per minute |
| x-ratelimit-remaining-tokens | Tokens remaining in current minute |
| x-ratelimit-reset-tokens | Seconds until token limit resets |
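The two reset headers use different units: x-ratelimit-reset-requests is a Unix timestamp, while x-ratelimit-reset-tokens is a number of seconds. This sketch (hypothetical names) parses the headers into one structure and converts both resets into seconds from now. It assumes the header values are plain numbers and returns `None` for any header that is missing or not numeric.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitSnapshot:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    requests_reset_in: Optional[float]  # seconds from `now`
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    tokens_reset_in: Optional[float]  # seconds from `now`


def _to_int(value: Optional[str]) -> Optional[int]:
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def _to_float(value: Optional[str]) -> Optional[float]:
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(
    headers: Mapping[str, str], now: Optional[float] = None
) -> RateLimitSnapshot:
    """Normalise the x-ratelimit-* headers; both resets become seconds from now."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive; normalise to lowercase keys.
    h = {key.lower(): value for key, value in headers.items()}
    reset_requests_at = _to_float(h.get("x-ratelimit-reset-requests"))  # Unix timestamp
    return RateLimitSnapshot(
        limit_requests=_to_int(h.get("x-ratelimit-limit-requests")),
        remaining_requests=_to_int(h.get("x-ratelimit-remaining-requests")),
        requests_reset_in=(
            None if reset_requests_at is None else max(0.0, reset_requests_at - now)
        ),
        limit_tokens=_to_int(h.get("x-ratelimit-limit-tokens")),
        remaining_tokens=_to_int(h.get("x-ratelimit-remaining-tokens")),
        tokens_reset_in=_to_float(h.get("x-ratelimit-reset-tokens")),  # already seconds
    )
```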

Partner Tier

Partners (partner-tier-1) sit above the standard tier with significantly higher limits, tuned to their specific usage. If you’re consistently hitting your rate limits and your usage patterns show sustained demand over time, reach out to discuss partner access: api@venice.ai.