Rate Limits | Venice API Docs

The limits on this page apply to the standard paid tier, which covers every funded Venice account using the API. There is no separate lower API tier; partners sit above this tier with higher limits. Limits are applied as follows:

Per model. Each model uses its model-specific override if one exists; otherwise it gets the default for its size class and type.
Text models enforce two limits. A requests-per-minute (RPM) limit and a tokens-per-minute (TPM) limit both apply, and whichever you hit first returns a 429.
Video and music models are not rate-limited. They’re priced by usage/credits instead.

The default limits below are a useful reference, but the /api_keys/rate_limits endpoint is the canonical way to fetch your current limits:

curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"

Default Limits

Text & Embedding Models

Text, reasoning, and embedding models are grouped into size classes. Each model card on the Models page displays its size badge.

Size class	Requests/min	Tokens/min
X-Small	500	5,000,000
Small	150	3,000,000
Medium	100	2,000,000
Large	100	2,000,000

Models served through some upstream providers use provider-specific classes instead of the size-class defaults:

Provider class	Requests/min	Tokens/min
DeepInfra	150	10,000,000
Anthropic (direct)	500	5,000,000
xAI (direct)	500	10,000,000
OpenRouter	1,000	None
Parasail	1,000	None
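To see how the two text limits interact, the sketch below computes the request rate a client can sustain for a given average request size: min(RPM, TPM ÷ average tokens per request). It is a minimal illustration, assuming the default values above apply to your key, that every token of a request counts against TPM, and that a Tokens/min of None means there is no token cap. The function names and class keys (`x-small`, `deepinfra`, and so on) are hypothetical labels chosen for this sketch.

```shell
#!/usr/bin/env bash
# Default text-model limits, copied from the two tables above.
# Class keys are labels invented for this sketch, not API identifiers.
# Prints "RPM TPM"; TPM is 0 where the table says "None" (read as no token cap).
text_limits() {
  case "$1" in
    x-small)               echo "500 5000000" ;;
    small)                 echo "150 3000000" ;;
    medium | large)        echo "100 2000000" ;;
    deepinfra)             echo "150 10000000" ;;
    anthropic)             echo "500 5000000" ;;
    xai)                   echo "500 10000000" ;;
    openrouter | parasail) echo "1000 0" ;;
    *)                     return 1 ;;
  esac
}

# Sustainable requests/min = min(RPM, TPM / average tokens per request),
# because whichever limit is reached first produces the 429.
max_requests_per_min() {
  local class=$1 avg_tokens=$2 limits rpm tpm by_tokens
  (( avg_tokens > 0 )) || return 1
  limits=$(text_limits "$class") || return 1
  read -r rpm tpm <<< "$limits"
  if (( tpm == 0 )); then
    echo "$rpm"                      # no token cap: RPM is the only ceiling
    return
  fi
  by_tokens=$(( tpm / avg_tokens ))  # integer division rounds down
  echo $(( by_tokens < rpm ? by_tokens : rpm ))
}
```

For example, `max_requests_per_min medium 25000` prints 80: at 25,000 tokens per request, the 2,000,000 TPM limit is reached before the 100 RPM limit.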

Image Models

These limits cover image generation, upscaling, and inpainting.

Class	Requests/min
Default	20
Bytedance	50
Fal	50
xAI grok-imagine	120
xAI grok-imagine (pro)	20

Audio Models

Type	Requests/min
Text-to-speech (TTS)	60
Speech-to-text (ASR)	60

Video & Music Models

Not rate-limited; these models are billed by usage/credits.

Handling Errors

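Failed requests (500, 503, 429) should be retried with exponential backoff. For 429 errors specifically, the x-ratelimit-reset-requests header gives the exact Unix timestamp after which you can retry. Most HTTP libraries have built-in retry mechanisms that can apply backoff automatically. Retries that fail again still return non-success status codes, so they count toward the abuse-protection threshold; cap the number of attempts.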
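The retry policy above can be sketched in Bash around `curl`. This is a minimal illustration, not an official client: `retry_delay` and `venice_get` are hypothetical names, and the 1 s starting delay, 60 s cap, and five attempts are assumed values you should tune. On a 429 it waits until the `x-ratelimit-reset-requests` timestamp; otherwise it doubles the delay on each attempt.

```shell
#!/usr/bin/env bash
# Seconds to wait before retry number $1 (0-based).
# $2: x-ratelimit-reset-requests value from a 429 (Unix seconds), or "".
# $3: current Unix time; passed in rather than read here so it is testable.
retry_delay() {
  local attempt=$1 reset=${2%%.*} now=$3 delay   # drop any fractional part
  if [[ -n $reset ]] && (( reset > now )); then
    echo $(( reset - now ))        # server said exactly when to retry
    return
  fi
  (( attempt > 6 )) && attempt=6   # keep the shift small; cap applies below
  delay=$(( 1 << attempt ))        # 1, 2, 4, 8, ... seconds
  echo $(( delay > 60 ? 60 : delay ))
}

# Hypothetical wrapper: GET $1 with the API key, retrying 429/500/503 up to
# 5 times. Prints the body and succeeds only on a 2xx response.
venice_get() {
  local url=$1 attempt status reset hdrs body
  hdrs=$(mktemp) || return 1
  body=$(mktemp) || { rm -f "$hdrs"; return 1; }
  for (( attempt = 0; attempt < 5; attempt++ )); do
    status=$(curl -sS -D "$hdrs" -o "$body" -w '%{http_code}' \
      -H "Authorization: Bearer $VENICE_API_KEY" "$url")
    case $status in
      429)
        # Last x-ratelimit-reset-requests value in the dumped headers.
        reset=$(awk 'tolower($0) ~ /^x-ratelimit-reset-requests:/ { v = $2 }
                     END { sub(/\r$/, "", v); print v }' "$hdrs") ;;
      500|503)
        reset="" ;;                # no reset hint: plain backoff
      *)
        cat "$body"; rm -f "$hdrs" "$body"
        [[ $status == 2* ]]; return ;;
    esac
    sleep "$(retry_delay "$attempt" "$reset" "$(date +%s)")"
  done
  rm -f "$hdrs" "$body"
  return 1
}
```

Because `retry_delay` only trusts a reset time that lies in the future, a missing header, or a value that is not a Unix timestamp, falls back to plain exponential backoff. Adding random jitter to each delay is a common refinement when many clients retry at once.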

Abuse Protection

If you generate more than 20 failed requests in 30 seconds, the API will block further requests for 30 seconds:

Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again.
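A client can avoid tripping this block by tracking its own recent failures and pausing before a 21st failure can land inside the window. The sketch below is an illustration under assumptions, not a description of how Venice counts: it treats the window as a sliding 30 seconds measured on the client's clock, which may not match the server, and `record_failure` and `should_pause` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Unix timestamps of recent non-success responses, kept in memory.
FAIL_TIMES=()

# Call after any response with a non-success status code.
record_failure() { FAIL_TIMES+=("$1"); }

# Prune failures older than 30 s relative to time $1, then succeed
# (exit 0) if 20 or more remain: one more failure would exceed the limit.
should_pause() {
  local now=$1 t kept=()
  for t in "${FAIL_TIMES[@]}"; do
    if (( now - t < 30 )); then kept+=("$t"); fi
  done
  FAIL_TIMES=("${kept[@]}")
  (( ${#FAIL_TIMES[@]} >= 20 ))
}

# Example loop (not run here; send_request is hypothetical):
#   while should_pause "$(date +%s)"; do sleep 1; done
#   send_request || record_failure "$(date +%s)"
```

This guard is deliberately conservative: it pauses after 20 failures even though the next request might succeed.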

Response Headers

Every response includes these headers:

Header	Description
`x-ratelimit-limit-requests`	Max requests allowed in current window
`x-ratelimit-remaining-requests`	Requests remaining in current window
`x-ratelimit-reset-requests`	Unix timestamp when window resets
`x-ratelimit-limit-tokens`	Max tokens allowed per minute
`x-ratelimit-remaining-tokens`	Tokens remaining in current minute
`x-ratelimit-reset-tokens`	Seconds until token limit resets
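Because the two reset headers use different units (a Unix timestamp for requests, a number of seconds for tokens), it helps to normalize them before deciding how long to wait. The sketch below reads headers from a file written by curl's `-D` (`--dump-header`) option; `header_value` and `reset_in` are hypothetical helper names, and the units are taken at face value from the table above.

```shell
#!/usr/bin/env bash
# Print header $2 from curl's header dump in file $1. Matching is
# case-insensitive; if several responses were dumped, the last value wins.
header_value() {
  awk -v name="$2" '
    { line = $0; sub(/\r$/, "", line) }
    tolower(substr(line, 1, length(name) + 1)) == (tolower(name) ":") {
      v = substr(line, length(name) + 2); sub(/^[ \t]+/, "", v)
    }
    END { print v }
  ' "$1"
}

# Seconds until each window resets, per the units in the table above:
# reset-requests is a Unix timestamp, reset-tokens is already in seconds.
# Missing headers are reported as 0.
reset_in() {
  local file=$1 now=$2 req tok
  req=$(header_value "$file" x-ratelimit-reset-requests); req=${req%%.*}
  tok=$(header_value "$file" x-ratelimit-reset-tokens);   tok=${tok%%.*}
  req=${req:-0}; tok=${tok:-0}
  echo "requests=$(( req > now ? req - now : 0 )) tokens=$tok"
}
```

For example, after `curl -sS -D headers.txt -o /dev/null -H "Authorization: Bearer $VENICE_API_KEY" https://api.venice.ai/api/v1/api_keys/rate_limits`, running `header_value headers.txt x-ratelimit-remaining-requests` prints the requests left in the current window.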

Partner Tier

Partners (partner-tier-1) sit above the standard tier with significantly higher limits, tuned to their specific usage. If you’re consistently hitting your rate limits and your usage patterns show sustained demand over time, reach out to discuss partner access: api@venice.ai.
