# Rate Limits

> Request and token rate limits for the Venice API.

The limits on this page apply to the **standard paid tier** — any funded Venice account using the API lands here. There is no separate lower API tier; [partners](#partner-tier) sit above this tier with higher limits.

How limits are applied:

* **Per model.** Each model resolves to a model-specific override if one exists, otherwise a default based on the model's **size class** and **type**.
* **Text models enforce both limits.** A requests-per-minute (RPM) *and* a tokens-per-minute (TPM) limit apply — whichever you hit first returns a `429`.
* **Video and music models are not rate-limited.** They're priced by usage/credits instead.

The default limits below are a useful reference, but the `/api_keys/rate_limits` endpoint is the canonical way to fetch your current limits:

<CardGroup cols={2}>
  <Card title="View Your Limits" icon="gauge-high" href="/api-reference/endpoint/api_keys/rate_limits?playground=open">
    Interactive playground
  </Card>

  <Card title="Rate Limit Logs" icon="clock-rotate-left" href="/api-reference/endpoint/api_keys/rate_limit_logs?playground=open">
    See which requests hit limits
  </Card>
</CardGroup>

```bash
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"
```
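The same request can be made from a script. The sketch below uses only Python's standard library and mirrors the curl command above; the helper names (`build_rate_limits_request`, `fetch_rate_limits`) are illustrative, and the response body is returned as decoded JSON without assuming anything about its fields.

```python
import json
import urllib.request
from typing import Any

RATE_LIMITS_URL = "https://api.venice.ai/api/v1/api_keys/rate_limits"


def build_rate_limits_request(api_key: str) -> urllib.request.Request:
    """Build the same GET request as the curl command (hypothetical helper)."""
    return urllib.request.Request(
        RATE_LIMITS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_rate_limits(api_key: str, timeout: float = 10.0) -> Any:
    """Send the request and return the decoded JSON body unchanged."""
    request = build_rate_limits_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_rate_limits(os.environ["VENICE_API_KEY"])` (after `import os`) should return the same payload as the curl command.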

## Default Limits

### Text & Embedding Models

Text, reasoning, and embedding models are grouped into size classes. Each model card on the [Models page](/models/text) displays its size badge.

| Size class | Requests/min | Tokens/min |
| :--------- | -----------: | ---------: |
| X-Small    |          500 |  5,000,000 |
| Small      |          150 |  3,000,000 |
| Medium     |          100 |  2,000,000 |
| Large      |          100 |  2,000,000 |

Models served through some upstream providers use provider-specific limits instead of the size-class defaults:

| Provider class     | Requests/min | Tokens/min |
| :----------------- | -----------: | ---------: |
| DeepInfra          |          150 | 10,000,000 |
| Anthropic (direct) |          500 |  5,000,000 |
| xAI (direct)       |          500 | 10,000,000 |
| OpenRouter         |        1,000 |       None |
| Parasail           |        1,000 |       None |
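Because text models enforce both RPM and TPM, which limit binds first depends on how many tokens each request consumes. The following is a rough sketch of that arithmetic using a few default values from the tables above. The helper name is hypothetical, it treats `None` as "no token limit applied" (an assumption about the table), and it assumes every request costs the same number of tokens; how tokens are counted toward TPM is not specified here.

```python
from typing import Optional

# Standard-tier defaults copied from the tables above: (requests/min, tokens/min).
# None is assumed to mean that no tokens-per-minute limit applies.
DEFAULT_TEXT_LIMITS = {
    "X-Small": (500, 5_000_000),
    "Small": (150, 3_000_000),
    "Medium": (100, 2_000_000),
    "Large": (100, 2_000_000),
    "OpenRouter": (1_000, None),
}


def effective_requests_per_minute(
    rpm: int, tpm: Optional[int], tokens_per_request: int
) -> int:
    """Requests per minute that fit under both limits (hypothetical helper)."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    if tpm is None:
        return rpm  # only the request limit applies
    return min(rpm, tpm // tokens_per_request)


small_rpm, small_tpm = DEFAULT_TEXT_LIMITS["Small"]
# 10,000 tokens per request: the TPM budget allows 300, so RPM (150) binds.
print(effective_requests_per_minute(small_rpm, small_tpm, 10_000))
# 40,000 tokens per request: the TPM budget allows only 75, so TPM binds.
print(effective_requests_per_minute(small_rpm, small_tpm, 40_000))
```

For real planning, substitute the values returned by `/api_keys/rate_limits`, since model-specific overrides can differ from these defaults.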

### Image Models

Covers image generation, upscaling, and inpainting.

| Class                  | Requests/min |
| :--------------------- | -----------: |
| Default                |           20 |
| Bytedance              |           50 |
| Fal                    |           50 |
| xAI grok-imagine       |          120 |
| xAI grok-imagine (pro) |           20 |

### Audio Models

| Type                 | Requests/min |
| :------------------- | -----------: |
| Text-to-speech (TTS) |           60 |
| Speech-to-text (ASR) |           60 |

### Video & Music Models

Not rate-limited — these are billed by usage/credits.

## Handling Errors

Retry failed requests (status `429`, `500`, or `503`) with exponential backoff.

For `429` errors specifically, the `x-ratelimit-reset-requests` header gives the Unix timestamp at which the request window resets; wait until then before retrying. Many HTTP libraries ship configurable retry mechanisms that can apply the backoff for you.
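If you implement the retry logic yourself, the delay calculation can be kept separate from the HTTP client. The sketch below is one possible approach, not a prescribed one: `retry_delay` is a hypothetical helper, the jittered backoff parameters (`base`, `cap`) are arbitrary choices, and it assumes the reset header holds a Unix timestamp in seconds.

```python
import random
import time
from typing import Mapping, Optional

RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    now: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is 0 for the first retry. Header names are looked up in
    lowercase; a case-insensitive mapping from your HTTP client also works.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429:
        reset = headers.get("x-ratelimit-reset-requests")
        if reset is not None:
            try:
                current = time.time() if now is None else now
                # Assumed to be a Unix timestamp in seconds. The cap guards
                # against an unexpectedly distant or malformed value.
                return min(max(float(reset) - current, 0.0), cap)
            except ValueError:
                pass  # fall through to plain backoff
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned number of seconds and give up after a fixed number of attempts, which also helps stay under the failed-request threshold described in the next section.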

### Abuse Protection

If you generate more than 20 failed requests in 30 seconds, the API will block further requests for 30 seconds:

```
Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again.
```
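A client can avoid tripping this block by tracking its own failures. The following is a minimal sketch of a sliding-window counter using the 20-failure and 30-second figures above; the `FailureBudget` class is hypothetical, counts only failures seen by this process, and conservatively pauses once 20 failures are recorded so that a 21st cannot occur inside the window.

```python
import time
from collections import deque
from typing import Deque, Optional


class FailureBudget:
    """Client-side sliding-window count of failed requests (illustrative)."""

    def __init__(self, max_failures: int = 20, window_seconds: float = 30.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures: Deque[float] = deque()

    def _prune(self, now: float) -> None:
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()

    def record_failure(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._prune(now)
        self._failures.append(now)

    def seconds_to_pause(self, now: Optional[float] = None) -> float:
        """0.0 if another failure is affordable, else the wait until the
        oldest recorded failure leaves the window."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        if len(self._failures) < self.max_failures:
            return 0.0
        return self.window_seconds - (now - self._failures[0])
```

Call `record_failure()` after each non-success response and sleep for `seconds_to_pause()` before sending the next request.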

## Response Headers

Every response includes these headers:

| Header                           | Description                            |
| :------------------------------- | :------------------------------------- |
| `x-ratelimit-limit-requests`     | Max requests allowed in current window |
| `x-ratelimit-remaining-requests` | Requests remaining in current window   |
| `x-ratelimit-reset-requests`     | Unix timestamp when window resets      |
| `x-ratelimit-limit-tokens`       | Max tokens allowed per minute          |
| `x-ratelimit-remaining-tokens`   | Tokens remaining in current minute     |
| `x-ratelimit-reset-tokens`       | Seconds until token limit resets       |
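To act on these headers in code, it can help to parse them into a typed structure first. The sketch below is illustrative only: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and the parser assumes numeric header values, returning `None` for anything missing or unparseable. It follows the table in treating the request reset as a Unix timestamp and the token reset as a number of seconds.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T", int, float)


@dataclass
class RateLimitStatus:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # Unix timestamp
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]  # seconds until reset


def _parse(headers: Mapping[str, str], name: str, cast: Callable[[str], T]) -> Optional[T]:
    raw = headers.get(name)
    if raw is None:
        return None
    try:
        return cast(raw)
    except ValueError:
        return None  # unexpected format: treat as unknown


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Parse the rate-limit headers from a response (hypothetical helper)."""
    # Normalise names, since header casing varies between HTTP clients.
    h = {key.lower(): value for key, value in headers.items()}
    return RateLimitStatus(
        limit_requests=_parse(h, "x-ratelimit-limit-requests", int),
        remaining_requests=_parse(h, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_parse(h, "x-ratelimit-reset-requests", float),
        limit_tokens=_parse(h, "x-ratelimit-limit-tokens", int),
        remaining_tokens=_parse(h, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_parse(h, "x-ratelimit-reset-tokens", float),
    )
```

A client could, for example, slow down proactively when `remaining_requests` or `remaining_tokens` approaches zero instead of waiting for a `429`.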

## Partner Tier

Partners (`partner-tier-1`) sit above the standard tier with significantly higher limits, tuned to their specific usage.

If you're consistently hitting your rate limits and your usage patterns show **sustained demand over time**, reach out to discuss partner access: [api@venice.ai](mailto:api@venice.ai).
