API Reference | Venice API Docs

The Venice API is a REST API at https://api.venice.ai/api/v1 for private inference across uncensored, frontier models. It implements the OpenAI specification, so existing OpenAI clients and SDKs work by changing the base URL — then adds Venice-native features like server-side web search, characters, and wallet payments. The full spec is also available as an OpenAPI YAML file.

Quickstart

Point any OpenAI-compatible client at Venice’s base URL (https://api.venice.ai/api/v1). Create and manage keys in your API settings.

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
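The same request can be sketched in JavaScript. This is a minimal, unofficial example: it assumes Node 18+ (for the built-in fetch), and buildChatRequest and sendChat are illustrative helper names, not part of any Venice SDK.

```javascript
const VENICE_BASE_URL = 'https://api.venice.ai/api/v1';

// Build the URL and fetch() options for a chat completion request.
// Kept separate from the network call so the request can be inspected.
function buildChatRequest(apiKey, body) {
  if (!apiKey) {
    throw new Error('Missing API key');
  }
  return {
    url: `${VENICE_BASE_URL}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the parsed JSON response.
async function sendChat(apiKey, body) {
  const { url, init } = buildChatRequest(apiKey, body);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Venice API request failed with status ${response.status}`);
  }
  return response.json();
}
```

A call would look like sendChat(process.env.VENICE_API_KEY, { model: 'venice-uncensored', messages: [{ role: 'user', content: 'Hello!' }] }), with the key read from the environment rather than hard-coded.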

Authentication

All API requests require HTTP Bearer authentication:

Authorization: Bearer VENICE_API_KEY

Your API key is a secret. Do not share it or expose it in any client-side code.

Differences from OpenAI

Venice is OpenAI-compatible. The main additions and differences:

venice_parameters — Venice-only request options (web search, scraping, character personas, thinking controls). Reference below.
System prompts — Venice appends defaults tuned for natural, uncensored output; disable with include_venice_system_prompt: false. Details below.
Models — Use Venice model IDs directly rather than OpenAI mappings. Browse models.
Response headers — Balance, rate-limit, model, and content-safety metadata on every response. Reference below.
Private inference — TEE-backed and end-to-end-encrypted model options. Privacy models.
Payments — Credits, a daily DIEM allowance, or per-request USDC via x402. x402 guide.

System Prompts

Venice provides default system prompts designed to ensure uncensored and natural model responses. You have two options for handling system prompts:

Default Behavior: Your system prompts are appended to Venice’s defaults
Custom Behavior: Disable Venice’s system prompts entirely

Disabling Venice System Prompts

Set include_venice_system_prompt to false in venice_parameters to remove Venice’s default system prompts:

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [
      {"role": "system", "content": "Your custom system prompt"},
      {"role": "user", "content": "Why is the sky blue?"}
    ],
    "venice_parameters": {
      "include_venice_system_prompt": false
    }
  }'
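In JavaScript, the same flag can be merged into an existing request body. The helper below is a sketch; withoutVeniceSystemPrompt is a hypothetical name chosen for illustration.

```javascript
// Return a copy of a chat completion body with Venice's default system
// prompts disabled, preserving any venice_parameters already set.
function withoutVeniceSystemPrompt(body) {
  return {
    ...body,
    venice_parameters: {
      ...(body.venice_parameters || {}),
      include_venice_system_prompt: false,
    },
  };
}

const body = withoutVeniceSystemPrompt({
  model: 'venice-uncensored',
  messages: [
    { role: 'system', content: 'Your custom system prompt' },
    { role: 'user', content: 'Why is the sky blue?' },
  ],
});

console.log(JSON.stringify(body.venice_parameters));
// {"include_venice_system_prompt":false}
```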

Venice Parameters

The venice_parameters object allows you to access Venice-specific features not available in the standard OpenAI API:

Parameter	Type	Description	Default
`character_slug`	string	The character slug of a public Venice character (discoverable as “Public ID” on the published character page)	-
`strip_thinking_response`	boolean	Strip `<think></think>` blocks from the response (models using legacy `<think>` tag format). See Reasoning Models.	`false`
`disable_thinking`	boolean	On supported reasoning models, disable thinking and strip the `<think></think>` blocks from the response	`false`
`enable_web_search`	string	Enable web search for this request (`off`, `on`, or `auto`; with `auto`, the model decides when to search). Additional usage-based pricing applies; see pricing.	`off`
`enable_web_scraping`	boolean	Enable web scraping of up to 5 URLs detected in the user message. Scraped content augments responses and bypasses web search. Only successfully scraped URLs are billed. Additional usage-based pricing applies, see pricing.	`false`
`enable_x_search`	boolean	Enable xAI’s native search (web + X/Twitter) for supported Grok models (e.g., `grok-4-20-beta`). Provides higher quality search results by using xAI’s search infrastructure. When enabled, Venice’s standard web search is bypassed. Additional usage-based pricing applies, see pricing.	`false`
`enable_web_citations`	boolean	When web search is enabled, request that the LLM cite its sources using `[REF]0[/REF]` format	`false`
`include_search_results_in_stream`	boolean	Experimental: Include search results in the stream as the first emitted chunk	`false`
`return_search_results_as_documents`	boolean	Surface search results in an OpenAI-compatible tool call named `venice_web_search_documents` for LangChain integration	`false`
`include_venice_system_prompt`	boolean	Whether to include Venice’s default system prompts alongside specified system prompts	`true`
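With enable_web_citations set, responses may contain markers in the [REF]0[/REF] format described in the table. The sketch below pulls those markers out of a response string. It assumes each marker wraps a single non-negative integer, as in the example in the table; how an index maps to a search result is not covered here, and extractCitations is an illustrative name.

```javascript
// Collect the distinct citation indices found in a model response and
// return the text with each [REF]n[/REF] marker rewritten as [n].
function extractCitations(text) {
  const pattern = /\[REF\](\d+)\[\/REF\]/g;
  const indices = new Set();
  const cleaned = text.replace(pattern, (_match, index) => {
    indices.add(Number(index));
    return `[${index}]`;
  });
  return { cleaned, indices: [...indices].sort((a, b) => a - b) };
}

const sample = 'Air scatters blue light[REF]0[/REF] more than red[REF]2[/REF][REF]0[/REF].';
const { cleaned, indices } = extractCitations(sample);
console.log(cleaned); // Air scatters blue light[0] more than red[2][0].
console.log(indices); // [ 0, 2 ]
```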

These parameters can also be specified as model suffixes appended to the model name (e.g., zai-org-glm-5:enable_web_search=auto). See Model Feature Suffixes for details.
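As a small illustration of the suffix form, the helper below builds a model name with one feature suffix. It only covers the single-parameter case shown in the example above; the syntax for combining several suffixes is not described here, so it is left out. withFeatureSuffix is a hypothetical name.

```javascript
// Append one feature suffix to a model ID, producing
// "<model>:<parameter>=<value>".
function withFeatureSuffix(modelId, parameter, value) {
  return `${modelId}:${parameter}=${String(value)}`;
}

console.log(withFeatureSuffix('zai-org-glm-5', 'enable_web_search', 'auto'));
// zai-org-glm-5:enable_web_search=auto
```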

Prompt Caching

Venice supports prompt caching on select models to reduce latency and costs for repeated content. For supported models, Venice automatically caches system prompts—no code changes required. You can also manually mark content for caching using the cache_control property on message content.

Parameter	Type	Description
`prompt_cache_key`	string	Optional routing hint to improve cache hit rates. When supplied, Venice routes requests to the same backend infrastructure, increasing the likelihood of cache hits across multi-turn conversations.

See Prompt Caching for details on how caching works, billing, and best practices.
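The sketch below shows how a request body might combine both mechanisms. Several details are assumptions rather than facts from this page: that prompt_cache_key is a top-level request field, that message content can be an array of typed parts, and that cache_control takes a value of the form { type: 'ephemeral' }. Check the Prompt Caching guide before relying on any of them. buildCachedChatBody is an illustrative name.

```javascript
// Build a chat body that marks a reusable system prompt for caching and
// adds a routing hint so related requests are more likely to hit the cache.
function buildCachedChatBody({ model, systemPrompt, userMessage, conversationId }) {
  return {
    model,
    // Assumption: prompt_cache_key is sent as a top-level request field.
    // A stable per-conversation value keeps multi-turn requests together.
    prompt_cache_key: `conversation-${conversationId}`,
    messages: [
      {
        role: 'system',
        content: [
          {
            type: 'text',
            text: systemPrompt,
            // Assumption: the cache_control value shown here is illustrative.
            cache_control: { type: 'ephemeral' },
          },
        ],
      },
      { role: 'user', content: userMessage },
    ],
  };
}
```

Reusing the same conversationId across turns is the point of the design: the key only helps if it stays stable for requests that share a prefix.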

Response Headers

All Venice API responses include HTTP headers with request, rate-limit, model, and account-balance metadata. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular request, monitor rate limiting, and track your account balance. Venice recommends logging request IDs (CF-RAY header) in production deployments for more efficient troubleshooting with our support team, should the need arise.

Header	Type	Purpose	When Returned
Standard HTTP Headers
`Content-Type`	string	MIME type of the response body (`application/json`, `text/csv`, `image/png`, etc.)	Always
`Content-Encoding`	string	Encoding used to compress the response body (`gzip`, `br`)	When client sends `Accept-Encoding` header
`Content-Disposition`	string	How content should be displayed (e.g., `attachment; filename=export.csv`)	When downloading files or exports
`Date`	string	RFC 7231 formatted timestamp when the response was generated	Always
Request Identification
`CF-RAY`	string	Unique identifier for this API request, used for troubleshooting and support requests	Always
`x-venice-version`	string	Current version/revision of the Venice API service (e.g., `20250828.222653`)	Always
`x-venice-timestamp`	string	Server timestamp when the request was processed (ISO 8601 format)	When timestamp tracking is enabled
`x-venice-host-name`	string	Hostname of the server that processed the request	Error responses and debugging scenarios
Model Information
`x-venice-model-id`	string	Unique identifier of the AI model used for the request (e.g., `venice-01-lite`)	Inference endpoints using AI models
`x-venice-model-name`	string	Friendly/display name of the AI model used (e.g., `Venice Lite`)	Inference endpoints using AI models
`x-venice-model-router`	string	Router/backend service that handled the model inference	Inference endpoints when routing info available
`x-venice-model-deprecation-warning`	string	Warning message for models scheduled for deprecation	When using a deprecated model
`x-venice-model-deprecation-date`	string	Date when the model will be deprecated (ISO 8601 date)	When using a deprecated model
Rate Limiting Information
`x-ratelimit-limit-requests`	number	Maximum number of requests allowed in the current time window	All authenticated requests
`x-ratelimit-remaining-requests`	number	Number of requests remaining in the current time window	All authenticated requests
`x-ratelimit-reset-requests`	number	Unix timestamp when the request rate limit resets	All authenticated requests
`x-ratelimit-limit-tokens`	number	Maximum number of tokens (prompt + completion) allowed in the time window	All authenticated requests
`x-ratelimit-remaining-tokens`	number	Number of tokens remaining in the current time window	All authenticated requests
`x-ratelimit-reset-tokens`	number	Duration in seconds until the token rate limit resets	All authenticated requests
`x-ratelimit-type`	string	Type of rate limit applied (`user`, `api_key`, `global`)	When rate limiting is enforced
Pagination Headers
`x-pagination-limit`	number	Number of items per page	Paginated endpoints
`x-pagination-page`	number	Current page number (1-based)	Paginated endpoints
`x-pagination-total`	number	Total number of items across all pages	Paginated endpoints
`x-pagination-total-pages`	number	Total number of pages	Paginated endpoints
Account Balance Information
`x-venice-balance-diem`	string	Your DIEM token balance before the request was processed	All authenticated requests
`x-venice-balance-usd`	string	Your USD credit balance before the request was processed	All authenticated requests
Content Safety Headers
`x-venice-is-blurred`	string	Indicates if generated image was blurred due to content policies (`true`/`false`)	Image generation with Safe Venice enabled
`x-venice-is-content-violation`	string	Indicates if content violates Venice’s content policies (`true`/`false`)	Content generation endpoints
`x-venice-is-adult-model-content-violation`	string	Indicates if content violates adult model content policies (`true`/`false`)	Image generation endpoints
`x-venice-contains-minor`	string	Indicates if image contains minors (`true`/`false`)	Image analysis endpoints with age detection
Client Information
`x-venice-middleface-version`	string	Version of the Venice middleface client	Requests from Venice middleface clients
`x-venice-mobile-version`	string	Version of the Venice mobile app client	Requests from mobile applications
`x-venice-request-timestamp-ms`	number	Client-provided request timestamp in milliseconds	When client provides timestamp in request
`x-venice-control-instance`	string	Control instance identifier for debugging	Image generation endpoints for debugging
Authentication Headers
`x-auth-refreshed`	string	Indicates authentication token was refreshed during request (`true`/`false`)	When authentication tokens are auto-refreshed
`x-retry-count`	number	Number of retry attempts for the request	When request retries occur
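For paginated endpoints, the pagination headers in the table are enough to decide whether another page exists. The following is a sketch only: readPagination is a hypothetical helper, and it accepts any object with a get(name) method, such as the Headers object on a fetch response.

```javascript
// Read pagination metadata from response headers.
// Missing headers are reported as null rather than 0.
function readPagination(headers) {
  const toInt = (name) => {
    const raw = headers.get(name);
    return raw === null || raw === undefined ? null : Number.parseInt(raw, 10);
  };
  const page = toInt('x-pagination-page');
  const totalPages = toInt('x-pagination-total-pages');
  return {
    limit: toInt('x-pagination-limit'),
    page,
    total: toInt('x-pagination-total'),
    totalPages,
    // Pages are 1-based, so a next page exists while page < totalPages.
    nextPage: page !== null && totalPages !== null && page < totalPages ? page + 1 : null,
  };
}
```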

Notes and an example of accessing headers

Header Name Case: HTTP header names are case-insensitive; Venice writes its x-* headers in lowercase with hyphens for consistency (CF-RAY is listed in uppercase in the table above)
String Values: Boolean values in headers are returned as strings ("true" or "false")
Numeric Values: Large numbers and balance values may be returned as strings to prevent precision loss
Optional Headers: Not all headers are returned in every response; presence depends on the endpoint and request context
Compression: Use Accept-Encoding: gzip, br in requests to receive compressed responses where supported

// After making an API request, access headers from the response object
const requestId = response.headers.get('CF-RAY');
const remainingRequests = response.headers.get('x-ratelimit-remaining-requests');
const remainingTokens = response.headers.get('x-ratelimit-remaining-tokens');
const usdBalance = response.headers.get('x-venice-balance-usd');

// Check for model deprecation warnings
const deprecationWarning = response.headers.get('x-venice-model-deprecation-warning');
if (deprecationWarning) {
  console.warn(`Model Deprecation: ${deprecationWarning}`);
}
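Because header values arrive as strings, it can help to convert them once in a single place. The helper below is a sketch under stated assumptions: readVeniceMetadata is a hypothetical name, the Unix timestamp in x-ratelimit-reset-requests is assumed to be in seconds, and Number() is used for balances even though the raw string is the safer choice when exact precision matters. It follows the table in treating x-ratelimit-reset-requests as a timestamp and x-ratelimit-reset-tokens as a duration.

```javascript
// Convert selected response headers from strings into typed values.
// `headers` is anything with a get(name) method, such as fetch's Headers.
function readVeniceMetadata(headers, nowMs = Date.now()) {
  const num = (name) => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined || raw === '') return null;
    const n = Number(raw);
    return Number.isNaN(n) ? null : n;
  };
  const bool = (name) => {
    const raw = headers.get(name);
    return raw === null || raw === undefined ? null : raw === 'true';
  };
  const resetRequestsAt = num('x-ratelimit-reset-requests'); // Unix timestamp
  const resetTokensIn = num('x-ratelimit-reset-tokens'); // seconds until reset
  return {
    requestId: headers.get('CF-RAY'),
    balanceUsd: num('x-venice-balance-usd'),
    balanceDiem: num('x-venice-balance-diem'),
    remainingRequests: num('x-ratelimit-remaining-requests'),
    remainingTokens: num('x-ratelimit-remaining-tokens'),
    isContentViolation: bool('x-venice-is-content-violation'),
    // Assumption: the timestamp is in seconds, so convert to milliseconds.
    msUntilRequestReset:
      resetRequestsAt === null ? null : Math.max(0, resetRequestsAt * 1000 - nowMs),
    msUntilTokenReset: resetTokensIn === null ? null : resetTokensIn * 1000,
  };
}
```

Headers that are absent come back as null, which matches the note above that not every header is returned on every response.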

Best Practices

Rate Limiting: Monitor x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens headers and implement exponential backoff
Balance Monitoring: Track x-venice-balance-usd and x-venice-balance-diem headers to avoid service interruptions
System Prompts: Test with and without Venice’s system prompts to find the best fit for your use case
API Keys: Keep your API keys secure and rotate them regularly
Request Logging: Log CF-RAY header values for troubleshooting with support
Model Deprecation: Check responses for the x-venice-model-deprecation-warning header, and use x-venice-model-deprecation-date to plan a migration before the model is deprecated
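The rate-limiting advice above can be sketched as follows. This is one possible approach, not a Venice-provided client: the helper names, the 500 ms base delay, the 30 s cap, and the retry count are all illustrative choices, and the caller is assumed to throw from operation when a request fails in a retryable way.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Report whether the last response shows a nearly exhausted budget.
// Missing headers are treated as "unknown" and never trigger a pause.
function isBudgetLow(headers, minRequests = 1, minTokens = 1) {
  const read = (name) => {
    const raw = headers.get(name);
    return raw === null || raw === undefined ? Infinity : Number(raw);
  };
  return (
    read('x-ratelimit-remaining-requests') < minRequests ||
    read('x-ratelimit-remaining-tokens') < minTokens
  );
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `operation` with exponential backoff, rethrowing the last error
// once `maxAttempts` calls have failed.
async function withBackoff(operation, maxAttempts = 5, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      if (attempt + 1 >= maxAttempts) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

In production code, adding random jitter to each delay is a common refinement so that many clients do not retry at the same moment.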

API Stability

Venice maintains backward compatibility for v1 endpoints and parameters. For model lifecycle policy, deprecation notices, and migration guidance, see Deprecations.

Next Steps

Quickstart guide — from API key to a working integration
Endpoints — full reference with an interactive playground
Models — the full model catalog with pricing and capabilities
Rate limits — per-model request and token limits
OpenAPI spec (YAML) — the complete specification for codegen and RAG

Questions or feedback? Join us on Discord.

Request fields not listed in this documentation may be passed through but are not validated or guaranteed to work.
