The Venice API is a REST API at https://api.venice.ai/api/v1 for private inference across uncensored, frontier models. It implements the OpenAI specification, so existing OpenAI clients and SDKs work by changing the base URL — then adds Venice-native features like server-side web search, characters, and wallet payments. The full spec is also available as an OpenAPI YAML file.

Quickstart

Point any OpenAI-compatible client at Venice’s base URL (https://api.venice.ai/api/v1). Create and manage keys in your API settings.
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
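
The same request can be sent from JavaScript. The sketch below is a minimal, unofficial example that assumes Node 18+ (built-in fetch) and an API key supplied by the caller. buildChatRequest and sendChat are hypothetical helper names, not part of any Venice SDK.

```javascript
// Base URL taken from the quickstart above.
const VENICE_BASE_URL = "https://api.venice.ai/api/v1";

// Hypothetical helper: builds the URL and fetch options for a chat completion.
function buildChatRequest(apiKey, model, userText) {
  if (!apiKey) throw new Error("Missing API key");
  return {
    url: `${VENICE_BASE_URL}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userText }],
      }),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON body.
async function sendChat(apiKey, model, userText) {
  const { url, options } = buildChatRequest(apiKey, model, userText);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Venice API request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call such as sendChat(process.env.VENICE_API_KEY, "venice-uncensored", "Hello!") mirrors the curl command. Because the API follows the OpenAI specification, the reply text is expected at choices[0].message.content of the returned object.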

Authentication

All API requests require HTTP Bearer authentication:
Authorization: Bearer VENICE_API_KEY
Your API key is a secret. Do not share it or expose it in any client-side code.

Differences from OpenAI

Venice is OpenAI-compatible. The main additions and differences:
  • venice_parameters — Venice-only request options (web search, scraping, character personas, thinking controls). Reference below.
  • System prompts — Venice appends defaults tuned for natural, uncensored output; disable with include_venice_system_prompt: false. Details below.
  • Models — Use Venice model IDs directly rather than OpenAI mappings. Browse models.
  • Response headers — Balance, rate-limit, model, and content-safety metadata on every response. Reference below.
  • Private inference — TEE-backed and end-to-end-encrypted model options. Privacy models.
  • Payments — Credits, a daily DIEM allowance, or per-request USDC via x402. x402 guide.

System Prompts

Venice provides default system prompts designed to ensure uncensored and natural model responses. You have two options for handling system prompts:
  1. Default Behavior: Your system prompts are appended to Venice’s defaults
  2. Custom Behavior: Disable Venice’s system prompts entirely

Disabling Venice System Prompts

Use the venice_parameters option to remove Venice’s default system prompts:
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [
      {"role": "system", "content": "Your custom system prompt"},
      {"role": "user", "content": "Why is the sky blue?"}
    ],
    "venice_parameters": {
      "include_venice_system_prompt": false
    }
  }'
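
When request bodies are assembled in code, the Venice-only options can be merged in one place. The following is a sketch only: withVeniceParameters is a hypothetical helper name, and the body shape is copied from the curl example above.

```javascript
// Hypothetical helper: returns a copy of an OpenAI-style request body with
// Venice-only options merged into its venice_parameters object.
function withVeniceParameters(body, veniceParameters) {
  return {
    ...body,
    venice_parameters: { ...(body.venice_parameters || {}), ...veniceParameters },
  };
}

const body = withVeniceParameters(
  {
    model: "venice-uncensored",
    messages: [
      { role: "system", content: "Your custom system prompt" },
      { role: "user", content: "Why is the sky blue?" },
    ],
  },
  { include_venice_system_prompt: false }
);

// The printed JSON matches the payload passed to curl with -d above.
console.log(JSON.stringify(body, null, 2));
```

The helper copies rather than mutates, so a shared base body can be reused across requests with different venice_parameters.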

Venice Parameters

The venice_parameters object allows you to access Venice-specific features not available in the standard OpenAI API:
| Parameter | Type | Description | Default |
|---|---|---|---|
| character_slug | string | The character slug of a public Venice character (discoverable as “Public ID” on the published character page) | - |
| strip_thinking_response | boolean | Strip <think></think> blocks from the response (models using legacy <think> tag format). See Reasoning Models. | false |
| disable_thinking | boolean | On supported reasoning models, disable thinking and strip the <think></think> blocks from the response | false |
| enable_web_search | string | Enable web search for this request (off, on, auto - auto enables based on model’s discretion). Additional usage-based pricing applies, see pricing. | off |
| enable_web_scraping | boolean | Enable web scraping of up to 5 URLs detected in the user message. Scraped content augments responses and bypasses web search. Only successfully scraped URLs are billed. Additional usage-based pricing applies, see pricing. | false |
| enable_x_search | boolean | Enable xAI’s native search (web + X/Twitter) for supported Grok models (e.g., grok-4-20-beta). Provides higher quality search results by using xAI’s search infrastructure. When enabled, Venice’s standard web search is bypassed. Additional usage-based pricing applies, see pricing. | false |
| enable_web_citations | boolean | When web search is enabled, request that the LLM cite its sources using [REF]0[/REF] format | false |
| include_search_results_in_stream | boolean | Experimental: Include search results in the stream as the first emitted chunk | false |
| return_search_results_as_documents | boolean | Surface search results in an OpenAI-compatible tool call named venice_web_search_documents for LangChain integration | false |
| include_venice_system_prompt | boolean | Whether to include Venice’s default system prompts alongside specified system prompts | true |
These parameters can also be specified as model suffixes appended to the model name (e.g., zai-org-glm-5:enable_web_search=auto). See Model Feature Suffixes for details.
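
Two of the conventions above lend themselves to small helpers: the model:parameter=value suffix and the [REF]0[/REF] citation markers. The sketch below is illustrative only. withFeatureSuffix and extractCitationIndices are hypothetical names, the suffix helper covers a single parameter, and the citation parser assumes each marker wraps exactly one non-negative integer.

```javascript
// Hypothetical helper: appends a single feature suffix to a model ID,
// following the model:parameter=value pattern shown above.
function withFeatureSuffix(model, parameter, value) {
  return `${model}:${parameter}=${String(value)}`;
}

// Hypothetical helper: collects the distinct numeric indices found in
// [REF]n[/REF] markers, in order of first appearance.
// Assumption: each marker wraps exactly one non-negative integer.
function extractCitationIndices(text) {
  const seen = new Set();
  for (const match of text.matchAll(/\[REF\](\d+)\[\/REF\]/g)) {
    seen.add(Number(match[1]));
  }
  return [...seen];
}

console.log(withFeatureSuffix("zai-org-glm-5", "enable_web_search", "auto"));
// prints "zai-org-glm-5:enable_web_search=auto"

// Made-up response text, used only to exercise the parser.
const reply = "Rayleigh scattering[REF]0[/REF] explains it[REF]2[/REF][REF]0[/REF].";
console.log(extractCitationIndices(reply));
// prints [ 0, 2 ]
```

How an index maps back to a search result is not described in this section, so the parser stops at returning the indices.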

Prompt Caching

Venice supports prompt caching on select models to reduce latency and costs for repeated content. For supported models, Venice automatically caches system prompts—no code changes required. You can also manually mark content for caching using the cache_control property on message content.
| Parameter | Type | Description |
|---|---|---|
| prompt_cache_key | string | Optional routing hint to improve cache hit rates. When supplied, Venice routes requests to the same backend infrastructure, increasing the likelihood of cache hits across multi-turn conversations. |
See Prompt Caching for details on how caching works, billing, and best practices.
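
A request that uses both mechanisms might be assembled as follows. This is a sketch under stated assumptions, not the documented request shape: it assumes prompt_cache_key is a top-level request field, that cache_control is attached to a text content part with the value { type: "ephemeral" }, and that the placeholder model ID is replaced with one that supports caching. buildCachedRequest is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a request body that reuses one cache key per
// conversation and marks the long system prompt as cacheable.
// Assumptions: prompt_cache_key is a top-level request field, and
// cache_control takes the value { type: "ephemeral" }.
function buildCachedRequest(model, conversationId, systemPrompt, userText) {
  return {
    model,
    // The same conversation always produces the same routing hint.
    prompt_cache_key: `conversation-${conversationId}`,
    messages: [
      {
        role: "system",
        content: [
          { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
        ],
      },
      { role: "user", content: userText },
    ],
  };
}

// "your-model-id" is a placeholder, not a real Venice model ID.
const firstTurn = buildCachedRequest("your-model-id", "abc123", "Long shared instructions", "First question");
console.log(firstTurn.prompt_cache_key);
// prints "conversation-abc123"
```

Deriving the key from a stable conversation identifier keeps every turn of that conversation on the same routing hint, which is the behavior the table describes.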

Response Headers

All Venice API responses include HTTP headers with request, rate-limit, model, and account-balance metadata. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular request, monitor rate limiting, and track your account balance. Venice recommends logging request IDs (CF-RAY header) in production deployments for more efficient troubleshooting with our support team, should the need arise.
| Header | Type | Purpose | When Returned |
|---|---|---|---|
| Standard HTTP Headers | | | |
| Content-Type | string | MIME type of the response body (application/json, text/csv, image/png, etc.) | Always |
| Content-Encoding | string | Encoding used to compress the response body (gzip, br) | When client sends Accept-Encoding header |
| Content-Disposition | string | How content should be displayed (e.g., attachment; filename=export.csv) | When downloading files or exports |
| Date | string | RFC 7231 formatted timestamp when the response was generated | Always |
| Request Identification | | | |
| CF-RAY | string | Unique identifier for this API request, used for troubleshooting and support requests | Always |
| x-venice-version | string | Current version/revision of the Venice API service (e.g., 20250828.222653) | Always |
| x-venice-timestamp | string | Server timestamp when the request was processed (ISO 8601 format) | When timestamp tracking is enabled |
| x-venice-host-name | string | Hostname of the server that processed the request | Error responses and debugging scenarios |
| Model Information | | | |
| x-venice-model-id | string | Unique identifier of the AI model used for the request (e.g., venice-01-lite) | Inference endpoints using AI models |
| x-venice-model-name | string | Friendly/display name of the AI model used (e.g., Venice Lite) | Inference endpoints using AI models |
| x-venice-model-router | string | Router/backend service that handled the model inference | Inference endpoints when routing info available |
| x-venice-model-deprecation-warning | string | Warning message for models scheduled for deprecation | When using a deprecated model |
| x-venice-model-deprecation-date | string | Date when the model will be deprecated (ISO 8601 date) | When using a deprecated model |
| Rate Limiting Information | | | |
| x-ratelimit-limit-requests | number | Maximum number of requests allowed in the current time window | All authenticated requests |
| x-ratelimit-remaining-requests | number | Number of requests remaining in the current time window | All authenticated requests |
| x-ratelimit-reset-requests | number | Unix timestamp when the request rate limit resets | All authenticated requests |
| x-ratelimit-limit-tokens | number | Maximum number of tokens (prompt + completion) allowed in the time window | All authenticated requests |
| x-ratelimit-remaining-tokens | number | Number of tokens remaining in the current time window | All authenticated requests |
| x-ratelimit-reset-tokens | number | Duration in seconds until the token rate limit resets | All authenticated requests |
| x-ratelimit-type | string | Type of rate limit applied (user, api_key, global) | When rate limiting is enforced |
| Pagination Headers | | | |
| x-pagination-limit | number | Number of items per page | Paginated endpoints |
| x-pagination-page | number | Current page number (1-based) | Paginated endpoints |
| x-pagination-total | number | Total number of items across all pages | Paginated endpoints |
| x-pagination-total-pages | number | Total number of pages | Paginated endpoints |
| Account Balance Information | | | |
| x-venice-balance-diem | string | Your DIEM token balance before the request was processed | All authenticated requests |
| x-venice-balance-usd | string | Your USD credit balance before the request was processed | All authenticated requests |
| Content Safety Headers | | | |
| x-venice-is-blurred | string | Indicates if generated image was blurred due to content policies (true/false) | Image generation with Safe Venice enabled |
| x-venice-is-content-violation | string | Indicates if content violates Venice’s content policies (true/false) | Content generation endpoints |
| x-venice-is-adult-model-content-violation | string | Indicates if content violates adult model content policies (true/false) | Image generation endpoints |
| x-venice-contains-minor | string | Indicates if image contains minors (true/false) | Image analysis endpoints with age detection |
| Client Information | | | |
| x-venice-middleface-version | string | Version of the Venice middleface client | Requests from Venice middleface clients |
| x-venice-mobile-version | string | Version of the Venice mobile app client | Requests from mobile applications |
| x-venice-request-timestamp-ms | number | Client-provided request timestamp in milliseconds | When client provides timestamp in request |
| x-venice-control-instance | string | Control instance identifier for debugging | Image generation endpoints for debugging |
| Authentication Headers | | | |
| x-auth-refreshed | string | Indicates authentication token was refreshed during request (true/false) | When authentication tokens are auto-refreshed |
| x-retry-count | number | Number of retry attempts for the request | When request retries occur |
  • Header Name Case: HTTP headers are case-insensitive, but Venice uses lowercase with hyphens for consistency
  • String Values: Boolean values in headers are returned as strings ("true" or "false")
  • Numeric Values: Large numbers and balance values may be returned as strings to prevent precision loss
  • Optional Headers: Not all headers are returned in every response; presence depends on the endpoint and request context
  • Compression: Use Accept-Encoding: gzip, br in requests to receive compressed responses where supported
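// Make an API request (Node 18+; top-level await requires an ES module)
const response = await fetch('https://api.venice.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.VENICE_API_KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: 'venice-uncensored', messages: [{ role: 'user', content: 'Hello!' }] }),
});

// Access headers from the response object
const requestId = response.headers.get('CF-RAY');
const remainingRequests = response.headers.get('x-ratelimit-remaining-requests');
const remainingTokens = response.headers.get('x-ratelimit-remaining-tokens');
const usdBalance = response.headers.get('x-venice-balance-usd');

// Check for model deprecation warnings
const deprecationWarning = response.headers.get('x-venice-model-deprecation-warning');
if (deprecationWarning) {
  console.warn(`Model Deprecation: ${deprecationWarning}`);
}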
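
Because header values arrive as strings and some headers are optional, it can help to normalize them once. The sketch below is one possible approach, assuming a runtime with the standard Headers class (Node 18+ or a browser). parseVeniceHeaders is a hypothetical helper name, and the sample values are made up.

```javascript
// Hypothetical helper: converts selected Venice response headers into typed
// values. Missing headers become null rather than 0 or false.
function parseVeniceHeaders(headers) {
  const toNumber = (name) => {
    const raw = headers.get(name);
    return raw === null ? null : Number(raw);
  };
  const toBoolean = (name) => {
    const raw = headers.get(name);
    return raw === null ? null : raw === "true";
  };
  return {
    requestId: headers.get("CF-RAY"),
    remainingRequests: toNumber("x-ratelimit-remaining-requests"),
    remainingTokens: toNumber("x-ratelimit-remaining-tokens"),
    // Balances stay as strings to avoid precision loss.
    balanceUsd: headers.get("x-venice-balance-usd"),
    balanceDiem: headers.get("x-venice-balance-diem"),
    isContentViolation: toBoolean("x-venice-is-content-violation"),
    deprecationWarning: headers.get("x-venice-model-deprecation-warning"),
  };
}

// Made-up header values, used only to exercise the parser.
const sample = new Headers({
  "cf-ray": "example-ray-id",
  "x-ratelimit-remaining-requests": "42",
  "x-venice-balance-usd": "12.50",
  "x-venice-is-content-violation": "false",
});
console.log(parseVeniceHeaders(sample));
```

Headers.get matches names case-insensitively, so the lookup for CF-RAY finds the lowercase cf-ray key in the sample. With a real request, pass response.headers to the helper.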

Best Practices

  1. Rate Limiting: Monitor x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens headers and implement exponential backoff
  2. Balance Monitoring: Track x-venice-balance-usd and x-venice-balance-diem headers to avoid service interruptions
  3. System Prompts: Test with and without Venice’s system prompts to find the best fit for your use case
  4. API Keys: Keep your API keys secure and rotate them regularly
  5. Request Logging: Log CF-RAY header values for troubleshooting with support
  6. Model Deprecation: Check for x-venice-model-deprecation-warning headers when using models
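
The exponential backoff mentioned in the first item can be sketched as below. This is an illustration under assumptions rather than Venice's recommended implementation: it assumes rate-limited requests are signalled with HTTP status 429, uses x-ratelimit-reset-tokens (documented above as a duration in seconds) as a lower bound on the wait, and omits random jitter, which production code would normally add. retryDelayMs and fetchWithBackoff are hypothetical names.

```javascript
// Hypothetical helper: how long to wait before retry number `attempt` (0-based).
// Doubles the delay each attempt up to maxMs, and never waits less than the
// x-ratelimit-reset-tokens value when that header is present.
function retryDelayMs(attempt, headers, baseMs = 500, maxMs = 30000) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const resetSeconds = Number(headers.get("x-ratelimit-reset-tokens"));
  const floorMs =
    Number.isFinite(resetSeconds) && resetSeconds > 0 ? resetSeconds * 1000 : 0;
  return Math.max(exponential, floorMs);
}

// Hypothetical wrapper: retries on HTTP 429 up to maxRetries times, then
// returns whatever response it last received.
// Assumption: rate-limited requests are signalled with status 429.
async function fetchWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429 || attempt >= maxRetries) return response;
    const delay = retryDelayMs(attempt, response.headers);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Keeping the delay calculation in a pure function makes it straightforward to unit test without network access.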

API Stability

Venice maintains backward compatibility for v1 endpoints and parameters. For model lifecycle policy, deprecation notices, and migration guidance, see Deprecations.

Next Steps

  • Quickstart guide — from API key to a working integration
  • Endpoints — full reference with an interactive playground
  • Models — the full model catalog with pricing and capabilities
  • Rate limits — per-model request and token limits
  • OpenAPI spec (YAML) — the complete specification for codegen and RAG
Questions or feedback? Join us on Discord.

Request fields not listed in this documentation may be passed through but are not validated or guaranteed to work.