Skip to main content

Providers

The gateway presents a single OpenAI-compatible API to clients and translates requests to the appropriate backend provider. Three provider types are supported: OpenAI (and compatible APIs), Anthropic, and a local/self-hosted provider for any backend that implements /v1/chat/completions.

API surface

All clients interact with the gateway using the OpenAI chat completions format:

POST /v1/chat/completions chat completions (streaming and non-streaming)
GET /v1/models list available models from all providers
GET /health health check
GET /metrics Prometheus metrics (when enabled)

By default, the gateway doesn't require a client API key — authentication is between the gateway and the upstream providers. Optionally, the gateway can enforce its own virtual API keys.

Model routing

Models are routed to providers by prefix-matching on the model name:

PrefixProvider
gpt-*, o1-*, o3-*OpenAI
claude-*Anthropic
Everything elseLocal (configured as local)

Matching is case-insensitive. A request for gpt-4 goes to OpenAI; claude-haiku-4-5-20251001 goes to Anthropic; llama3 or qwen3-vl:30b go to the local provider.

If the target provider isn't configured, the gateway returns an error:

{"error": {"message": "provider 'openai' is not configured", "type": "invalid_request_error"}}

OpenAI provider

The OpenAI provider is a direct pass-through. Requests forward to POST {base_url}/v1/chat/completions with an Authorization: Bearer header. Responses are returned unmodified.

Any OpenAI-compatible API can be used as the OpenAI provider by setting base_url — for example, Azure OpenAI or a local vLLM server.

Model listing calls GET {base_url}/v1/models.

Anthropic provider

The Anthropic provider translates between the OpenAI format and Anthropic's Messages API. Clients send OpenAI-format requests and receive OpenAI-format responses regardless of which provider handles the request.

Request translation

The gateway maps OpenAI request fields to their Anthropic equivalents before forwarding:

OpenAI fieldAnthropic fieldNotes
modelmodelPassed through
messages (role: system)systemFirst system message becomes Anthropic's top-level system field
messages (role: user)messages (role: user)
messages (role: assistant)messages (role: assistant)
messages (role: tool)messages (role: user)Mapped to user role
max_tokensmax_tokensDefaults to 4096 if not set (Anthropic requires this field)
temperaturetemperature
top_ptop_p
stopstop_sequencesString or array

Response translation

The gateway maps Anthropic response fields back to the OpenAI format before returning to the client:

Anthropic fieldOpenAI fieldNotes
idid
content[].textchoices[0].message.contentText blocks are concatenated
usage.input_tokensusage.prompt_tokens
usage.output_tokensusage.completion_tokens
stop_reasonchoices[0].finish_reasonend_turn/stop_sequencestop; max_tokenslength

Streaming translation

Anthropic uses a different streaming event format than OpenAI. The gateway translates on the fly:

Anthropic eventAction
message_startCaptures the message ID for subsequent chunks
content_block_deltaEmitted as an OpenAI-format chat.completion.chunk with the delta text
message_deltaEmitted as a chunk with the finish_reason
message_stopEmitted as the [DONE] sentinel

Model listing

Anthropic doesn't have a public models listing endpoint. The provider returns a static list of current and legacy Claude models.

Error translation

Anthropic error types are mapped to their gateway equivalents:

Anthropic error typeGateway error typeHTTP status
authentication_errorauthentication_error401
rate_limit_errorrate_limit_error429
invalid_request_errorinvalid_request_error400
not_found_errornot_found_error404
(other)server_error500

Local / self-hosted provider

The local provider is a direct pass-through to any OpenAI-compatible backend. Chat completions go to POST {base_url}/v1/chat/completions. This means Ollama, vLLM, llama.cpp, SGLang, or any server exposing this endpoint can be used.

Model listing tries GET {base_url}/v1/models first, falling back to Ollama's native GET {base_url}/api/tags.

For multi-endpoint load balancing and failover, see Multi-endpoint load balancing.

Streaming

All three providers support streaming via Server-Sent Events (SSE). See Streaming for response format, headers, and how the gateway processes streaming requests.

Error handling

All providers translate upstream errors into a consistent OpenAI-compatible format:

{
"error": {
"message": "description of what went wrong",
"type": "error_type",
"param": null,
"code": null
}
}
Error typeHTTP statusTypical cause
invalid_request_error400Malformed request, missing model, provider not configured
authentication_error401Invalid API key
permission_error403Insufficient permissions
not_found_error404Model not found
rate_limit_error429Upstream rate limit hit
server_error500Provider returned an unexpected error
service_unavailable503Provider is down