Providers

The gateway presents a single OpenAI-compatible API to clients and translates requests to the appropriate backend provider. Three provider types are supported: OpenAI (and compatible APIs), Anthropic, and a local/self-hosted provider for any backend that implements /v1/chat/completions.

API surface

All clients interact with the gateway using the OpenAI chat completions format:

POST /v1/chat/completions    chat completions (streaming and non-streaming)
GET  /v1/models              list available models from all providers
GET  /health                 health check
GET  /metrics                Prometheus metrics (when enabled)

By default, the gateway doesn't require a client API key — authentication is between the gateway and the upstream providers. Optionally, the gateway can enforce its own virtual API keys.

Model routing

Models are routed to providers by prefix-matching on the model name:

Prefix	Provider
`gpt-`, `o1-`, `o3-*`	OpenAI
`claude-*`	Anthropic
Everything else	Local (configured as `local`)

Matching is case-insensitive. A request for gpt-4 goes to OpenAI; claude-haiku-4-5-20251001 goes to Anthropic; llama3 or qwen3-vl:30b go to the local provider.

If the target provider isn't configured, the gateway returns an error:

{"error": {"message": "provider 'openai' is not configured", "type": "invalid_request_error"}}

OpenAI provider

The OpenAI provider is a direct pass-through. Requests forward to POST {base_url}/v1/chat/completions with an Authorization: Bearer header. Responses are returned unmodified.

Any OpenAI-compatible API can be used as the OpenAI provider by setting base_url — for example, Azure OpenAI or a local vLLM server.

Model listing calls GET {base_url}/v1/models.

Anthropic provider

The Anthropic provider translates between the OpenAI format and Anthropic's Messages API. Clients send OpenAI-format requests and receive OpenAI-format responses regardless of which provider handles the request.

Request translation

The gateway maps OpenAI request fields to their Anthropic equivalents before forwarding:

OpenAI field	Anthropic field	Notes
`model`	`model`	Passed through
`messages` (role: system)	`system`	First system message becomes Anthropic's top-level `system` field
`messages` (role: user)	`messages` (role: user)
`messages` (role: assistant)	`messages` (role: assistant)
`messages` (role: tool)	`messages` (role: user)	Mapped to user role
`max_tokens`	`max_tokens`	Defaults to 4096 if not set (Anthropic requires this field)
`temperature`	`temperature`
`top_p`	`top_p`
`stop`	`stop_sequences`	String or array

Response translation

The gateway maps Anthropic response fields back to the OpenAI format before returning to the client:

Anthropic field	OpenAI field	Notes
`id`	`id`
`content[].text`	`choices[0].message.content`	Text blocks are concatenated
`usage.input_tokens`	`usage.prompt_tokens`
`usage.output_tokens`	`usage.completion_tokens`
`stop_reason`	`choices[0].finish_reason`	`end_turn`/`stop_sequence` → `stop`; `max_tokens` → `length`

Streaming translation

Anthropic uses a different streaming event format than OpenAI. The gateway translates on the fly:

Anthropic event	Action
`message_start`	Captures the message ID for subsequent chunks
`content_block_delta`	Emitted as an OpenAI-format `chat.completion.chunk` with the delta text
`message_delta`	Emitted as a chunk with the `finish_reason`
`message_stop`	Emitted as the `[DONE]` sentinel

Model listing

Anthropic doesn't have a public models listing endpoint. The provider returns a static list of current and legacy Claude models.

Error translation

Anthropic error types are mapped to their gateway equivalents:

Anthropic error type	Gateway error type	HTTP status
`authentication_error`	`authentication_error`	401
`rate_limit_error`	`rate_limit_error`	429
`invalid_request_error`	`invalid_request_error`	400
`not_found_error`	`not_found_error`	404
(other)	`server_error`	500

Local / self-hosted provider

The local provider is a direct pass-through to any OpenAI-compatible backend. Chat completions go to POST {base_url}/v1/chat/completions. This means Ollama, vLLM, llama.cpp, SGLang, or any server exposing this endpoint can be used.

Model listing tries GET {base_url}/v1/models first, falling back to Ollama's native GET {base_url}/api/tags.

For multi-endpoint load balancing and failover, see Multi-endpoint load balancing.

Streaming

All three providers support streaming via Server-Sent Events (SSE). See Streaming for response format, headers, and how the gateway processes streaming requests.

Error handling

All providers translate upstream errors into a consistent OpenAI-compatible format:

{
  "error": {
    "message": "description of what went wrong",
    "type": "error_type",
    "param": null,
    "code": null
  }
}

Error type	HTTP status	Typical cause
`invalid_request_error`	400	Malformed request, missing model, provider not configured
`authentication_error`	401	Invalid API key
`permission_error`	403	Insufficient permissions
`not_found_error`	404	Model not found
`rate_limit_error`	429	Upstream rate limit hit
`server_error`	500	Provider returned an unexpected error
`service_unavailable`	503	Provider is down

API surface​

Model routing​

OpenAI provider​

Anthropic provider​

Request translation​

Response translation​

Streaming translation​

Model listing​

Error translation​

Local / self-hosted provider​

Streaming​

Error handling​