Metrics

The gateway exposes OpenTelemetry metrics via a Prometheus exporter. When enabled, metrics are available at GET /metrics in the standard Prometheus text format.

Enabling metrics

Add the following to your config file:

```yaml
metrics:
  enabled: true
```

Instruments

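All metric names are prefixed with `llm_gateway.`. In the Prometheus output the dots become underscores, so `llm_gateway.requests` is scraped as `llm_gateway_requests_total`.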
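The exported Prometheus names can be derived from the instrument names listed below. The following is a minimal sketch, assuming the exporter follows OpenTelemetry's Prometheus naming conventions: characters that are invalid in Prometheus names (such as dots) become underscores, a unit suffix like `_seconds` is appended, and monotonic counters gain `_total`. This matches the names used under Useful queries. The function `prometheus_name` and its small unit table are illustrative only; a real exporter handles many more units and edge cases.

```python
import re

# Illustrative subset of OpenTelemetry unit -> Prometheus suffix mappings.
_UNIT_SUFFIXES = {"s": "seconds", "By": "bytes"}


def prometheus_name(otel_name: str, kind: str, unit: str = "") -> str:
    """Approximate the Prometheus name an OTel instrument is exported under."""
    # Replace characters Prometheus does not allow (e.g. dots) with underscores.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    suffix = _UNIT_SUFFIXES.get(unit)
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    # Only monotonic counters get _total; up-down counters export as gauges.
    if kind == "counter":
        name += "_total"
    return name


print(prometheus_name("llm_gateway.requests", "counter"))
print(prometheus_name("llm_gateway.request.duration", "histogram", "s"))
```

A histogram is then exposed as three series that share this base name, with `_bucket`, `_sum` and `_count` appended. That is why the duration query uses `llm_gateway_request_duration_seconds_sum` and `..._count`.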

Request metrics

These metrics track individual requests through the gateway:

llm_gateway.requests (counter): Total chat completion requests.

| Attribute | Values | Description |
| --- | --- | --- |
| `provider` | `openai`, `anthropic`, `ollama` | Which provider handled the request |
| `model` | Model name | The model used |
| `streaming` | `true`, `false` | Whether the request was streaming |
| `key` | Key name or empty | The API key name (when virtual API keys are enabled) |

llm_gateway.request.duration (histogram, seconds): End-to-end request duration including upstream provider latency.

| Attribute | Values | Description |
| --- | --- | --- |
| `provider` | `openai`, `anthropic`, `ollama` | Which provider handled the request |
| `model` | Model name | The model used |
| `key` | Key name or empty | The API key name (when virtual API keys are enabled) |

llm_gateway.requests.inflight (up-down counter): Number of requests currently being processed. Incremented when a request enters the handler, decremented when it completes. Useful for understanding concurrency and detecting request pileups. No attributes.
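The increment-on-entry, decrement-on-completion pattern can be sketched with a context manager. `InflightCounter` and `track_inflight` are hypothetical names standing in for the gateway's up-down counter; this is not its actual code. The `finally` clause makes a request that fails still decrement the count, which keeps the metric from drifting upward over time.

```python
import threading
from contextlib import contextmanager


class InflightCounter:
    """Hypothetical stand-in for an up-down counter of in-flight requests."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()  # handlers may run concurrently

    def add(self, delta: int) -> None:
        with self._lock:
            self._value += delta

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


@contextmanager
def track_inflight(counter: InflightCounter):
    counter.add(1)  # request entered the handler
    try:
        yield
    finally:
        counter.add(-1)  # request completed, whether it succeeded or raised


inflight = InflightCounter()
with track_inflight(inflight):
    print(inflight.value)  # 1 while the request is being handled
print(inflight.value)      # back to 0 afterwards
```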

Token metrics

These metrics track token consumption as reported by each provider:

llm_gateway.tokens.prompt (counter): Total prompt (input) tokens across all requests.

| Attribute | Values | Description |
| --- | --- | --- |
| `provider` | Provider name | Which provider reported the usage |
| `model` | Model name | The model used |

llm_gateway.tokens.completion (counter): Total completion (output) tokens across all requests.

| Attribute | Values | Description |
| --- | --- | --- |
| `provider` | Provider name | Which provider reported the usage |
| `model` | Model name | The model used |

Token metrics are recorded from the `usage` field in non-streaming responses. Streaming responses typically don't include token counts, so they usually don't contribute to these counters.
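The recording rule above can be sketched as follows. The sketch assumes OpenAI-style field names (`prompt_tokens`, `completion_tokens`) inside `usage`; other providers name these fields differently. The module-level `Counter` objects are hypothetical stand-ins for the two token counters, keyed by `(provider, model)` like the metric attributes.

```python
from collections import Counter
from typing import Any, Dict

# Hypothetical stand-ins for llm_gateway.tokens.prompt / .completion.
prompt_tokens: Counter = Counter()
completion_tokens: Counter = Counter()


def record_usage(provider: str, model: str, response: Dict[str, Any]) -> bool:
    """Add a response's reported usage to the per-(provider, model) totals.

    Returns False and records nothing when the response has no usage field,
    which is the usual case for streaming responses.
    """
    usage = response.get("usage")
    if not usage:
        return False
    key = (provider, model)
    prompt_tokens[key] += usage.get("prompt_tokens", 0)
    completion_tokens[key] += usage.get("completion_tokens", 0)
    return True


record_usage("openai", "example-model", {"usage": {"prompt_tokens": 12, "completion_tokens": 30}})
record_usage("openai", "example-model", {})  # no usage: skipped
```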

Routing metrics

This metric tracks how routing decisions are distributed across the cascade layers:

llm_gateway.routing.decisions (counter): Routing decisions, counted each time the router selects a model.

| Attribute | Values | Description |
| --- | --- | --- |
| `method` | `explicit`, `heuristic`, `semantic`, `classifier`, `default` | Which routing layer made the decision |

A high proportion of default decisions may indicate that thresholds are too strict or that route examples don't cover your traffic well.
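To judge whether the `default` share is "high", it helps to turn the per-method counts into fractions of all decisions. The sketch below does that. The function name `method_shares` and the sample counts are made up for illustration.

```python
from typing import Dict


def method_shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-method routing decision counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {method: 0.0 for method in counts}
    return {method: n / total for method, n in counts.items()}


# Made-up counts for one time window.
counts = {"explicit": 10, "heuristic": 25, "semantic": 40, "classifier": 5, "default": 20}
shares = method_shares(counts)
print(shares["default"])  # fraction of decisions that fell through to the default
```

The equivalent ratio in PromQL is `sum by (method) (rate(llm_gateway_routing_decisions_total[5m])) / scalar(sum(rate(llm_gateway_routing_decisions_total[5m])))`.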

Error metrics

This metric tracks errors returned by upstream providers, broken down by error category:

llm_gateway.provider.errors (counter): Errors returned by upstream providers.

| Attribute | Values | Description |
| --- | --- | --- |
| `error_type` | `invalid_request_error`, `authentication_error`, `rate_limit_error`, `server_error`, `not_found_error`, `service_unavailable`, `unknown` | The error category |

Health metrics

This metric tracks endpoint availability in multi-endpoint mode:

llm_gateway.endpoint.healthy (up-down counter): Per-endpoint health status. Value is 1 for healthy endpoints and 0 for unhealthy endpoints.

| Attribute | Values | Description |
| --- | --- | --- |
| `endpoint` | Endpoint name | The endpoint being reported on |
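An up-down counter can only be incremented or decremented, not set. Keeping it at 1 or 0 therefore means adding +1 or -1 only when an endpoint's state changes. The sketch below shows that bookkeeping. `EndpointHealth` is a hypothetical helper, not the gateway's implementation, and it assumes an endpoint counts as unhealthy (value 0) before its first check.

```python
from typing import Dict


class EndpointHealth:
    """Hypothetical tracker that turns health-check results into counter deltas."""

    def __init__(self) -> None:
        self._healthy: Dict[str, bool] = {}
        self.value: Dict[str, int] = {}  # what the exported series would show

    def report(self, endpoint: str, healthy: bool) -> int:
        """Record a health-check result and return the delta applied (-1, 0 or +1)."""
        previous = self._healthy.get(endpoint, False)  # assume unhealthy at start
        self._healthy[endpoint] = healthy
        delta = int(healthy) - int(previous)  # non-zero only on a state change
        self.value[endpoint] = self.value.get(endpoint, 0) + delta
        return delta


health = EndpointHealth()
health.report("primary", True)   # +1: value becomes 1
health.report("primary", True)   # 0: still healthy, no change
health.report("primary", False)  # -1: value drops back to 0
```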

Prometheus scraping

Point your Prometheus instance at the gateway's /metrics endpoint:

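```yaml
# prometheus.yml
scrape_configs:
  - job_name: llm-gateway
    scrape_interval: 15s
    # metrics_path defaults to /metrics, which matches the gateway's endpoint
    static_configs:
      - targets: ["localhost:8080"]  # replace with the gateway's host:port
```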
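To sanity-check what a scrape returns, the sketch below parses a subset of the Prometheus text format into `(name, labels, value)` tuples. It skips `# HELP` and `# TYPE` lines, ignores timestamps, and leaves label escape sequences as-is. The sample payload and the `example-model` label value are made up. For real tooling, a maintained parser such as `text_string_to_metric_families` from the `prometheus_client` package is a better choice.

```python
import re
from typing import Dict, List, Tuple

# A sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse a simplified subset of the Prometheus text exposition format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines, HELP and TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


SAMPLE = """\
# HELP llm_gateway_requests_total Total chat completion requests.
# TYPE llm_gateway_requests_total counter
llm_gateway_requests_total{provider="openai",model="example-model",streaming="false",key=""} 42
llm_gateway_requests_inflight 3
"""

for sample in parse_prometheus_text(SAMPLE):
    print(sample)
```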

Useful queries

Some example PromQL queries to get started:

Requests per minute by provider:

```
sum by (provider) (rate(llm_gateway_requests_total[5m])) * 60
```
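`rate()` turns a monotonically increasing counter into a per-second rate over the window, and multiplying by 60 converts it to per-minute. The arithmetic is roughly what the sketch below computes from `(timestamp_seconds, value)` samples. It is a simplification: it treats a drop in value as a counter reset, as PromQL does, but unlike PromQL it does not extrapolate to the window edges. The function name and sample values are illustrative.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Approximate per-second increase of a counter from (timestamp_s, value) samples."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter reset (e.g. a gateway restart),
        # so the new value is counted as fresh increase from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Three scrapes across a 5-minute window (made-up values).
per_second = simple_rate([(0, 100), (150, 175), (300, 250)])
print(per_second * 60)  # requests per minute
```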

Average request duration by model:

```
sum by (model) (rate(llm_gateway_request_duration_seconds_sum[5m])) / sum by (model) (rate(llm_gateway_request_duration_seconds_count[5m]))
```
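Dividing the two rates gives a mean because both are taken over the same window. The window length cancels, leaving (increase in total observed seconds) / (increase in number of observations). A small worked sketch with made-up `_sum` and `_count` values at the start and end of a window:

```python
def average_duration(sum_start: float, sum_end: float,
                     count_start: float, count_end: float) -> float:
    """Mean request duration over a window from histogram _sum/_count samples."""
    observed = count_end - count_start
    if observed <= 0:
        return float("nan")  # no requests completed in the window
    return (sum_end - sum_start) / observed


# 30 requests finished in the window and together took 45 seconds.
print(average_duration(120.0, 165.0, 400, 430))  # mean seconds per request
```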

Token throughput (tokens per second):

```
rate(llm_gateway_tokens_prompt_total[5m]) + rate(llm_gateway_tokens_completion_total[5m])
```

Error rate as a percentage of total requests:

```
sum(rate(llm_gateway_provider_errors_total[5m])) / sum(rate(llm_gateway_requests_total[5m])) * 100
```

Routing method distribution:

```
rate(llm_gateway_routing_decisions_total[5m])
```

Current in-flight requests:
```
llm_gateway_requests_inflight
```
