Metrics

The gateway exposes OpenTelemetry metrics via a Prometheus exporter. When enabled, metrics are available at GET /metrics in the standard Prometheus text format.

Enabling metrics

Add the following to your config file:

metrics:
  enabled: true
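
Once metrics are enabled, the endpoint can be sanity-checked with a short script. The following is a minimal sketch using only the Python standard library. The base URL http://localhost:8080 is an assumption borrowed from the scrape example on this page, and parse_metrics and fetch_metrics are hypothetical helper names. The parser handles only simple samples (no timestamps, no escaped quotes inside label values).

```python
import re
import urllib.request

# Matches lines of the form: name{label="value",...} 12.5   (labels optional)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse simple Prometheus text-format samples into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # Unsupported line shape (for example, a trailing timestamp).
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url="http://localhost:8080"):
    """Fetch and parse GET /metrics. The default base_url is an assumption."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling fetch_metrics() against a running gateway should return one tuple per exposed series. An empty list or a connection error suggests that metrics are not enabled or that the address is different in your deployment.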

Instruments

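All metric names start with the prefix llm_gateway. (including the trailing dot). In the Prometheus output, dots become underscores and the exporter appends suffixes such as _total for counters and the unit for histograms, so llm_gateway.requests is exposed as llm_gateway_requests_total.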

Request metrics

These metrics track individual requests through the gateway:

llm_gateway.requests (counter): Total chat completion requests.

| Attribute | Values | Description |
| --- | --- | --- |
| provider | openai, anthropic, ollama | Which provider handled the request |
| model | Model name | The model used |
| streaming | true, false | Whether the request was streaming |
| key | Key name or empty | The API key name (when virtual API keys are enabled) |

llm_gateway.request.duration (histogram, seconds): End-to-end request duration including upstream provider latency.

| Attribute | Values | Description |
| --- | --- | --- |
| provider | openai, anthropic, ollama | Which provider handled the request |
| model | Model name | The model used |
| key | Key name or empty | The API key name (when virtual API keys are enabled) |

llm_gateway.requests.inflight (up-down counter): Number of requests currently being processed. Incremented when a request enters the handler, decremented when it completes. Useful for understanding concurrency and detecting request pileups. No attributes.
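
The increment-on-entry, decrement-on-completion pattern can be sketched as follows. This is an illustration of the pattern, not the gateway's code: InflightCounter is a hypothetical stand-in for an up-down counter instrument, and track_inflight is a hypothetical helper name.

```python
import threading
from contextlib import contextmanager


class InflightCounter:
    """Thread-safe stand-in for an up-down counter (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, delta):
        with self._lock:
            self._value += delta

    @property
    def value(self):
        with self._lock:
            return self._value


@contextmanager
def track_inflight(counter):
    """Count a request as in flight for the duration of the with-block."""
    counter.add(1)
    try:
        yield
    finally:
        # Runs on success and on error, so failed requests cannot leak a count.
        counter.add(-1)
```

Placing the decrement in a finally clause is the important design choice: if it only ran on the success path, every failed request would leave the counter permanently one too high.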

Token metrics

These metrics track token consumption as reported by each provider:

llm_gateway.tokens.prompt (counter): Total prompt (input) tokens across all requests.

| Attribute | Values | Description |
| --- | --- | --- |
| provider | Provider name | Which provider reported the usage |
| model | Model name | The model used |

llm_gateway.tokens.completion (counter): Total completion (output) tokens across all requests.

| Attribute | Values | Description |
| --- | --- | --- |
| provider | Provider name | Which provider reported the usage |
| model | Model name | The model used |

Token metrics are recorded from the usage field in non-streaming responses. Streaming responses typically don't include token counts, so streamed requests are usually missing from these totals.
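
A rough sketch of that recording step, for illustration only. The field names prompt_tokens and completion_tokens are an assumption (the OpenAI-style usage object); other providers may report usage under different keys. new_totals and record_token_usage are hypothetical helper names.

```python
from collections import Counter


def new_totals():
    """Create empty per-(provider, model) token totals."""
    return {"prompt": Counter(), "completion": Counter()}


def record_token_usage(totals, provider, model, response):
    """Add token counts from a response body; return False if it has no usage."""
    usage = response.get("usage")
    if not isinstance(usage, dict):
        # Typical for streamed responses: nothing to record.
        return False
    key = (provider, model)
    # Assumed OpenAI-style field names; missing fields count as zero.
    totals["prompt"][key] += int(usage.get("prompt_tokens", 0))
    totals["completion"][key] += int(usage.get("completion_tokens", 0))
    return True
```

A response without a usage object leaves the totals unchanged, which is why token counters can lag behind request counters when a large share of traffic is streamed.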

Routing metrics

This metric tracks how routing decisions are distributed across the cascade layers:

llm_gateway.routing.decisions (counter): Routing decisions, counted each time the router selects a model.

| Attribute | Values | Description |
| --- | --- | --- |
| method | explicit, heuristic, semantic, classifier, default | Which routing layer made the decision |

A high proportion of default decisions may indicate that thresholds are too strict or that route examples don't cover your traffic well.
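
To put a number on "a high proportion", the share of default decisions can be computed from the per-method counter values. A minimal sketch, where default_share is a hypothetical helper and the counts in the usage note are made up for illustration:

```python
def default_share(decisions):
    """Return the fraction of routing decisions that fell through to 'default'.

    `decisions` maps each method label to its counter value.
    """
    total = sum(decisions.values())
    if total == 0:
        return 0.0  # No decisions recorded yet.
    return decisions.get("default", 0) / total
```

For example, default_share({"explicit": 10, "heuristic": 20, "semantic": 50, "classifier": 10, "default": 10}) returns 0.1, meaning one in ten requests reached the default layer. What counts as too high depends on your traffic; this page does not define a threshold.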

Error metrics

This metric tracks errors returned by upstream providers, broken down by error category:

llm_gateway.provider.errors (counter): Errors returned by upstream providers.

| Attribute | Values | Description |
| --- | --- | --- |
| error_type | invalid_request_error, authentication_error, rate_limit_error, server_error, not_found_error, service_unavailable, unknown | The error category |

Health metrics

This metric tracks endpoint availability in multi-endpoint mode:

llm_gateway.endpoint.healthy (up-down counter): Per-endpoint health status. Value is 1 for healthy endpoints and 0 for unhealthy endpoints.

| Attribute | Values | Description |
| --- | --- | --- |
| endpoint | Endpoint name | The endpoint being reported on |

Prometheus scraping

Point your Prometheus instance at the gateway's /metrics endpoint:

# prometheus.yml
scrape_configs:
  - job_name: llm-gateway
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]

Useful queries

Some example PromQL queries to get started. Queries that group or divide series use sum so that series with different label sets combine correctly:

  • Requests per minute by provider:

    sum by (provider) (rate(llm_gateway_requests_total[5m])) * 60

  • Average request duration by model:

    sum by (model) (rate(llm_gateway_request_duration_seconds_sum[5m])) / sum by (model) (rate(llm_gateway_request_duration_seconds_count[5m]))

  • Token throughput (tokens per second):

    sum(rate(llm_gateway_tokens_prompt_total[5m])) + sum(rate(llm_gateway_tokens_completion_total[5m]))

  • Error rate as a percentage of total requests:

    sum(rate(llm_gateway_provider_errors_total[5m])) / sum(rate(llm_gateway_requests_total[5m])) * 100

  • Routing method distribution:

    sum by (method) (rate(llm_gateway_routing_decisions_total[5m]))

  • Current in-flight requests:

    llm_gateway_requests_inflight
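
The average-duration query divides the rate of the histogram's _sum series by the rate of its _count series. Because both rates cover the same window, the window length cancels and the result is the change in total seconds divided by the change in request count. The sketch below reproduces that arithmetic from two scrapes; average_duration is a hypothetical helper, and it ignores counter resets, which Prometheus's rate() does handle.

```python
def average_duration(sum_then, count_then, sum_now, count_now):
    """Mean request duration in seconds between two scrapes of a histogram.

    Takes the _sum and _count values from an earlier and a later scrape.
    Returns None when no requests completed in between.
    """
    delta_count = count_now - count_then
    if delta_count <= 0:
        return None  # No new observations (or a counter reset).
    return (sum_now - sum_then) / delta_count
```

For example, if _sum grew from 10.0 to 25.0 seconds while _count grew from 20 to 50 requests, average_duration(10.0, 20, 25.0, 50) returns 0.5, meaning the 30 requests in that window averaged half a second each.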