splunk subscriber

Forwards events to a Splunk HTTP Event Collector (HEC) endpoint. Each event is wrapped in a HEC envelope (event + host, source, sourcetype, index, time, fields), batched as NDJSON, gzip-compressed, and POSTed with Authorization: Splunk <token>. Batches retry on 429 and 5xx with exponential backoff.
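
As a rough illustration of the wire format described above, the Python sketch below wraps each event in a HEC envelope, joins the envelopes as NDJSON, and gzip-compresses the body. The function names, the sample values, and the backoff base and cap are assumptions made for illustration; they are not the subscriber's actual code or defaults.

```python
import gzip
import json

def build_envelope(event, *, time=None, host=None, source=None,
                   sourcetype=None, index=None, fields=None):
    """Wrap one event in a HEC envelope, omitting attributes that are unset."""
    envelope = {"event": event}
    optional = {"time": time, "host": host, "source": source,
                "sourcetype": sourcetype, "index": index, "fields": fields}
    envelope.update({k: v for k, v in optional.items() if v is not None})
    return envelope

def encode_batch(envelopes, compression="gzip"):
    """Serialize envelopes as NDJSON and return (body, headers)."""
    ndjson = "\n".join(json.dumps(e, separators=(",", ":")) for e in envelopes)
    body = ndjson.encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if compression == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers

def auth_header(token):
    """HEC expects the token in a 'Splunk <token>' Authorization header."""
    return {"Authorization": f"Splunk {token}"}

def is_retryable(status):
    """429 and 5xx responses are retried; everything else is not."""
    return status == 429 or 500 <= status <= 599

def backoff_delays(attempts, base=0.5, cap=30.0):
    """Exponential backoff schedule in seconds (base and cap are assumed values)."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]
```

Omitting unset attributes mirrors the behavior described in the field table, where leaving index, source, sourcetype, or hostname unset lets Splunk apply the token defaults.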

Add to the subscribers block in config.yaml:

subscribers:
  splunk:
    enabled: true
    url: "https://splunk.example.com:8088/services/collector/event"
    # token is read from SPLUNK_HEC_TOKEN env var (preferred) or set token here
    index: "main"
    sourcetype: "ziti:event"
    source: "openziti"
    hostname: "ziti-prod-01"
    fields:                       # static indexed fields applied to every event
      env: prod
      cluster: us-east-1
    field_paths:                  # dynamic indexed fields built from each event
      service: service_name       # fieldName -> gjson path into the enriched payload
      identity: identity_name
      host: host_name
      router: source_id
    namespace_filter: []          # only forward these namespaces; empty = all
    exclude_fields:               # drop these paths from the inner event before send
      - circuitId
      - tags.sourceRouterId
    # Per-event filter: drop before batching. include is any-of, exclude
    # none-of. See ../ "Per-Subscriber Filtering" for the comparator
    # reference.
    include:
      - { field: service_name, regex: "^prod-" }   # only prod-* services
    exclude: []
    batch_size: 100               # events per HEC POST
    flush_interval: 5s            # max time a partial batch sits before being sent
    compression: "gzip"           # "gzip" (default) or "none"
    skip_verify: false            # true for self-signed Splunk certs
    workers: 2                    # parallel HEC POSTers
    buffer_size: 1000

    ack_enabled: false            # see "Indexer Acknowledgment" below
    ack_timeout: 60s
    ack_poll: 5s
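
The include/exclude semantics noted in the config comments can be sketched in Python as follows. This is a minimal illustration that assumes only the regex comparator shown in the example and flat field names; the full comparator set is covered in the Per-Subscriber Filtering reference, and `matches` and `passes_filter` are hypothetical names.

```python
import re

def matches(predicate, event):
    """True if the event's field matches the predicate's regex (assumed comparator)."""
    value = event.get(predicate["field"])
    return value is not None and re.search(predicate["regex"], str(value)) is not None

def passes_filter(event, include, exclude):
    """include is any-of (an empty list passes everything); exclude is none-of."""
    if include and not any(matches(p, event) for p in include):
        return False
    return not any(matches(p, event) for p in exclude)
```

With the sample config, an event whose service_name is prod-api is forwarded, while dev-api is dropped before it reaches a batch.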

Available fields and defaults

| Field | Default | Description |
|---|---|---|
| url | required | HEC event endpoint, typically https://<host>:8088/services/collector/event. |
| token | required | HEC token. Prefer SPLUNK_HEC_TOKEN env var over inline. |
| index, source, sourcetype, hostname | unset | Optional HEC envelope attributes. Omit to let Splunk apply token defaults. |
| fields | unset | Static key: value map written to the HEC fields object on every event. |
| field_paths | unset | Dynamic indexed fields: key -> gjson path. Resolved per-event from the enriched payload. Missing paths are silently skipped. |
| namespace_filter | [] | Restrict to specific event namespaces; empty allows all. |
| exclude_fields | [] | Dotted paths to strip from the inner event before send (e.g. tags.sourceRouterId). Does not affect field_paths resolution. |
| include | [] | Per-event predicates against the enriched event; any-of. Empty = pass everything. See Per-subscriber filtering. |
| exclude | [] | Per-event predicates; none-of. If any matches, drop the event. |
| batch_size | 100 | Events per POST. Capped indirectly by HEC's 1 MiB body limit; oversized batches are split. |
| flush_interval | 5s | Max time a partial batch sits before being shipped. |
| compression | gzip | Set to none to disable. |
| skip_verify | false | Skip TLS verification (use only with self-signed Splunk). |
| workers | 2 | Parallel HEC POSTs. See Sizing for high-volume streams below. |
| buffer_size | 1000 | Subscriber channel capacity. |
| ack_enabled | false | Wait for indexer acknowledgement before counting events as delivered. |
| ack_timeout | 60s | Drop pending acks Splunk hasn't confirmed within this window. |
| ack_poll | 5s | Interval between ack-status polls. |
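
To make the interaction between fields, field_paths, and exclude_fields concrete, here is a small Python sketch. It assumes plain dotted paths through nested objects, which is only a subset of gjson path syntax, and all function names are hypothetical.

```python
import copy

def resolve_path(payload, path):
    """Follow a dotted path through nested dicts; return None if a segment is missing."""
    node = payload
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def build_indexed_fields(payload, static_fields, field_paths):
    """Merge static fields with per-event values; missing paths are skipped."""
    fields = dict(static_fields)
    for name, path in field_paths.items():
        value = resolve_path(payload, path)
        if value is not None:
            fields[name] = value
    return fields

def strip_excluded(payload, exclude_fields):
    """Return a copy of the payload with the given dotted paths removed."""
    event = copy.deepcopy(payload)
    for path in exclude_fields:
        *parents, leaf = path.split(".")
        node = event
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)
    return event
```

Because the indexed fields are resolved from the original payload and the exclusion works on a copy, a path can appear in both field_paths and exclude_fields: it is promoted to an indexed field and still removed from the inner event.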

Indexed fields vs. search-time extraction

Anything you put in fields or field_paths becomes an indexed field in Splunk — searchable instantly via service=foo (no event. prefix needed) and usable in tstats for accelerated reporting. Anything inside the inner event payload is searchable only via search-time extraction (auto-extracted when KV_MODE=json is set on the sourcetype, which is the default for _json).

Promote 4–6 high-value, low-to-medium-cardinality dimensions (service, identity, host, router, env, namespace) via field_paths. Avoid promoting high-cardinality fields like circuitId, sessionId, or unique tokens — those bloat the TSIDX. Keep them inside the inner event payload where they remain searchable but not indexed.

Indexer acknowledgment

When ack_enabled: true, the subscriber generates a per-instance X-Splunk-Request-Channel UUID and includes it on every POST. After each successful POST, the returned ackId is recorded; a poller goroutine then queries /services/collector/ack once per ack_poll interval. Events are counted as delivered only after Splunk confirms the ack, so the delivered counter in the TUI reflects events actually written to durable storage, not just accepted at the HEC frontend.

This requires the HEC token to have indexer acknowledgement enabled in Splunk (Settings → Data Inputs → HTTP Event Collector → token → "Enable indexer acknowledgement"). Without that, every POST returns 400 {"text":"Data channel is missing","code":10}.

Throughput cost: ack mode forces Splunk to wait for the indexer to durably persist each batch before returning the response. Per-POST round-trip rises from ~10 ms to ~100–200 ms. With workers: 2 and batch_size: 50, expect ~500 events/sec sustained vs. several thousand without ack. Enable when durability matters; leave off for high-volume telemetry where occasional event loss is acceptable.

Pending acks older than ack_timeout are dropped with a warning — those events were sent and likely indexed, but the ack confirmation never returned. The default 60s is generous; lower it only if you need tighter delivery accounting.
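
The bookkeeping described in this section can be sketched as a small Python class. It models only the accounting (record, poll, expire) and leaves out the HTTP calls and the channel header; the class and attribute names are hypothetical, and the request and response shapes follow the HEC ack endpoint's JSON format.

```python
class AckTracker:
    """Tracks HEC ackIds until Splunk confirms them or they time out."""

    def __init__(self, ack_timeout=60.0):
        self.ack_timeout = ack_timeout
        self.pending = {}      # ackId -> (event_count, sent_at)
        self.delivered = 0     # events confirmed by the indexer
        self.timed_out = 0     # events whose ack never arrived

    def record(self, ack_id, event_count, now):
        """Remember the ackId returned by a successful POST."""
        self.pending[ack_id] = (event_count, now)

    def poll_body(self):
        """Request body for a POST to /services/collector/ack."""
        return {"acks": sorted(self.pending)}

    def apply_poll_response(self, response):
        """Count events as delivered once their ackId reports true."""
        for key, confirmed in response.get("acks", {}).items():
            ack_id = int(key)
            if confirmed and ack_id in self.pending:
                count, _ = self.pending.pop(ack_id)
                self.delivered += count

    def expire(self, now):
        """Drop pending acks older than ack_timeout."""
        for ack_id, (count, sent_at) in list(self.pending.items()):
            if now - sent_at > self.ack_timeout:
                del self.pending[ack_id]
                self.timed_out += count
```

Events in timed_out are not counted as delivered even though they were probably indexed, which is why lowering ack_timeout tightens the accounting at the cost of more unconfirmed batches.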

Sizing for high-volume streams

The per-second throughput ceiling is approximately:

throughput ≈ workers × (batch_size / round_trip_seconds)

Per-POST round trip is ~10 ms with ack disabled and ~100–200 ms with ack enabled. The ack-on rows below use the 200 ms upper bound:

| Mode | workers | batch_size | Approx. ceiling |
|---|---|---|---|
| ack off | 2 | 100 | ~20,000 ev/s |
| ack on | 2 | 50 | ~500 ev/s |
| ack on | 4 | 100 | ~2,000 ev/s |
| ack on | 8 | 200 | ~8,000 ev/s |
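
The table rows follow directly from the formula. The Python sketch below reproduces them, assuming a 10 ms round trip with ack off and 200 ms (the slow end of the stated range) with ack on; `throughput_ceiling` is a hypothetical helper name.

```python
def throughput_ceiling(workers, batch_size, round_trip_ms):
    """Approximate events/sec: workers x (batch_size / round_trip_seconds)."""
    return workers * batch_size * 1000 / round_trip_ms

rows = [
    ("ack off", 2, 100, 10),
    ("ack on", 2, 50, 200),
    ("ack on", 4, 100, 200),
    ("ack on", 8, 200, 200),
]
for mode, workers, batch_size, rtt_ms in rows:
    ceiling = throughput_ceiling(workers, batch_size, rtt_ms)
    print(f"{mode:8} workers={workers} batch_size={batch_size} ~{ceiling:,.0f} ev/s")
```

At the fast end of the ack-on range (100 ms), each ack-on ceiling doubles, so the table is best read as a conservative estimate.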

If you see "Subscriber channel full, dropping events" warnings (or rising QLEN in the TUI Throughput view), the inflow has exceeded the worker pool's drain rate. Remedies, in the order to try them:

  1. Raise workers: throughput scales linearly until the network or the Splunk indexer becomes the bottleneck. Each worker holds one persistent HTTP/2 connection to HEC.
  2. Raise batch_size: fewer round trips per event. Watch the 1 MiB body limit (the subscriber splits oversized batches automatically, but very large batches add latency). 200–500 is a reasonable upper bound.
  3. Raise buffer_size: this does not raise the steady-state ceiling, but it absorbs bursts. Useful when traffic is spiky rather than uniformly high.
  4. Disable ack: the biggest single jump (~10×), but it loses durability confirmation.

For a controller emitting > 1k events/sec sustained, start with workers: 4, batch_size: 100, buffer_size: 5000, and tune from there.
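
As a back-of-the-envelope check on these starting values, the Python sketch below inverts the throughput formula to estimate a minimum worker count and shows how long a buffer can absorb a burst. Both helpers are hypothetical, and the 200 ms round trip is the assumed worst case with ack enabled.

```python
import math

def min_workers(target_eps, batch_size, round_trip_ms):
    """Smallest worker count whose throughput ceiling meets target_eps."""
    return math.ceil(target_eps * round_trip_ms / (batch_size * 1000))

def seconds_until_drop(buffer_size, inflow_eps, drain_eps):
    """How long an empty buffer absorbs a burst before events are dropped."""
    if inflow_eps <= drain_eps:
        return math.inf
    return buffer_size / (inflow_eps - drain_eps)

# 1,000 ev/s with batch_size 100 and a 200 ms round trip needs at least 2 workers.
print(min_workers(1000, 100, 200))
# A 5,000-slot buffer rides out a 3,000 ev/s burst against a 2,000 ev/s drain for 5 s.
print(seconds_until_drop(5000, 3000, 2000))
```

Under these assumptions, workers: 4 leaves roughly twice the minimum drain capacity at 1,000 ev/s, and the larger buffer only buys time during bursts; it does not help if inflow stays above the drain rate.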