splunk subscriber
Forwards events to a Splunk HTTP Event Collector (HEC) endpoint. Each event is wrapped in a HEC envelope (event +
host, source, sourcetype, index, time, fields), batched as NDJSON, gzip-compressed, and POSTed with
Authorization: Splunk <token>. Batches retry on 429 and 5xx with exponential backoff.
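
As a rough illustration of the envelope and batching described above, here is a minimal Python sketch. It is not the subscriber's actual code: the function names (`build_envelope`, `encode_batch`, `auth_header`) are hypothetical, and the `Content-Type` and `Content-Encoding` headers are assumptions about what a typical HEC client sends. Only the envelope keys and the `Authorization: Splunk <token>` scheme come from the description above.

```python
import gzip
import json
import time


def build_envelope(event, *, host=None, source=None, sourcetype=None,
                   index=None, fields=None, timestamp=None):
    """Wrap one event in a HEC envelope, leaving out attributes that are unset."""
    envelope = {
        "event": event,
        "time": timestamp if timestamp is not None else time.time(),
    }
    optional = {"host": host, "source": source, "sourcetype": sourcetype,
                "index": index, "fields": fields}
    # Unset (None or empty) attributes are omitted so Splunk can apply token defaults.
    envelope.update({k: v for k, v in optional.items() if v})
    return envelope


def encode_batch(envelopes, compression="gzip"):
    """Serialize envelopes as NDJSON; return (body_bytes, extra_headers)."""
    ndjson = "\n".join(json.dumps(e, separators=(",", ":")) for e in envelopes)
    body = ndjson.encode("utf-8")
    headers = {"Content-Type": "application/json"}  # assumed header
    if compression == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"  # assumed header
    return body, headers


def auth_header(token):
    """Build the HEC authorization header."""
    return {"Authorization": f"Splunk {token}"}
```

A caller would merge `auth_header(token)` with the headers returned by `encode_batch` and POST the body to the configured `url`.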
Add to the subscribers block in config.yaml:

    subscribers:
      splunk:
        enabled: true
        url: "https://splunk.example.com:8088/services/collector/event"
        # token is read from SPLUNK_HEC_TOKEN env var (preferred) or set token here
        index: "main"
        sourcetype: "ziti:event"
        source: "openziti"
        hostname: "ziti-prod-01"
        fields:                    # static indexed fields applied to every event
          env: prod
          cluster: us-east-1
        field_paths:               # dynamic indexed fields built from each event
          service: service_name    # fieldName -> gjson path into the enriched payload
          identity: identity_name
          host: host_name
          router: source_id
        namespace_filter: []       # only forward these namespaces; empty = all
        exclude_fields:            # drop these paths from the inner event before send
          - circuitId
          - tags.sourceRouterId
        # Per-event filter: drop before batching. include is any-of, exclude
        # none-of. See ../ "Per-Subscriber Filtering" for the comparator
        # reference.
        include:
          - { field: service_name, regex: "^prod-" }   # only prod-* services
        exclude: []
        batch_size: 100            # events per HEC POST
        flush_interval: 5s         # max time a partial batch sits before being sent
        compression: "gzip"        # "gzip" (default) or "none"
        skip_verify: false         # true for self-signed Splunk certs
        workers: 2                 # parallel HEC POSTers
        buffer_size: 1000
        ack_enabled: false         # see "Indexer Acknowledgment" below
        ack_timeout: 60s
        ack_poll: 5s

Available fields and defaults
| Field | Default | Description |
|---|---|---|
| url | required | HEC event endpoint, typically https://<host>:8088/services/collector/event. |
| token | required | HEC token. Prefer SPLUNK_HEC_TOKEN env var over inline. |
| index, source, sourcetype, hostname | unset | Optional HEC envelope attributes. Omit to let Splunk apply token defaults. |
| fields | unset | Static key: value map written to the HEC fields object on every event. |
| field_paths | unset | Dynamic indexed fields: key -> gjson path. Resolved per-event from the enriched payload. Missing paths are silently skipped. |
| namespace_filter | [] | Restrict to specific event namespaces; empty allows all. |
| exclude_fields | [] | Dotted paths to strip from the inner event before send (e.g. tags.sourceRouterId). Does not affect field_paths resolution. |
| include | [] | Per-event predicates against the enriched event; any-of. Empty = pass everything. See Per-subscriber filtering. |
| exclude | [] | Per-event predicates; none-of. If any matches, the event is dropped. |
| batch_size | 100 | Events per POST. Capped indirectly by HEC's 1 MiB body limit; oversized batches are split. |
| flush_interval | 5s | Max time a partial batch sits before being shipped. |
| compression | gzip | Set to none to disable. |
| skip_verify | false | Skip TLS verification (use only with self-signed Splunk). |
| workers | 2 | Parallel HEC POSTs. See Sizing for high-volume streams below. |
| buffer_size | 1000 | Subscriber channel capacity. |
| ack_enabled | false | Wait for indexer acknowledgement before counting events as delivered. |
| ack_timeout | 60s | Drop pending acks Splunk hasn't confirmed within this window. |
| ack_poll | 5s | Interval between ack-status polls. |
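
The include (any-of) and exclude (none-of) semantics from the table can be sketched as follows. This is an illustration under assumptions, not the subscriber's implementation: only the `regex` comparator shown in the sample config is modeled, a plain dotted-path lookup stands in for the real path resolution, and treating a missing field as "no match" is an assumption. All function names are hypothetical.

```python
import re


def get_path(event, path):
    """Look up a dotted path; return None when any segment is missing."""
    node = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def matches(event, predicate):
    """Evaluate one {field, regex} predicate. Assumption: a missing field never matches."""
    value = get_path(event, predicate["field"])
    return value is not None and re.search(predicate["regex"], str(value)) is not None


def should_forward(event, include, exclude):
    """Apply any-of include and none-of exclude; an empty include passes everything."""
    if include and not any(matches(event, p) for p in include):
        return False
    return not any(matches(event, p) for p in exclude)


include = [{"field": "service_name", "regex": "^prod-"}]
print(should_forward({"service_name": "prod-api"}, include, []))  # True
print(should_forward({"service_name": "dev-api"}, include, []))   # False
```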
Indexed fields vs. search-time extraction
Anything you put in fields or field_paths becomes an indexed field in Splunk — searchable instantly via
service=foo (no event. prefix needed) and usable in tstats for accelerated reporting. Anything inside the inner
event payload is searchable only via search-time extraction (auto-extracted when KV_MODE=json is set on the
sourcetype, which is the default for _json).
Promote 4–6 high-value, low-to-medium-cardinality dimensions (service, identity, host, router, env, namespace) via
field_paths. Avoid promoting high-cardinality fields like circuitId, sessionId, or unique tokens — those bloat the
TSIDX. Keep them inside the inner event payload where they remain searchable but not indexed.
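
To make the split between indexed fields and the inner payload concrete, the sketch below builds the HEC fields object from fields and field_paths, then strips exclude_fields from a copy of the event. It is a simplified model: a plain dotted-path lookup stands in for gjson, the function names are hypothetical, and letting a dynamic field override a static one with the same name is an assumption. Indexed fields are resolved before stripping, which mirrors the note that exclude_fields does not affect field_paths resolution.

```python
import copy


def get_path(event, path):
    """Look up a dotted path; return None when any segment is missing."""
    node = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def build_fields(enriched, static_fields, field_paths):
    """Merge static fields with per-event values; missing paths are skipped."""
    fields = dict(static_fields)
    for name, path in field_paths.items():
        value = get_path(enriched, path)
        if value is not None:
            fields[name] = value  # assumption: dynamic value wins on a name clash
    return fields


def strip_fields(enriched, exclude_fields):
    """Return a copy of the event with the given dotted paths removed."""
    inner = copy.deepcopy(enriched)
    for path in exclude_fields:
        *parents, leaf = path.split(".")
        node = inner
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)
    return inner


enriched = {"service_name": "prod-api", "circuitId": "c1",
            "tags": {"sourceRouterId": "r1"}}
fields = build_fields(enriched, {"env": "prod"},
                      {"service": "service_name", "router": "tags.sourceRouterId",
                       "identity": "identity_name"})
inner = strip_fields(enriched, ["circuitId", "tags.sourceRouterId"])
print(fields)  # {'env': 'prod', 'service': 'prod-api', 'router': 'r1'}
print(inner)   # {'service_name': 'prod-api', 'tags': {}}
```

Here `router` is still promoted to an indexed field even though `tags.sourceRouterId` is removed from the inner event, and the unresolved `identity` path is skipped without error.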
Indexer acknowledgment
When ack_enabled: true, the subscriber generates a per-instance X-Splunk-Request-Channel UUID and includes it on
every POST. After each successful POST, the returned ackId is recorded; a poller goroutine then queries
/services/collector/ack every ack_poll seconds. Events are counted as delivered only after Splunk confirms the ack —
so the delivered counter in the TUI reflects events actually written to durable storage, not just accepted at the HEC
frontend.
This requires the HEC token to have indexer acknowledgement enabled in Splunk (Settings → Data Inputs → HTTP Event
Collector → token → "Enable indexer acknowledgement"). Without that, every POST returns 400 {"text":"Data channel is missing","code":10}.
Throughput cost: ack mode forces Splunk to wait for the indexer to durably persist each batch before returning the
response. Per-POST round trip rises from ~10 ms to ~100–200 ms. With workers: 2 and batch_size: 50, expect ~500
events/sec sustained (at the 200 ms end of that range) vs. several thousand without ack. Enable ack when durability
matters; leave it off for high-volume telemetry where occasional event loss is acceptable.
Pending acks older than ack_timeout are dropped with a warning — those events were sent and likely indexed, but the
ack confirmation never returned. The default 60s is generous; lower it only if you need tighter delivery accounting.
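
The bookkeeping described in this section (record each ackId, poll, count events as delivered on confirmation, expire after ack_timeout) can be modeled roughly as below. This is a sketch, not the subscriber's code: `AckTracker` and its counters are hypothetical names, and `query_acks` is a stand-in callable for the HTTP request to /services/collector/ack, assumed to return a mapping from ackId to a confirmed flag. The clock is injectable so the timeout logic can be exercised without sleeping.

```python
import time


class AckTracker:
    """Tracks ackIds returned by HEC until they are confirmed or time out."""

    def __init__(self, ack_timeout=60.0, clock=time.monotonic):
        self.ack_timeout = ack_timeout
        self.clock = clock
        self.pending = {}   # ackId -> (event_count, sent_at)
        self.delivered = 0  # events confirmed by the indexer
        self.timed_out = 0  # events whose ack never arrived in time

    def record(self, ack_id, event_count):
        """Remember the ackId returned by a successful POST."""
        self.pending[ack_id] = (event_count, self.clock())

    def poll(self, query_acks):
        """Run one poll cycle; query_acks(ack_ids) -> {ack_id: bool}."""
        if not self.pending:
            return
        statuses = query_acks(sorted(self.pending))
        now = self.clock()
        for ack_id, (count, sent_at) in list(self.pending.items()):
            if statuses.get(ack_id):
                self.delivered += count
                del self.pending[ack_id]
            elif now - sent_at > self.ack_timeout:
                # Sent and likely indexed, but never confirmed: drop from tracking.
                self.timed_out += count
                del self.pending[ack_id]
```

In a real client, `poll` would run every ack_poll seconds and `query_acks` would carry the same X-Splunk-Request-Channel value used for the original POSTs.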
Sizing for high-volume streams
The per-second throughput ceiling is approximately:
throughput ≈ workers × (batch_size / round_trip_seconds)
Per-POST round trip is ~10 ms with ack disabled, ~100–200 ms with ack enabled. The ack-on rows below assume the slower 200 ms figure:
| Mode | workers | batch_size | Approx ceiling |
|---|---|---|---|
| ack off | 2 | 100 | ~20,000 ev/s |
| ack on | 2 | 50 | ~500 ev/s |
| ack on | 4 | 100 | ~2,000 ev/s |
| ack on | 8 | 200 | ~8,000 ev/s |
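
The table rows follow directly from the formula. A small Python check, assuming 10 ms per POST with ack off and 200 ms with ack on (the function name is hypothetical):

```python
def throughput_ceiling(workers, batch_size, round_trip_seconds):
    """Approximate events/sec ceiling: workers x (batch_size / round trip)."""
    return workers * batch_size / round_trip_seconds


rows = [
    ("ack off", 2, 100, 0.010),
    ("ack on", 2, 50, 0.200),
    ("ack on", 4, 100, 0.200),
    ("ack on", 8, 200, 0.200),
]
for mode, workers, batch_size, rtt in rows:
    ceiling = throughput_ceiling(workers, batch_size, rtt)
    print(f"{mode:8} workers={workers} batch_size={batch_size:3} ~{ceiling:,.0f} ev/s")
```

These are upper bounds from the formula only; real throughput also depends on network and indexer capacity.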
If you see "Subscriber channel full, dropping events" warnings (or rising QLEN in the TUI Throughput view), the inflow
has exceeded the worker pool's drain rate. In order of effectiveness:
- Raise workers: linear scaling until the network or Splunk indexer becomes the bottleneck. Each worker holds one persistent HTTP/2 conn to HEC.
- Raise batch_size: fewer round trips per event. Watch the 1 MiB body limit (the subscriber splits oversized batches automatically, but very large batches add latency). 200–500 is a reasonable upper bound.
- Raise buffer_size: does not raise the steady-state ceiling but absorbs bursts. Useful when traffic is spiky rather than uniformly high.
- Disable ack: biggest single jump (~10×) but loses durability confirmation.
For a controller emitting > 1k events/sec sustained, start with workers: 4, batch_size: 100, buffer_size: 5000,
and tune from there.