`splunk` subscriber

Forwards events to a Splunk HTTP Event Collector (HEC) endpoint. Each event is wrapped in a HEC envelope (`event` plus `host`, `source`, `sourcetype`, `index`, `time`, and `fields`), batched as newline-delimited JSON (NDJSON), gzip-compressed, and POSTed with an `Authorization: Splunk <token>` header. Failed batches are retried with exponential backoff on HTTP 429 and 5xx responses.

Add a `splunk` entry to the `subscribers` block in `config.yaml`:

subscribers:
  splunk:
    enabled: true
    url: "https://splunk.example.com:8088/services/collector/event"
    # token is read from SPLUNK_HEC_TOKEN env var (preferred) or set token here
    index: "main"
    sourcetype: "ziti:event"
    source: "openziti"
    hostname: "ziti-prod-01"
    fields:                       # static indexed fields applied to every event
      env: prod
      cluster: us-east-1
    field_paths:                  # dynamic indexed fields built from each event
      service: service_name       #   fieldName -> gjson path into the enriched payload
      identity: identity_name
      host: host_name
      router: source_id
    namespace_filter: []          # only forward these namespaces; empty = all
    exclude_fields:               # drop these paths from the inner event before send
      - circuitId
      - tags.sourceRouterId
    # Per-event filters, applied before batching. include is any-of,
    # exclude is none-of. See "Per-Subscriber Filtering" for the
    # comparator reference.
    include:
      - { field: service_name, regex: "^prod-" }   # only prod-* services
    exclude: []
    batch_size: 100               # events per HEC POST
    flush_interval: 5s            # max time a partial batch sits before being sent
    compression: "gzip"           # "gzip" (default) or "none"
    skip_verify: false            # true for self-signed Splunk certs
    workers: 2                    # parallel HEC POSTers
    buffer_size: 1000

    ack_enabled: false            # see "Indexer Acknowledgment" below
    ack_timeout: 60s
    ack_poll: 5s

Available fields and defaults

| Field | Default | Description |
|---|---|---|
| `url` | required | HEC event endpoint, typically `https://<host>:8088/services/collector/event`. |
| `token` | required | HEC token. Prefer the `SPLUNK_HEC_TOKEN` env var over setting it inline. |
| `index`, `source`, `sourcetype`, `hostname` | unset | Optional HEC envelope attributes. Omit them to let Splunk apply the token's defaults. |
| `fields` | unset | Static `key: value` map written to the HEC `fields` object on every event. |
| `field_paths` | unset | Dynamic indexed fields as `key -> gjson path`, resolved per event from the enriched payload. Missing paths are silently skipped. |
| `namespace_filter` | `[]` | Restrict to specific event namespaces; empty allows all. |
| `exclude_fields` | `[]` | Dotted paths to strip from the inner event before send (e.g. `tags.sourceRouterId`). Does not affect `field_paths` resolution. |
| `include` | `[]` | Per-event predicates against the enriched event, any-of. Empty passes everything. See "Per-Subscriber Filtering". |
| `exclude` | `[]` | Per-event predicates, none-of: if any matches, the event is dropped. |
| `batch_size` | `100` | Events per POST. Bounded indirectly by HEC's 1 MiB body limit; oversized batches are split. |
| `flush_interval` | `5s` | Max time a partial batch waits before being sent. |
| `compression` | `gzip` | Set to `none` to disable. |
| `skip_verify` | `false` | Skip TLS certificate verification (use only with self-signed Splunk certs). |
| `workers` | `2` | Parallel HEC POSTers. See "Sizing for high-volume streams" below. |
| `buffer_size` | `1000` | Subscriber channel capacity. |
| `ack_enabled` | `false` | Wait for indexer acknowledgement before counting events as delivered. |
| `ack_timeout` | `60s` | Drop pending acks that Splunk hasn't confirmed within this window. |
| `ack_poll` | `5s` | Interval between ack-status polls. |
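
The `batch_size` entry notes that oversized batches are split against the 1 MiB body limit. Below is a minimal sketch of one way to do that, assuming events are already encoded as NDJSON lines and that the limit is measured before gzip (the source doesn't say which). `splitBatch` and `maxBody` are hypothetical names.

```go
package main

import "fmt"

// maxBody is the 1 MiB body limit cited above. Whether the real subscriber
// measures it before or after gzip is not stated; this sketch uses raw bytes.
const maxBody = 1 << 20

// splitBatch groups pre-encoded lines into chunks of at most batchSize
// lines and at most limit bytes. A single line larger than limit is
// emitted on its own (the real subscriber's handling of that is unknown).
func splitBatch(lines [][]byte, batchSize, limit int) [][][]byte {
	var out [][][]byte
	var cur [][]byte
	size := 0
	for _, l := range lines {
		// Close the current chunk if adding l would break either bound.
		if len(cur) > 0 && (len(cur) == batchSize || size+len(l) > limit) {
			out = append(out, cur)
			cur, size = nil, 0
		}
		cur = append(cur, l)
		size += len(l)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	lines := make([][]byte, 5)
	for i := range lines {
		lines[i] = make([]byte, 400_000) // five ~400 KB events
	}
	for i, b := range splitBatch(lines, 100, maxBody) {
		fmt.Println("POST", i, "events:", len(b))
	}
}
```

With five 400 KB events, only two fit under 1 MiB, so the batch goes out as three POSTs of 2, 2, and 1 events.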

Indexed fields vs. search-time extraction

Anything you put in `fields` or `field_paths` becomes an indexed field in Splunk: it is searchable immediately as `service=foo` (no `event.` prefix needed) and usable in `tstats` for accelerated reporting. Anything inside the inner event payload is searchable only through search-time extraction (automatic when `KV_MODE=json` is set on the sourcetype, which is the default for `_json`).

Promote 4–6 high-value dimensions with low to medium cardinality (service, identity, host, router, env, namespace) through `fields` and `field_paths`. Avoid promoting high-cardinality fields such as `circuitId`, `sessionId`, or unique tokens, because they bloat the TSIDX files. Keep them inside the inner event payload, where they remain searchable but are not indexed.

Indexer acknowledgment

When `ack_enabled: true`, the subscriber generates one `X-Splunk-Request-Channel` UUID per instance and sends it on every POST. After each successful POST it records the `ackId` from the response, and a poller goroutine queries `/services/collector/ack` once every `ack_poll` interval. Events count as delivered only after Splunk confirms the ack, so the delivered counter in the TUI reflects events actually written to durable storage, not merely accepted by the HEC frontend.

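This requires indexer acknowledgement to be enabled on the HEC token in Splunk (Settings → Data Inputs → HTTP Event Collector → token → "Enable indexer acknowledgement"). If it is not, Splunk rejects ack status queries with 400 `{"text":"ACK is disabled","code":14}`. The opposite mismatch (acknowledgement enabled on the token but `ack_enabled: false` here) makes every POST fail with 400 `{"text":"Data channel is missing","code":10}`, because Splunk requires a channel header on requests to an ack-enabled token.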
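
A small sketch for telling the two misconfigurations apart from the HEC response body, assuming the usual `{"text":...,"code":...,"ackId":...}` response shape (with `ackId` present only when acknowledgement is in effect). `hecResponse` and `diagnose` are hypothetical names, and the hint strings are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hecResponse covers the fields HEC returns from /services/collector/event.
type hecResponse struct {
	Text  string `json:"text"`
	Code  int    `json:"code"`
	AckID *int64 `json:"ackId"` // nil when no ackId was returned
}

// diagnose decodes a response and maps the two ack-related codes to a hint.
func diagnose(body []byte) (hecResponse, string, error) {
	var r hecResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return r, "", err
	}
	switch r.Code {
	case 10: // Data channel is missing
		return r, "token requires ack: set ack_enabled: true", nil
	case 14: // ACK is disabled
		return r, "token has ack disabled: enable it in Splunk or set ack_enabled: false", nil
	}
	return r, "", nil
}

func main() {
	for _, b := range []string{
		`{"text":"Success","code":0,"ackId":3}`,
		`{"text":"Data channel is missing","code":10}`,
	} {
		r, hint, err := diagnose([]byte(b))
		if err != nil {
			panic(err)
		}
		fmt.Println(r.Code, r.AckID != nil, hint)
	}
}
```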

Throughput cost: in ack mode Splunk must track every batch until the indexer has durably persisted it, and per-POST round trip rises from ~10 ms to ~100–200 ms. With `workers: 2` and `batch_size: 50`, expect roughly 500 events/sec sustained, versus several thousand without ack. Enable it when durability matters; leave it off for high-volume telemetry where occasional event loss is acceptable.

Pending acks older than `ack_timeout` are dropped with a warning. Those events were sent and were most likely indexed, but Splunk never confirmed them. The default of 60s is generous; lower it only if you need tighter delivery accounting.

Sizing for high-volume streams

The per-second throughput ceiling is approximately:

throughput ≈ workers × (batch_size / round_trip_seconds)

Per-POST round trip is ~10 ms with ack disabled and ~100–200 ms with ack enabled; the ack-on rows below use the 200 ms upper bound. So:

| Mode | `workers` | `batch_size` | Approx. ceiling |
|---|---|---|---|
| ack off | 2 | 100 | ~20,000 ev/s |
| ack on | 2 | 50 | ~500 ev/s |
| ack on | 4 | 100 | ~2,000 ev/s |
| ack on | 8 | 200 | ~8,000 ev/s |

If you see `Subscriber channel full, dropping events` warnings (or a rising QLEN in the TUI Throughput view), inflow has exceeded the rate at which the worker pool drains the channel. Remedies, in the order to try them:

Raise `workers`: throughput scales roughly linearly until the network or the Splunk indexers become the bottleneck. Each worker holds one persistent HTTP/2 connection to HEC.
Raise `batch_size`: fewer round trips for the same event volume. Mind the 1 MiB body limit; the subscriber splits oversized batches automatically, but very large batches add latency. 200–500 is a reasonable upper bound.
Raise `buffer_size`: does not raise the steady-state ceiling but absorbs bursts. Useful when traffic is spiky rather than uniformly high.
Disable ack: the biggest single jump (~10–20×, per the round-trip figures above), at the cost of durability confirmation.

For a controller sustaining more than 1,000 events/sec, start with `workers: 4`, `batch_size: 100`, and `buffer_size: 5000`, then tune from there.
