
HA cluster management

NetFoundry Self-Hosted supports high-availability (HA) controller clusters using the OpenZiti Raft consensus protocol. The nf-cluster command manages the full lifecycle of an HA cluster — checking status, adding controllers, removing controllers, and migrating a standalone controller to cluster mode.

Beta Feature

The nf-cluster command and HA cluster management features are currently in beta. Functionality and command syntax may change in future releases. Please report any issues to NetFoundry support.

Note

HA requires the v3 ziti-controller Helm chart (^3.0.0), which uses a unified PKI with a single root of trust. All controllers in a cluster must run the same Ziti version.

Prerequisites

  • Existing standalone controller deployed with the v3 chart
  • DNS entries configured for each additional controller's advertise address
  • helm, kubectl, and jq available on the host
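Before proceeding, you can confirm the required tools are on the PATH. A minimal check (the helper name `check_tools` is illustrative, not part of nf-cluster):

```shell
# check_tools TOOL... — print any missing prerequisites; nonzero exit if any are absent
check_tools() {
  missing=0
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool"; missing=1; }
  done
  return $missing
}

check_tools helm kubectl jq || echo "install the missing tools before continuing"
```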

Check cluster status

View the current cluster state, including the primary controller, trust domain, and all cluster members:

nf-cluster status

Example output:

Cluster Status
Primary release: ziti-controller
Trust domain: ziti.example.com
Cluster mode: cluster-init
Controller count: 3

Cluster Members (from leader agent):
╭──────────────────┬────────────────────────────┬───────┬────────┬─────────┬───────────╮
│ ID               │ ADDRESS                    │ VOTER │ LEADER │ VERSION │ CONNECTED │
├──────────────────┼────────────────────────────┼───────┼────────┼─────────┼───────────┤
│ ziti-controller  │ tls:ctrl1.example.com:1280 │ true  │ true   │ v1.6.12 │ true      │
│ ziti-controller2 │ tls:ctrl2.example.com:1280 │ true  │ false  │ v1.6.12 │ true      │
│ ziti-controller3 │ tls:ctrl3.example.com:1280 │ true  │ false  │ v1.6.12 │ true      │
╰──────────────────┴────────────────────────────┴───────┴────────┴─────────┴───────────╯

For just the member table without additional context:

nf-cluster list

Migrate to cluster mode

Installations created with nf-quickstart v1.0.0+ are already cluster-ready and can skip this step. For older installations, migrate the standalone controller to cluster mode before adding members:

nf-cluster migrate

Warning

Migration requires a controller restart. This is a brief downtime event for the controller.

The command:

  1. Detects the primary controller automatically (does not assume a specific Helm release name).
  2. Checks the current cluster.mode — if already cluster-init, no action is needed.
  3. Prompts for a trust domain if one is not already configured (defaults to the controller's advertised address).
  4. Runs the two-phase migration: cluster-migrate then cluster-init.
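The two phases above correspond to successive values of the chart's cluster.mode setting. A rough manual equivalent is sketched below — the release name, chart reference, and namespace are placeholders; nf-cluster migrate performs these steps for you:

```shell
# Phase 1: switch the standalone controller into migration mode
# (placeholder release/chart names; cluster.mode values per the v3 chart).
helm upgrade ziti-controller netfoundry/ziti-controller \
  --namespace ziti --reuse-values \
  --set cluster.mode=cluster-migrate

# Phase 2: initialize the single-member cluster.
helm upgrade ziti-controller netfoundry/ziti-controller \
  --namespace ziti --reuse-values \
  --set cluster.mode=cluster-init
```

Each phase triggers a controller restart, which is why migration involves brief downtime.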

Add a controller

Add a new controller to the cluster interactively:

nf-cluster add -h ctrl2.example.com

Or non-interactively with all options:

nf-cluster add -y -h ctrl2.example.com -p 1280

The command:

  1. Detects the primary controller and extracts its trust domain, advertised endpoint, edge root issuer, and trust bundle ConfigMap.
  2. Validates DNS resolution for the new controller's advertised address.
  3. Generates a values file (ziti-controller2-values.yml) with all required HA settings.
  4. Runs helm upgrade --install for the new controller.
  5. Waits for cert-manager to issue the web identity certificate.
  6. Restarts the new controller to complete the Raft cluster join.
  7. Verifies the join by querying the cluster member list.

If you already have a values file (for example, from a previous attempt or a custom configuration), pass it directly:

nf-cluster add -f controller2-values.yml

Add options

Flag        Description
-h <host>   Advertised host for the new controller (required with -y)
-p <port>   Advertised port (default: inherited from primary)
-N <name>   Helm release name (default: auto-generated, e.g., ziti-controller2)
-f <file>   Use a pre-built values file instead of generating one

Quorum considerations

An odd number of voting members is recommended for fault-tolerant quorum. A cluster with N voting members requires a quorum of floor(N/2) + 1 members to elect a leader and accept updates:

Members   Tolerates failures   Minimum for quorum
1         0                    1
2         0                    2
3         1                    2
5         2                    3

Three nodes is the minimum recommended for production HA. nf-cluster add warns when the cluster has an even number of members. For more details on HA architecture, see the OpenZiti HA overview.
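The table follows directly from the quorum formula, which can be checked with shell arithmetic:

```shell
# Raft quorum math for an N-member cluster:
#   quorum    = floor(N/2) + 1 votes needed to elect a leader
#   tolerated = N - quorum failures survivable
for members in 1 2 3 5; do
  quorum=$(( members / 2 + 1 ))
  tolerated=$(( members - quorum ))
  echo "members=$members quorum=$quorum tolerates=$tolerated"
done
```

Note that going from 2 to 3 members raises fault tolerance from 0 to 1, while going from 3 to 4 does not improve it at all — hence the warning about even-sized clusters.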

Support stack event logging

If the NetFoundry support stack is installed, nf-cluster add automatically detects it and includes the support stack event logging configuration on the new controller. No manual configuration is needed.

Remove a controller

Remove a joining controller from the cluster:

nf-cluster remove ziti-controller2

The command:

  1. Blocks removal of the primary controller (remove joining controllers first).
  2. Removes the node from the Raft cluster via the leader's agent.
  3. Uninstalls the Helm release.
  4. Optionally deletes the generated values file.
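The first two steps can also be done by hand if needed. A hedged sketch of the manual equivalent — pod and release names are examples, and you should verify the agent subcommands against your Ziti version:

```shell
# Remove the member from the raft cluster via the leader's admin agent,
# then uninstall its Helm release (names are placeholders).
kubectl exec -n ziti deploy/ziti-controller -- \
  ziti agent cluster remove ziti-controller2
helm uninstall ziti-controller2 -n ziti
```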

To see which controllers can be removed:

nf-cluster remove

This lists all controller releases in the namespace without removing anything.

Command reference

nf-cluster <command> [options]

Commands:
  status             Show cluster health and member details
  list               List cluster members
  add                Add a new controller to the cluster
  remove <release>   Remove a controller from the cluster
  migrate            Migrate a standalone controller to cluster mode

Global options:
  -y                 Non-interactive mode
  -n <namespace>     Kubernetes namespace (default: ziti)

Troubleshooting

Controller join times out

The initial helm install may time out because the cluster join fails on the first pod startup — this is expected. Cert-manager issues certificates asynchronously, and the join requires valid certificates. The nf-cluster add command handles this automatically by waiting for certificates and restarting the pod.

If you are installing manually (without nf-cluster), wait for certificates and restart:

kubectl wait certificate.cert-manager.io/<release>-web-identity-cert \
--namespace ziti --for condition=Ready=True --timeout 120s

kubectl rollout restart deployment <release> -n ziti
kubectl rollout status deployment <release> -n ziti --timeout 120s

"unable to retrieve peer advertise address"

This error means the joining controller cannot determine its own advertised address. Verify that:

  • The clientApi.advertisedHost in the joining controller's values file resolves to an address that routes to that specific controller (not to the primary or a shared load balancer).
  • Each controller in the cluster has a unique clientApi.advertisedPort if they share the same hostname.
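A quick way to sanity-check the first condition from the host (a sketch using getent; the hostnames and the `distinct_address` helper are illustrative):

```shell
# distinct_address HOST1 HOST2 — succeeds only when both names resolve
# and point at different addresses (controllers must not share an IP
# unless their advertised ports differ).
distinct_address() {
  a=$(getent hosts "$1" | awk '{print $1; exit}')
  b=$(getent hosts "$2" | awk '{print $1; exit}')
  [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
}

distinct_address ctrl1.example.com ctrl2.example.com \
  && echo "advertised hosts resolve to distinct addresses" \
  || echo "check DNS before joining"
```

Resolution from the host is necessary but not sufficient — the address must also route to the joining controller from inside the cluster.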

Primary controller not detected

nf-cluster identifies the primary by checking cluster.mode in Helm values. If the controller was installed with an older chart or without cluster configuration, run nf-cluster migrate first.