HA cluster management
NetFoundry Self-Hosted supports high-availability (HA) controller clusters using the OpenZiti Raft consensus protocol.
The `nf-cluster` command manages the full lifecycle of an HA cluster: checking status, adding controllers, removing
controllers, and migrating a standalone controller to cluster mode.
The nf-cluster command and HA cluster management features are currently in beta. Functionality and command syntax
may change in future releases. Please report any issues to NetFoundry support.
HA requires the v3 ziti-controller Helm chart (^3.0.0), which uses a unified PKI with a single root of trust. All
controllers in a cluster must run the same Ziti version.
Prerequisites
- Existing standalone controller deployed with the v3 chart
- DNS entries configured for each additional controller's advertise address
- `helm`, `kubectl`, and `jq` available on the host
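The tooling prerequisite can be checked up front with a short loop (a sketch; `nf-cluster` performs its own checks as well):

```shell
# Verify the required CLI tools are on PATH before running nf-cluster.
for tool in helm kubectl jq; do
  if command -v "$tool" > /dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool" >&2
  fi
done
```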
Check cluster status
View the current cluster state, including the primary controller, trust domain, and all cluster members:
```shell
nf-cluster status
```
Example output:
```
Cluster Status
Primary release: ziti-controller
Trust domain: ziti.example.com
Cluster mode: cluster-init
Controller count: 3

Cluster Members (from leader agent):
╭──────────────────┬────────────────────────────┬───────┬────────┬─────────┬───────────╮
│ ID               │ ADDRESS                    │ VOTER │ LEADER │ VERSION │ CONNECTED │
├──────────────────┼────────────────────────────┼───────┼────────┼─────────┼───────────┤
│ ziti-controller  │ tls:ctrl1.example.com:1280 │ true  │ true   │ v1.6.12 │ true      │
│ ziti-controller2 │ tls:ctrl2.example.com:1280 │ true  │ false  │ v1.6.12 │ true      │
│ ziti-controller3 │ tls:ctrl3.example.com:1280 │ true  │ false  │ v1.6.12 │ true      │
╰──────────────────┴────────────────────────────┴───────┴────────┴─────────┴───────────╯
```
For just the member table without additional context:
```shell
nf-cluster list
```
Migrate to cluster mode
Installations created with nf-quickstart v1.0.0+ are already cluster-ready and can skip this step. For older
installations, migrate the standalone controller to cluster mode before adding members:
```shell
nf-cluster migrate
```
Migration requires a controller restart. This is a brief downtime event for the controller.
The command:
- Detects the primary controller automatically (does not assume a specific Helm release name).
- Checks the current `cluster.mode`; if it is already `cluster-init`, no action is needed.
- Prompts for a trust domain if one is not already configured (defaults to the controller's advertised address).
- Runs the two-phase migration: `cluster-migrate`, then `cluster-init`.
Add a controller
Add a new controller to the cluster interactively:
```shell
nf-cluster add -h ctrl2.example.com
```
Or non-interactively with all options:
```shell
nf-cluster add -y -h ctrl2.example.com -p 1280
```
The command:
- Detects the primary controller and extracts its trust domain, advertised endpoint, edge root issuer, and trust bundle ConfigMap.
- Validates DNS resolution for the new controller's advertised address.
- Generates a values file (`ziti-controller2-values.yml`) with all required HA settings.
- Runs `helm upgrade --install` for the new controller.
- Waits for cert-manager to issue the web identity certificate.
- Restarts the new controller to complete the Raft cluster join.
- Verifies the join by querying the cluster member list.
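The DNS validation step can also be run by hand before invoking `nf-cluster add`. This sketch uses the example hostname `ctrl2.example.com` and assumes a glibc host with `getent`:

```shell
# Confirm the new controller's advertised address resolves in DNS.
# ctrl2.example.com is the example hostname from above; substitute your own.
host="ctrl2.example.com"
if getent hosts "$host" > /dev/null; then
  echo "resolves: $host"
else
  echo "does not resolve: $host" >&2
fi
```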
If you already have a values file (for example, from a previous attempt or a custom configuration), pass it directly:
```shell
nf-cluster add -f controller2-values.yml
```
Add options
| Flag | Description |
|---|---|
| `-h <host>` | Advertised host for the new controller (required with `-y`) |
| `-p <port>` | Advertised port (default: inherited from primary) |
| `-N <name>` | Helm release name (default: auto-generated, e.g., `ziti-controller2`) |
| `-f <file>` | Use a pre-built values file instead of generating one |
Quorum considerations
Raft requires a majority of voting members (a quorum) to elect a leader and accept updates: a cluster with N voting members needs ⌊N/2⌋ + 1 of them available. An odd number of voters is recommended, because adding an even member raises the quorum size without increasing the number of failures the cluster can tolerate:
| Members | Tolerates failures | Minimum for quorum |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 0 | 2 |
| 3 | 1 | 2 |
| 5 | 2 | 3 |
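The table follows directly from the majority rule; a small shell loop reproduces it:

```shell
# Raft quorum math: quorum = floor(N/2) + 1, tolerated failures = N - quorum.
for members in 1 2 3 5; do
  quorum=$(( members / 2 + 1 ))
  echo "members=$members quorum=$quorum tolerates=$(( members - quorum ))"
done
```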
Three nodes is the minimum recommended for production HA. `nf-cluster add` warns when the cluster has an even number
of members. For more details on HA architecture, see the
OpenZiti HA overview.
Support stack event logging
If the NetFoundry support stack is installed, `nf-cluster add` automatically detects it and includes the support stack
event logging configuration on the new controller. No manual configuration is needed.
Remove a controller
Remove a joining controller from the cluster:
```shell
nf-cluster remove ziti-controller2
```
The command:
- Blocks removal of the primary controller (remove joining controllers first).
- Removes the node from the Raft cluster via the leader's agent.
- Uninstalls the Helm release.
- Optionally deletes the generated values file.
To see which controllers can be removed:
```shell
nf-cluster remove
```
This lists all controller releases in the namespace without removing anything.
Command reference
```
nf-cluster <command> [options]

Commands:
  status             Show cluster health and member details
  list               List cluster members
  add                Add a new controller to the cluster
  remove <release>   Remove a controller from the cluster
  migrate            Migrate a standalone controller to cluster mode

Global options:
  -y                 Non-interactive mode
  -n <namespace>     Kubernetes namespace (default: ziti)
```
Troubleshooting
Controller join times out
The initial `helm install` may time out because the cluster join fails on the first pod startup; this is expected.
cert-manager issues certificates asynchronously, and the join requires valid certificates. The `nf-cluster add` command
handles this automatically by waiting for certificates and restarting the pod.
If you are installing manually (without nf-cluster), wait for certificates and restart:
```shell
kubectl wait certificate.cert-manager.io/<release>-web-identity-cert \
  --namespace ziti --for condition=Ready=True --timeout 120s
kubectl rollout restart deployment <release> -n ziti
kubectl rollout status deployment <release> -n ziti --timeout 120s
```
"unable to retrieve peer advertise address"
This error means the joining controller cannot determine its own advertised address. Verify that:
- The `clientApi.advertisedHost` in the joining controller's values file resolves to an address that routes to that specific controller (not to the primary or a shared load balancer).
- Each controller in the cluster has a unique `clientApi.advertisedPort` if they share the same hostname.
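To see what a joining controller is actually advertising, you can pull its Helm values as JSON and extract the endpoint with `jq`. The filter below runs against an inline sample document so it can be shown end to end; in practice, feed it the output of `helm get values <release> -n ziti -o json` (the sample values are illustrative):

```shell
# Extract clientApi.advertisedHost/advertisedPort from controller Helm values.
# The echo'd sample stands in for: helm get values <release> -n ziti -o json
echo '{"clientApi":{"advertisedHost":"ctrl2.example.com","advertisedPort":1280}}' \
  | jq -r '.clientApi | "\(.advertisedHost):\(.advertisedPort)"'
# → ctrl2.example.com:1280
```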
Primary controller not detected
`nf-cluster` identifies the primary by checking `cluster.mode` in the Helm values. If the controller was installed with
an older chart or without cluster configuration, run `nf-cluster migrate` first.