Restoring From Backup

The ./velero/velero_restore.sh script will step through the following:

  • Checks if AWS credentials are set
  • Installs the velero plugin to the velero namespace in the cluster if not already installed
  • Displays the list of available backups for selection
  • Restores the resources based on the selection

Important Note For Restoring the Ziti Controller PVC

Velero can only restore the Ziti controller PVC from the backup if the existing PVC is deleted first, because by default Velero skips restoring any resource that already exists. The restore script will prompt for whether to delete it. If n is selected, the restore will skip the PVC but restore all other resources. See the restore reference documentation for more information.

Run the restore script and follow the prompts to select which backup you wish to restore from:

./velero/velero_restore.sh
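
Since the script begins by checking for AWS credentials, it can save a failed run to confirm they are present first. The following is a minimal sketch, not the script's own check: it assumes credentials are supplied through the standard AWS environment variables, and `check_aws_credentials` is a hypothetical helper name.

```shell
# Hypothetical pre-flight check. Assumes credentials come from the standard
# AWS environment variables; velero_restore.sh may check them differently.
check_aws_credentials() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Read the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_aws_credentials && ./velero/velero_restore.sh` only starts the restore when both variables are set.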

Restores can also be run manually if you need to leverage specific flags from Velero:

velero restore create --from-backup <BACKUP NAME>
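
As an illustration of a manual restore with extra flags, the sketch below restores a single namespace and blocks until the restore finishes. `restore_namespace` is a hypothetical helper name; `--include-namespaces` and `--wait` are standard `velero restore create` flags, but whether a namespace-scoped restore suits a given installation is an assumption to verify.

```shell
# Sketch: restore one namespace from a named backup and wait for completion.
# restore_namespace is a hypothetical helper, not part of the restore script.
restore_namespace() {
  backup_name="$1"
  namespace="$2"
  velero restore create \
    --from-backup "$backup_name" \
    --include-namespaces "$namespace" \
    --wait
}
```

Available backup names can be listed with `velero backup get`, after which a call such as `restore_namespace "<BACKUP NAME>" ziti` restores only the ziti namespace.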

Verifying The Restore

Check that all deployments have come back online in the following 3 namespaces:

kubectl get deployments -n ziti
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
ziti-controller   1/1     1            1           78m
ziti-router-1     1/1     1            1           78m

kubectl get deployments -n cert-manager
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
cert-manager              1/1     1            1           78m
cert-manager-cainjector   1/1     1            1           78m
cert-manager-webhook      1/1     1            1           78m
trust-manager             1/1     1            1           78m

kubectl get deployments -n support
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
grafana            1/1     1            1           5m7s
kibana-kb          1/1     1            1           5m7s
logstash           1/1     1            1           5m7s
rabbitmq           1/1     1            1           5m7s
ziti-edge-tunnel   1/1     1            1           5m6s
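
The three checks above can also be scripted so that they block until every deployment reports Available. This is a sketch only: `verify_restore` is a hypothetical helper name and the 300 second timeout is an arbitrary assumption.

```shell
# Sketch: wait for all deployments in the three namespaces to become Available.
# The timeout value is an assumption; adjust it for the cluster.
verify_restore() {
  for ns in ziti cert-manager support; do
    echo "Checking deployments in namespace: $ns"
    kubectl wait deployment --all \
      --namespace "$ns" \
      --for=condition=Available \
      --timeout=300s || return 1
  done
}
```

The function returns non-zero at the first namespace whose deployments do not become Available in time, which makes it usable as a gate in a larger recovery script.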

Notes On Restoring From Backup

Different issues can arise from restoring from backup depending on which Kubernetes provider is being used.

Common

  • In almost all installations, it will be necessary to restart the ziti-edge-tunnel deployment in the support namespace since the tunneler will likely come back online prior to the Ziti controller.
  • If the DNS address changes for the Ziti controller advertise address or the Edge Router advertise address, it may take a few minutes for client resources to come back online. For any hosting router or identity, a process restart will accelerate their recovery.
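
The tunneler restart described above might be scripted as follows. This is a sketch that assumes the deployment names shown in the verification output (ziti-controller in the ziti namespace, ziti-edge-tunnel in the support namespace); `restart_tunneler` is a hypothetical helper name and the timeouts are arbitrary.

```shell
# Sketch: restart the tunneler only after the controller rollout is complete.
restart_tunneler() {
  # Wait for the controller first so the tunneler reconnects to a live controller.
  kubectl rollout status deployment/ziti-controller --namespace ziti --timeout=300s || return 1
  kubectl rollout restart deployment/ziti-edge-tunnel --namespace support || return 1
  # Block until the restarted tunneler pods are ready.
  kubectl rollout status deployment/ziti-edge-tunnel --namespace support --timeout=300s
}
```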

EKS

  • The Load Balancer addresses will likely change after restoring from backup. The DNS addresses for the controller advertise address and the router advertise address will likely need to be updated. The ziti-router-1 deployment will not come back online until it can successfully reach the controller over its advertise address. This is normal in a restore scenario.
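
To find the new Load Balancer hostnames that the DNS records need to point at, something like the following can help. It is a sketch under two assumptions not stated above: that the controller and router are exposed through Services of type LoadBalancer, and that those Services live in the ziti namespace. `list_load_balancers` is a hypothetical helper name.

```shell
# Sketch: print "<service name><TAB><load balancer hostname>" for each
# LoadBalancer Service in a namespace (defaults to ziti, an assumption).
list_load_balancers() {
  namespace="${1:-ziti}"
  kubectl get services --namespace "$namespace" \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.name}{"\t"}{.status.loadBalancer.ingress[0].hostname}{"\n"}{end}'
}
```

If the Services are exposed differently, `kubectl get services -n ziti -o wide` shows the external addresses in a less targeted form.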

K3s

  • The trust-manager deployment in the cert-manager namespace can encounter an issue where it doesn't start back up after a restore. If this error exists in the logs: Error: container has runAsNonRoot and image has non-numeric user (cnb), cannot verify user is non-root, then the deployment may need to be updated to correct this problem after a restore.

    To fix this error, run:

    kubectl edit deployment/trust-manager -n cert-manager

    and add the following line under the securityContext block:

    securityContext:
      # add this
      runAsUser: 1000

    Save the file and restart the deployment by running:

    kubectl rollout restart deployment trust-manager -n cert-manager
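
Before editing the deployment, it can be worth confirming that the cluster is actually hitting this error. The sketch below only searches text for the message quoted above; `has_nonroot_error` is a hypothetical helper name, and where the message surfaces (pod events, pod description, or logs) is an assumption that may vary by cluster.

```shell
# Sketch: succeed if the runAsNonRoot error message appears on standard input.
has_nonroot_error() {
  grep -q 'container has runAsNonRoot and image has non-numeric user'
}
```

For example, `kubectl get events -n cert-manager | has_nonroot_error && echo "trust-manager needs the runAsUser fix"` prints the reminder only when the message is found.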

Common Issues

  • The restore appears to have worked, but the restore job hangs and never completes.

    Run the following:

    kubectl delete restore -n velero <restore name>
    # If the above command hangs, it may be necessary to cancel it and run:
    kubectl rollout restart deployment/velero -n velero
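
The two steps above can be combined so that the fallback runs automatically instead of after a manual cancel. This is a sketch, not part of the restore script: `delete_restore` is a hypothetical helper name, and the 60 second limit passed to `--timeout` is an arbitrary assumption.

```shell
# Sketch: delete a hung restore, and restart Velero if the delete does not
# finish within the timeout (60s is an assumed value).
delete_restore() {
  restore_name="$1"
  if ! kubectl delete restore "$restore_name" --namespace velero --timeout=60s; then
    kubectl rollout restart deployment/velero --namespace velero
  fi
}
```

Restore names can be listed with `velero restore get` before calling, for example, `delete_restore "<restore name>"`.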