# Migration

This document outlines how to migrate NetFoundry On-Prem from one cluster to another. The migration feature is built around Velero and is an extension of backup and restore.
## Prerequisites

- awscli installed
- velero CLI installed
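To confirm both tools are available before starting, a quick check such as the following can be run:

```bash
# Verify the AWS CLI is installed and on the PATH
aws --version

# Verify the Velero CLI is installed (client only; no cluster connection needed)
velero version --client-only
```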
## Migration process

### Step 1: Back up On-Prem from the existing cluster
- Load the AWS credentials into the environment.

- Install Velero.

  k3s:

  ```bash
  velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.8.0 \
    --bucket <S3_BUCKET_NAME> --features=EnableRestic --default-volumes-to-fs-backup --use-node-agent \
    --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1 \
    --secret-file <credentials-file>
  ```

  EKS:

  ```bash
  velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.8.0 \
    --bucket <S3_BUCKET_NAME> --features=EnableCSI --use-volume-snapshots=true \
    --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1 \
    --secret-file <credentials-file>
  ```
- Back up all resources in all namespaces, including persistent volumes:

  ```bash
  velero backup create <backup-name> --include-cluster-resources
  ```

  This command backs up everything, including cluster-scoped resources.
- Destroy the existing cluster (after confirming the backup completed; see the check below).
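The backup can take a while to complete; before destroying the existing cluster, confirm that it finished successfully, for example:

```bash
# List backups and their phases; wait for STATUS to show Completed
velero backup get

# Inspect a backup in detail, including any warnings or errors
velero backup describe <backup-name> --details
```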
### Step 2: Restore to the new cluster

- Create a new cluster for the restore.
- Load the AWS credentials into the environment.
- In the new cluster, install Velero (as in Step 1).
- Run the restore script and follow the prompts to select the backup to restore (progress can be monitored as shown below):

  ```bash
  ./velero/velero_restore.sh
  ```

Note:

- EKS: The DNS records for the controller advertise address and the router advertise address should be updated with the new Load Balancer addresses.
- k3s: The new cluster should be on the same node, and a default StorageClass must be set.
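While the script runs, restore progress can also be monitored directly with the Velero CLI, for example:

```bash
# List restores and their phases (InProgress, Completed, PartiallyFailed, ...)
velero restore get

# Show details for a specific restore, including per-resource warnings
velero restore describe <restore-name> --details
```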
### Step 3: Verify the restore in the new cluster

Check that all deployments have come back online in the following three namespaces:

- ziti

  ```
  kubectl get deployments -n ziti
  NAME              READY   UP-TO-DATE   AVAILABLE   AGE
  ziti-controller   1/1     1            1           78m
  ziti-router-1     1/1     1            1           78m
  ```

- cert-manager

  ```
  kubectl get deployments -n cert-manager
  NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
  cert-manager              1/1     1            1           78m
  cert-manager-cainjector   1/1     1            1           78m
  cert-manager-webhook      1/1     1            1           78m
  trust-manager             1/1     1            1           78m
  ```

- support

  ```
  kubectl get deployments -n support
  NAME               READY   UP-TO-DATE   AVAILABLE   AGE
  grafana            1/1     1            1           5m7s
  kibana-kb          1/1     1            1           5m7s
  logstash           1/1     1            1           5m7s
  rabbitmq           1/1     1            1           5m7s
  ziti-edge-tunnel   1/1     1            1           5m6s
  ```
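As a shortcut, all three namespaces can be checked in one pass:

```bash
# Check deployment status across the three namespaces in one loop
for ns in ziti cert-manager support; do
  kubectl get deployments -n "$ns"
done
```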
## Notes on migration issues

### Common

- In almost all installations, it will be necessary to restart the `ziti-edge-tunnel` deployment in the `support` namespace, since the tunneler will likely come back online before the Ziti controller.
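The restart itself is a single command:

```bash
# Restart the tunneler so it reconnects to the now-running controller
kubectl rollout restart deployment ziti-edge-tunnel -n support
```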
### EKS

- The Load Balancer addresses will be new after migrating to the new cluster, so the DNS records for the controller advertise address and the router advertise address will likely need to be updated. The `ziti-router-1` deployment will not come back online until it can successfully reach the controller at its advertise address. This is normal in a restore scenario.
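To look up the new Load Balancer hostnames, the Service objects can be inspected; this assumes the controller and router Services are exposed from the `ziti` namespace:

```bash
# EXTERNAL-IP shows the new Load Balancer address for each exposed Service
kubectl get svc -n ziti
```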
### k3s

- The `trust-manager` deployment in the `cert-manager` namespace can fail to start back up after a restore. If this error appears in the logs:

  ```
  Error: container has runAsNonRoot and image has non-numeric user (cnb), cannot verify user is non-root
  ```

  then the deployment needs to be updated after the restore. To fix the error, run:

  ```bash
  kubectl edit deployment/trust-manager -n cert-manager
  ```

  and add the following line under the `securityContext` block:

  ```yaml
  securityContext:
    # add this
    runAsUser: 1000
  ```

  Save the file and restart the deployment:

  ```bash
  kubectl rollout restart deployment trust-manager -n cert-manager
  ```
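  Alternatively, the same change can be applied non-interactively. This is a sketch that assumes the `runAsNonRoot` setting sits on the first container's `securityContext`:

  ```bash
  # Add runAsUser: 1000 to the first container's securityContext, then restart
  kubectl patch deployment trust-manager -n cert-manager --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/containers/0/securityContext/runAsUser","value":1000}]'
  kubectl rollout restart deployment trust-manager -n cert-manager
  ```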
- The `elasticsearch-es-elastic-nodes` StatefulSet can fail to start back up after a restore. When this happens, Kibana may show the error "Kibana server is not ready yet". Run the following to fix it:

  ```bash
  kubectl rollout restart statefulset elasticsearch-es-elastic-nodes -n support
  ```
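The restart can be watched until the nodes come back:

```bash
# Blocks until the Elasticsearch pods have rolled out again
kubectl rollout status statefulset elasticsearch-es-elastic-nodes -n support
```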
## Common issues

- The restore appears to have worked, but the restore job seems hung and never completes. Run the following:

  ```bash
  kubectl delete restore -n velero <restore name>
  # If the above command hangs, it may be necessary to cancel it and run:
  kubectl rollout restart deployment/velero -n velero
  ```
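Afterwards, confirm the hung restore object is actually gone:

```bash
# The stuck restore should no longer appear in the list
kubectl get restore -n velero
```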