Moving Kubernetes PersistentVolumeClaims between clusters across different regions

Limitations encountered moving a PersistentVolumeClaim across Google Kubernetes Engine clusters, and how to work around them.

Posted Nov 25, 2019 in Posts

Kubernetes Persistent Volume Claims (PVCs) abstract storage provisioning and also support volume resizing, but migrating a volume across clusters in different regions is not yet supported out of the box.

GKE and persistent storage

If you’re a Google Kubernetes Engine user, you’ve probably deployed some application that requires persistent storage, like a database. You’ll have noticed that, at least on GKE, there are two main recommended ways to manage the storage volume from Kubernetes:

  1. Creating a Google Compute Persistent Disk: first gcloud compute disks create <name> --size <size in GB> (--zone|--region) <zone or region> to create the volume, then using gcePersistentDisk in your YAML manifest to mount the GCE-managed disk (see the sketch after this list);

  2. Setting up a StorageClass (or using the default) and creating a PersistentVolumeClaim in your YAML manifest.
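
As an example of option 1, a minimal sketch; the disk name, size, zone and the Pod spec are all placeholders:

# Create the disk directly in GCE
gcloud compute disks create db-data-disk --size 200GB --zone us-central1-a

# Mount it from a Pod spec via gcePersistentDisk
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: db
    image: postgres:12
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    gcePersistentDisk:
      pdName: db-data-disk
      fsType: ext4
EOF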

Each option has a few advantages and drawbacks:

  1. using a GCE PD makes moving a disk between regions and clusters easy: just gcloud compute disks move <disk> --zone <source> --destination-zone <destination>, then remount the disk in the other Kubernetes cluster; on the other hand, it doesn’t support automatic volume resizing: in fact, resizing can get quite complicated as using gcloud compute disks resize would expand the disk but not the partition;

  2. using a Kubernetes PVC makes resizing easy and effortless, at least since v1.11, and better integrates operations into the developer workflow - it’s just a matter of changing a line in your YAML (a resize sketch follows this list); however, migrating PVCs from one cluster to another, possibly in a different zone or region, is not supported out of the box, as the underlying PV and GCE PD names embed semi-random UIDs.
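
To illustrate the resize path of option 2, here is a minimal sketch for a hypothetical claim named db-data, assuming the StorageClass has allowVolumeExpansion enabled:

# Either bump spec.resources.requests.storage in the manifest and re-apply,
# or patch the live object directly:
kubectl patch pvc db-data -p '{"spec":{"resources":{"requests":{"storage":"300Gi"}}}}'

# Watch the claim until the new capacity is reflected
kubectl get pvc db-data -w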

So what do you do if you want easy resizing, and thus choose a PVC, but then need to move that PVC from one cluster to another (maybe in a different GCP zone or region)? How do you move the data? To work out the steps, we first need to understand how PVCs, PVs and PDs are managed in GKE.

How GKE PVC, PV and GCE PD are created

Generally speaking, when you create a PersistentVolumeClaim in GKE (for example in us-central1-a), Kubernetes takes care of creating a corresponding PersistentVolume, which in turn is backed by a newly provisioned GCE PD. Take the following example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi

After creation with kubectl apply, some new attributes pop up:

metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
  finalizers:
  - kubernetes.io/pvc-protection
  uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f
spec:
  storageClassName: standard
  volumeMode: Filesystem
  volumeName: pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
status:
  phase: Bound
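
The volumeName field is the link to follow: fetching the provisioned PersistentVolume is a one-liner (a sketch, using the claim from this example):

kubectl get pv "$(kubectl get pvc db-data -o jsonpath='{.spec.volumeName}')" -o yaml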

The following PersistentVolume is also created as a result:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: us-central1
    failure-domain.beta.kubernetes.io/zone: us-central1-a
  name: pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
  uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 200Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: db-data
    uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-cluster-name-pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-central1-a
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - us-central1
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound

A Google Compute Engine Persistent Disk is also created in us-central1-a, called gke-cluster-name-pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f and of the corresponding size and type.
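
You can confirm the link from both sides; for example (a sketch, reusing the names from this walkthrough):

# The PD name recorded on the PersistentVolume
kubectl get pv pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f \
  -o jsonpath='{.spec.gcePersistentDisk.pdName}'

# The same disk as seen by GCE, with its zone and size
gcloud compute disks list --filter="name~pvc-deadbeef"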

In summary, the final state is more or less the following:

  1. A PersistentVolumeClaim with metadata.uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f and volumeName: pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
  2. A PersistentVolume called pvc-<uid>, sharing the PVC’s UID, with a claimRef that selects the PVC by UID and name, mounting gcePersistentDisk.pdName: gke-<cluster name>-pvc-<uid>, and carrying the appropriate labels and nodeAffinity selectors for the zone/region failure domains
  3. A GCE PersistentDisk called gke-<cluster name>-pvc-<uid> of the requested size.

In practice, this means that recreating the same binding of PVC with PV and GCE PD basically consists of cloning the definitions and moving the PD.

Migrating the PVC, PV and PD

First, we need to clean up the YAML returned by kubectl get pvc <name> -o yaml (and likewise by kubectl get pv <name> -o yaml) and remove some of the state-dependent metadata set by the resource controllers.

The ones common to both PVC and PV:

metadata.annotations."kubectl.kubernetes.io/last-applied-configuration"
metadata.creationTimestamp
metadata.resourceVersion
metadata.selfLink
metadata.uid
status

PVC-specific ones:

metadata.annotations."pv.kubernetes.io/bind-completed"
metadata.annotations."pv.kubernetes.io/bound-by-controller"
spec.dataSource # not sure about this one, but from what I've observed it's safe to remove

And the PV-specific ones:

metadata.annotations."pv.kubernetes.io/bound-by-controller"
spec.claimRef

For these simple YAML mutations I generally use mikefarah’s yq tool (available on GitHub), which makes removing an attribute like metadata.selfLink as easy as yq d pvc.yaml metadata.selfLink (repeat for each attribute, and again for the PV’s YAML).

Now that we have ready-to-create YAML definitions of the PVC and PV, we must replace the zone/region names; this is as simple as cat pvc.yaml | sed 's/<src zone>/<dst zone>/g' | sed 's/<src region>/<dst region>/g' (and the same for the PV’s YAML).
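
Putting the cleanup and the zone/region patch together, a rough sketch of the whole export step might look like this (yq v2/v3 syntax; the file names, the europe-west1-b/europe-west1 destination and the subset of attributes deleted here are illustrative, so repeat the deletions for every attribute listed above):

# Export the live objects from the source cluster
kubectl get pvc db-data -o yaml > pvc.yaml
kubectl get pv pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f -o yaml > pv.yaml

# Strip the controller-managed fields in place
for f in pvc.yaml pv.yaml; do
  yq d -i "$f" metadata.uid
  yq d -i "$f" metadata.selfLink
  yq d -i "$f" metadata.resourceVersion
  yq d -i "$f" metadata.creationTimestamp
  yq d -i "$f" status
done
yq d -i pv.yaml spec.claimRef

# Point the cleaned definitions at the destination zone and region
sed -i 's/us-central1-a/europe-west1-b/g; s/us-central1/europe-west1/g' pvc.yaml pv.yaml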

With the YAMLs clean and patched, we can move the GCE Persistent Disk with gcloud compute disks move gke-<cluster name>-pvc-<uid> --zone <src zone> --destination-zone <dst zone>; this step can take several hours depending on the size of your volume.
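
Once the move completes, it is worth double-checking that the disk really landed in the destination zone before recreating anything (a quick check, reusing the example names; europe-west1-b is just an illustrative destination):

gcloud compute disks describe gke-cluster-name-pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f \
  --zone europe-west1-b --format='value(name,sizeGb,zone)'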

Finally, we can recreate the PV and PVC in the destination cluster with kubectl apply and check that their status transitions to Bound (I also like to create a Pod mounting the volume and ls into it, as sketched below). You’re done! Now just delete the old PVC and PV from the source Kubernetes cluster to clean up.
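
A minimal sketch of this last step, assuming the cleaned pv.yaml/pvc.yaml from above and a kubectl context for the destination cluster named dst-cluster (an illustrative name):

# Recreate the objects in the destination cluster and wait for binding
kubectl --context dst-cluster apply -f pv.yaml -f pvc.yaml
kubectl --context dst-cluster get pvc db-data   # STATUS should become Bound

# Optional smoke test: mount the claim in a throwaway Pod and list its contents
cat <<EOF | kubectl --context dst-cluster apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-check
spec:
  restartPolicy: Never
  containers:
  - name: check
    image: busybox
    command: ["ls", "-la", "/data"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: db-data
EOF
kubectl --context dst-cluster logs pvc-check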

Final notes

Although PersistentVolumeClaims and PersistentVolumes greatly help to simplify storage provisioning for stateful Kubernetes applications, there are still operational gaps in their lifecycle management.

Moving a single PVC requires quite a few steps compared to a plain GCE disk move, and the process can be error-prone and costly (mainly due to cross-region PD data transfer). Crucially, this solution causes downtime while the GCE PD is being moved and is thus unavailable.

Some of the ideas for the safe rebinding of volumes in GKE were taken from this Kubernetes issue on GitHub.