Moving Kubernetes PersistentVolumeClaims between clusters across different regions
Limitations encountered moving a PersistentVolumeClaim across Google Kubernetes Engine clusters, and how to work around them.
Posted Nov 25, 2019 in Posts

Kubernetes Persistent Volume Claims (PVCs) abstract storage provisioning, and also support volume resize - but volume migration across clusters in different regions is not yet supported out of the box.
GKE and persistent storage
If you’re a Google Kubernetes Engine user, you’ve probably deployed some application that requires persistent storage, like a database. You’ll have noticed that, at least on GKE, there are mainly two recommended ways to manage the storage volume from Kubernetes:
- Creating a Google Compute Persistent Disk: first run gcloud compute disks create <name> --size <size in GB> (--zone|--region) <zone or region> to create the volume, then use gcePersistentDisk in your YAML manifest to mount the GCE-managed disk (a minimal Pod sketch follows this list);
- Setting up a StorageClass (or using the default) and creating a PersistentVolumeClaim in your YAML manifest.
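For reference, mounting a pre-created GCE PD directly in a Pod looks roughly like this (the disk name, Pod name, image and mount path below are hypothetical placeholders, not from this post):

apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
    - name: db
      image: postgres:12              # example image
      volumeMounts:
        - mountPath: /var/lib/data
          name: data
  volumes:
    - name: data
      gcePersistentDisk:
        pdName: my-data-disk          # the disk created with gcloud compute disks create
        fsType: ext4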
Each option has a few advantages and drawbacks:
- Using a GCE PD makes moving a disk between regions and clusters easy: just gcloud compute disks move <disk> --zone <source> --destination-zone <destination>, then remount the disk in the other Kubernetes cluster; on the other hand, it doesn't support automatic volume resizing - in fact, resizing can get quite complicated, as using gcloud compute disks resize would expand the disk but not the partition.
- Using a Kubernetes PVC makes resizing easy and effortless (at least since v1.11; see the sketch after this list) and better integrates operations into the developer workflow, as it's just a matter of changing a line in your YAML; however, migrating a PVC from one cluster to another, possibly in a different zone or region, is not supported out of the box, as the underlying PVC and GCE PD references use semi-random UIDs.
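To illustrate how lightweight the PVC route is, resizing is just a matter of bumping the request and re-applying, or patching the live object directly - assuming your StorageClass has allowVolumeExpansion enabled:

# In the PersistentVolumeClaim manifest, change the request...
#   storage: 200Gi  ->  storage: 300Gi
# ...and re-apply; or patch the live object in place:
kubectl patch pvc db-data -p '{"spec":{"resources":{"requests":{"storage":"300Gi"}}}}'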
So what can you do if you want easy resizing, and thus choose a PVC, but then need to move that PVC from one cluster to another (maybe in a different GCP zone)? How do you move the data? To work out the steps, we first need to understand how PVCs, PVs and PDs are managed in GKE.
How GKE PVC, PV and GCE PD are created
Generally speaking, when you create a PersistentVolumeClaim in GKE (for example in us-central1-a), Kubernetes takes care of creating a corresponding PersistentVolume, which in turn creates a GCE PD. Take the following example:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
After creation with kubectl apply
, some new attributes pop up:
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
  finalizers:
    - kubernetes.io/pvc-protection
  uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f
spec:
  storageClassName: standard
  volumeMode: Filesystem
  volumeName: pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
status:
  phase: Bound
The following PersistentVolume
is also created as a result:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  finalizers:
    - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: us-central1
    failure-domain.beta.kubernetes.io/zone: us-central1-a
  name: pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
  uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 200Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: db-data
    uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-cluster-name-pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - us-central1-a
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
                - us-central1
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound
A Google Compute Engine Persistent Disk is also created in us-central1-a
, called gke-cluster-name-pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
and of the corresponding size and type.
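A quick way to confirm the disk exists from gcloud (the name pattern matches the example above):

gcloud compute disks list --filter="name~gke-cluster-name-pvc" \
  --format="table(name,zone,sizeGb,type,status)"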
In summary, the final state is more or less the following:
- A PersistentVolumeClaim with metadata.uid: deadbeef-dead-b33f-8ee7-be3f5e4db37f and volumeName: pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f
- A PersistentVolume called pvc-<uid> with the same UID as the PVC, with a claimRef to the PVC using the UID and name as selector, mounting gcePersistentDisk.pdName: gke-<cluster name>-pvc-<uid>, and with the appropriate labels and nodeAffinity selectors for zone/region failure domains
- A GCE PersistentDisk called gke-<cluster name>-pvc-<uid> of the requested size.
In practice, this means that recreating the same binding of PVC with PV and GCE PD basically consists of cloning the definitions and moving the PD.
Migrating the PVC, PV and PD
First, we need to clean up the YAML returned by kubectl get pvc <name> -o yaml (and by kubectl get pv <volume name> -o yaml for the corresponding volume) and remove some of the state-dependent metadata set by the resource controllers.
The ones common to both PVC and PV:
metadata.annotations."kubectl.kubernetes.io/last-applied-configuration"
metadata.creationTimestamp
metadata.resourceVersion
metadata.selfLink
metadata.uid
status
PVC-specific ones:
metadata.annotations."pv.kubernetes.io/bind-completed"
metadata.annotations."pv.kubernetes.io/bound-by-controller"
spec.dataSource # not sure about this one, but from what I've observed it's safe to remove
And the PV-specific ones:
metadata.annotations."pv.kubernetes.io/bound-by-controller"
spec.claimRef
For these simple YAML mutations I generally use mikefarah's yq tool (available on GitHub), which makes removing an attribute like metadata.selfLink as easy as yq d pvc.yaml metadata.selfLink (repeat for each attribute and for each file).
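Putting it together, a rough cleanup sketch - assuming yq v2/v3 delete syntax (yq d <file> <path>), the example objects from above, and that the exported manifests are saved as pvc.yaml and pv.yaml:

# Export the current definitions from the source cluster
kubectl get pvc db-data -o yaml > pvc.yaml
kubectl get pv pvc-deadbeef-dead-b33f-8ee7-be3f5e4db37f -o yaml > pv.yaml

# Strip the state-dependent attributes listed above from both files
for attr in metadata.creationTimestamp metadata.resourceVersion \
            metadata.selfLink metadata.uid status; do
  yq d -i pvc.yaml "$attr"
  yq d -i pv.yaml "$attr"
done
yq d -i pvc.yaml spec.dataSource
yq d -i pv.yaml spec.claimRef
# Delete the annotation keys the same way; how to quote keys that contain
# dots (e.g. pv.kubernetes.io/bind-completed) depends on your yq version.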
Now that we have ready-to-create YAML definitions of the PVC and PV, we must replace the region/zone names; this is as simple as cat pvc.yaml | sed 's/<src zone>/<dst zone>/g' | sed 's/<src region>/<dst region>/g' (and the same for pv.yaml).
With the YAMLs clean and patched, we can move the GCE Persistent Disk with gcloud compute disks move gke-<cluster name>-pvc-<uid> --zone <src zone> --destination-zone <dst zone>
; this step can take several hours depending on the size of your volume.
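Before recreating anything, it's worth checking that the disk actually ended up in the destination zone, for example:

gcloud compute disks describe gke-<cluster name>-pvc-<uid> \
  --zone <dst zone> --format="value(name,sizeGb,status)"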
Finally, we can recreate the PV and PVC in the destination cluster with kubectl apply, and check that their status transitions to Bound (I also like to create a Pod mounting the volume and ls into it, as sketched below). You're done! Now just delete the old PVC and PV from the source Kubernetes cluster to clean up.
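For that last step, something along these lines works (the Pod name, image and mount path are arbitrary):

# Recreate the cleaned-up objects in the destination cluster
kubectl apply -f pv.yaml -f pvc.yaml
kubectl get pvc db-data   # STATUS should eventually become Bound

And a throwaway Pod to peek inside the volume:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-check
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: busybox
      command: ["ls", "-la", "/data"]
      volumeMounts:
        - mountPath: /data
          name: data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data

kubectl logs pvc-check will then print the directory listing of the migrated volume.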
Final notes
Although PersistentVolumeClaims and PersistentVolumes greatly help to simplify storage provisioning for stateful Kubernetes applications, there are still operational gaps in their lifecycle management.
Moving a simple PVC can require various steps, compared to a simple GCE disk move operation, and the process can be error-prone and costly (mainly cross-region PD data transfer costs). Crucially, this solution causes downtime while the GCE PD is being moved and is thus unavailable.
Some of the ideas for the safe rebinding of volumes in GKE were taken from this Kubernetes issue on GitHub.