Authors: Ben Swartzlander (NetApp)
Kubernetes v1.22, released earlier this month, introduced a redesigned approach for volume populators. Originally implemented in v1.18, the API suffered from backwards compatibility issues. Kubernetes v1.22 includes a new API field called
dataSourceRef that fixes these problems.
Earlier Kubernetes releases already added a
dataSource field into thePersistentVolumeClaim API, used for cloning volumes and creating volumes from snapshots. You could use the
dataSource field when creating a new PVC, referencing either an existing PVC or a VolumeSnapshot in the same namespace. That also modified the normal provisioning process so that instead of yielding an empty volume, the new PVC contained the same data as either the cloned PVC or the cloned VolumeSnapshot.
Volume populators embrace the same design idea, but extend it to any type of object, as long as there exists a custom resourceto define the data source, and a populator controller to implement the logic. Initially, the
dataSource field was directly extended to allow arbitrary objects, if the
AnyVolumeDataSourcefeature gate was enabled on a cluster. That change unfortunately caused backwards compatibility problems, and so the new
dataSourceRef field was born.
In v1.22 if the
AnyVolumeDataSource feature gate is enabled, the
dataSourceRef field is added, which behaves similarly to the
dataSource field except that it allows arbitrary objects to be specified. The API server ensures that the two fields always have the same contents, and neither of them are mutable. The differences is that at creation time
dataSource allows only PVCs or VolumeSnapshots, and ignores all other values, while
dataSourceRef allows most types of objects, and in the few cases it doesn't allow an object (core objects other than PVCs) a validation error occurs.
When this API change graduates to stable, we would deprecate using
dataSource and recommend using
dataSourceRef field for all use cases. In the v1.22 release,
dataSourceRef is available (as an alpha feature) specifically for cases where you want to use for custom volume populators.
Every volume populator must have one or more CRDs that it supports. Administrators may install the CRD and the populator controller and then PVCs with a
dataSourceRef specifies a CR of the type that the populator supports will be handled by the populator controller instead of the CSI driver directly.
Underneath the covers, the CSI driver is still invoked to create an empty volume, which the populator controller fills with the appropriate data. The PVC doesn't bind to the PV until it's fully populated, so it's safe to define a whole application manifest including pod and PVC specs and the pods won't begin running until everything is ready, just as if the PVC was a clone of another PVC or VolumeSnapshot.
PVCs with data sources are still noticed by the external-provisioner sidecar for the related storage class (assuming a CSI provisioner is used), but because the sidecar doesn't understand the data source kind, it doesn't do anything. The populator controller is also watching for PVCs with data sources of a kind that it understands and when it sees one, it creates a temporary PVC of the same size, volume mode, storage class, and even on the same topology (if topology is used) as the original PVC. The populator controller creates a worker pod that attaches to the volume and writes the necessary data to it, then detaches from the volume and the populator controller rebinds the PV from the temporary PVC to the orignal PVC.
The following things are required to use volume populators:
- Enable the
- Install a CRD for the specific data source / populator
- Install the populator controller itself
Populator controllers may use the lib-volume-populatorlibrary to do most of the Kubernetes API level work. Individual populators only need to provide logic for actually writing data into the volume based on a particular CR instance. This library provides a sample populator implementation.
These optional components improve user experience:
- Install the VolumePopulator CRD
- Create a VolumePopulator custom respource for each specific data source
- Install the volume data source validatorcontroller (alpha)
The purpose of these components is to generate warning events on PVCs with data sources for which there is no populator.
To see how this works, you can install the sample "hello" populator and try it out.
First install the volume-data-source-validator controller.
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/client/config/crd/populator.storage.k8s.io_volumepopulators.yaml kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/rbac-data-source-validator.yaml kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/setup-data-source-validator.yaml
Next install the example populator.
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/crd.yaml kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/deploy.yaml
Create an instance of the
Hello CR, with some text.
apiVersion: hello.k8s.io/v1alpha1 kind: Hello metadata: name: example-hello spec: fileName: example.txt fileContents: Hello, world!
Create a PVC that refers to that CR as its data source.
apiVersion: v1 kind: PersistentVolumeClaim metadata: name: example-pvc spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Mi dataSourceRef: apiGroup: hello.k8s.io kind: Hello name: example-hello volumeMode: Filesystem
Next, run a job that reads the file in the PVC.
apiVersion: batch/v1 kind: Job metadata: name: example-job spec: template: spec: containers: - name: example-container image: busybox:latest command: - cat - /mnt/example.txt volumeMounts: - name: vol mountPath: /mnt restartPolicy: Never volumes: - name: vol persistentVolumeClaim: claimName: example-pvc
Wait for the job to complete (including all of its dependencies).
kubectl wait --for=condition=Complete job/example-job
And last examine the log from the job.
kubectl logs job/example-job Hello, world!
Note that the volume already contained a text file with the string contents from the CR. This is only the simplest example. Actual populators can set up the volume to contain arbitrary contents.
Developers interested in writing new poplators are encouraged to use thelib-volume-populator library and to only supply a small controller wrapper around the library, and a pod image capable of attaching to volumes and writing the appropriate data to the volume.
Individual populators can be extremely generic such that they work with every type of PVC, or they can do vendor specific things to rapidly fill a volume with data if the volume was provisioned by a specific CSI driver from the same vendor, for example, by communicating directly with the storage for that volume.
As this feature is still in alpha, we expect to update the out of tree controllers with more tests and documentation. The community plans to eventually re-implement the populator library as a sidecar, for ease of operations.
We hope to see some official community-supported populators for some widely-shared use cases. Also, we expect that volume populators will be used by backup vendors as a way to "restore" backups to volumes, and possibly a standardized API to do this will evolve.
The enhancement proposal,Volume Populators, includes lots of detail about the history and technical implementation of this feature.
Volume populators and data sources, within the documentation topic about persistent volumes, explains how to use this feature in your cluster.
Please get involved by joining the Kubernetes storage SIG to help us enhance this feature. There are a lot of good ideas already and we'd be thrilled to have more!