Kubernetes 1.27: Efficient SELinux volume relabeling (Beta)

Author: Jan Šafránek (Red Hat)

The problem

On Linux with Security-Enhanced Linux (SELinux) enabled, it's traditionally the container runtime that applies SELinux labels to a Pod and all its volumes. Kubernetes only passes the SELinux label from a Pod's securityContext fields to the container runtime.

The container runtime then recursively changes the SELinux label of every file that is visible to the Pod's containers. This can take a long time if the volume holds many files, especially when the volume is on a remote filesystem.
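To make the cost concrete, here is a minimal Python sketch of what a recursive relabel amounts to: visiting every path on the volume and, optionally, writing its `security.selinux` extended attribute. The helper name `relabel_recursively` is hypothetical, and this is not how any particular container runtime is implemented. It only shows that the work grows with the number of files. Actually writing labels requires Linux with SELinux and sufficient privileges, so the sketch defaults to a dry run.

```python
import os


def relabel_recursively(root: str, label: str, apply: bool = False) -> int:
    """Visit every path under root, as a recursive relabel must.

    Returns the number of paths visited. With apply=True, each path's
    security.selinux extended attribute is set to `label` (Linux only;
    needs SELinux and sufficient privileges). Hypothetical helper.
    """
    paths = [root]
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    if apply:
        value = label.encode()
        for path in paths:
            # Label symlinks themselves rather than their targets.
            os.setxattr(path, "security.selinux", value, follow_symlinks=False)
    return len(paths)
```

One extended-attribute write per file is what makes the relabel time proportional to the size of the volume's file tree.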

Note

If a container uses a subPath of a volume, only that subPath is relabeled, not the whole volume. This allows two Pods with different SELinux labels to use the same volume, as long as they use different subpaths of it.

If a Pod does not have any SELinux label assigned in the Kubernetes API, the container runtime assigns it a unique random one, so that a process that potentially escapes the container boundary cannot access the data of any other container on the host. The container runtime still recursively relabels all Pod volumes with this random SELinux label.
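For illustration, a minimal sketch of picking a random MCS level for such a Pod. It assumes the common scheme of two distinct categories out of c0 to c1023 at sensitivity s0; the function name `random_mcs_level` is hypothetical, and the exact scheme a given container runtime uses may differ.

```python
import secrets


def random_mcs_level(sensitivity: str = "s0", categories: int = 1024) -> str:
    """Return an MCS level with two distinct random categories, e.g. "s0:c5,c812".

    Assumption: two categories drawn from c0..c(categories-1), as many
    container runtimes do. Hypothetical helper, not a runtime's real code.
    """
    first = secrets.randbelow(categories)
    second = secrets.randbelow(categories - 1)
    if second >= first:
        second += 1  # skip `first`, so the two categories always differ
    low, high = sorted((first, second))  # list categories in ascending order
    return f"{sensitivity}:c{low},c{high}"
```

A real runtime would also need to avoid handing out a pair that is already in use on the host, which this sketch does not do.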

Improvement using mount options

If a Pod and its volume meet all of the following conditions, Kubernetes will mount the volume directly with the right SELinux label. Such a mount takes constant time, and the container runtime does not need to recursively relabel any files on it.

The operating system must support SELinux.
The feature gates ReadWriteOncePod and SELinuxMountReadWriteOncePod must be enabled. These feature gates are Beta in Kubernetes 1.27 and Alpha in 1.25.
The Pod must have at least seLinuxOptions.level assigned in its Pod Security Context or all Pod containers must have it set in their Security Contexts. Kubernetes will read the default user, role and type from the operating system defaults (typically system_u, system_r and container_t).
The volume must be a Persistent Volume with access mode ReadWriteOncePod.
The volume plugin or the CSI driver responsible for the volume must support mounting with SELinux mount options.
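The conditions above can be sketched as a single check. This is a simplified, hypothetical helper (`can_mount_with_context` is not a Kubernetes API): it reads the Pod as a plain dictionary in the shape of its JSON manifest, takes the other conditions (SELinux support, the effective feature gate values, the volume's access modes, driver support) as arguments, and ignores details the real kubelet handles, such as init containers.

```python
from typing import Any, Dict, List, Optional

REQUIRED_GATES = ("ReadWriteOncePod", "SELinuxMountReadWriteOncePod")


def _level(security_context: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return seLinuxOptions.level from a Pod or container securityContext."""
    return ((security_context or {}).get("seLinuxOptions") or {}).get("level")


def can_mount_with_context(
    pod: Dict[str, Any],
    pv_access_modes: List[str],
    feature_gates: Dict[str, bool],
    selinux_enabled: bool,
    driver_supports_selinux_mount: bool,
) -> bool:
    """Check the listed conditions for one Pod and one volume (simplified)."""
    # 1. The operating system must support SELinux.
    if not selinux_enabled:
        return False
    # 2. Both feature gates must be enabled.
    if not all(feature_gates.get(gate, False) for gate in REQUIRED_GATES):
        return False
    # 3. seLinuxOptions.level on the Pod, or on every container.
    spec = pod.get("spec", {})
    containers = spec.get("containers", [])
    if _level(spec.get("securityContext")) is None:
        levels = [_level(c.get("securityContext")) for c in containers]
        if not containers or not all(levels):
            return False
    # 4. The Persistent Volume must use access mode ReadWriteOncePod.
    if "ReadWriteOncePod" not in pv_access_modes:
        return False
    # 5. The volume plugin or CSI driver must support SELinux mount options.
    return driver_supports_selinux_mount
```

Passing the effective gate values in, rather than assuming defaults, keeps the sketch independent of which Kubernetes version's defaults apply.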

Mounting with SELinux context

When all the aforementioned conditions are met, the kubelet passes the -o context=<SELinux label> mount option to the volume plugin or CSI driver. CSI driver vendors must ensure that their CSI driver supports this mount option and, if necessary, appends any other mount options that -o context needs in order to work.

For example, NFS may need -o context=<SELinux label>,nosharecache, so each volume mounted from the same NFS server can have a different SELinux label value. Similarly, CIFS may need -o context=<SELinux label>,nosharesock.
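A CSI driver might assemble its mount options roughly as follows. The sketch is hypothetical: the `EXTRA_OPTIONS` table contains only the NFS and CIFS examples from the text, and the label is taken as an input rather than derived from seLinuxOptions. It quotes the context value because an MCS level such as s0:c10,c20 contains a comma, which mount(8) would otherwise read as an option separator.

```python
from typing import List

# Per-filesystem extras, taken from the NFS and CIFS examples above.
# Other filesystems may need different extra options, or none at all.
EXTRA_OPTIONS = {
    "nfs": ["nosharecache"],
    "cifs": ["nosharesock"],
}


def mount_options(fs_type: str, label: str, base: List[str]) -> List[str]:
    """Return base options plus context="<label>" and any per-filesystem extras."""
    options = list(base)
    # Quote the label so the comma inside an MCS level is not treated
    # as a separator between mount options.
    options.append(f'context="{label}"')
    options.extend(EXTRA_OPTIONS.get(fs_type, []))
    return options
```

Joining the returned list with commas gives the string a driver would pass to mount -o.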

It's up to the CSI driver vendor to test their CSI driver in an SELinux-enabled environment before setting seLinuxMount: true in the CSIDriver instance.
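As a sketch, the CSIDriver object with seLinuxMount enabled can be rendered from Python as JSON, which kubectl apply -f also accepts. The driver name example.csi.vendor.io is hypothetical; substitute your driver's real name.

```python
import json


def csi_driver_manifest(driver_name: str, selinux_mount: bool) -> str:
    """Render a CSIDriver object as JSON."""
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "CSIDriver",
        "metadata": {"name": driver_name},
        "spec": {
            # Set to True only after testing the driver with SELinux enabled.
            "seLinuxMount": selinux_mount,
        },
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    # "example.csi.vendor.io" is a hypothetical driver name.
    print(csi_driver_manifest("example.csi.vendor.io", True))
```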

How can I learn more?

SELinux in containers: see the excellent visual SELinux guide by Daniel J Walsh. Note that the guide is older than Kubernetes; it describes Multi-Category Security (MCS) mode using virtual machines as an example, but a similar concept is used for containers.

See a series of blog posts for details on exactly how container runtimes apply SELinux to containers:

Read the KEP: Speed up SELinux volume relabeling using mounts