Authors: Jim Angel (Google), Pushkar Joglekar (VMware), and Savitha Raghunathan (Red Hat)
The open source tools listed in this article are to serve as examples only and are in no way a direct recommendation from the Kubernetes community or authors.
USA's National Security Agency (NSA) and the Cybersecurity and Infrastructure Security Agency (CISA) released, "Kubernetes Hardening Guidance" on August 3rd, 2021. The guidance details threats to Kubernetes environments and provides secure configuration guidance to minimize risk.
The following sections of this blog correlate to the sections in the NSA/CISA guidance. Any missing sections are skipped because of limited opportunities to add anything new to the existing content.
Note: This blog post is not a substitute for reading the guide. Reading the published guidance is recommended before proceeding as the following content is complementary.
Note that the threats identified as important by the NSA/CISA, or the intended audience of this guidance, may be different from the threats that other enterprise users of Kubernetes consider important. This section is still useful for organizations that care about data, resource theft and service unavailability.
The guidance highlights the following three sources of compromises:
- Supply chain risks
- Malicious threat actors
- Insider threats (administrators, users, or cloud service providers)
The threat model tries to take a step back and review threats that not only exist within the boundary of a Kubernetes cluster but also include the underlying infrastructure and surrounding workloads that Kubernetes does not manage.
For example, when a workload outside the cluster shares the same physical network, it has access to the kubelet and to control plane components: etcd, controller manager, scheduler and API server. Therefore, the guidance recommends having network level isolation separating Kubernetes clusters from other workloads that do not need connectivity to Kubernetes control plane nodes. Specifically, scheduler, controller-manager, etcd only need to be accessible to the API server. Any interactions with Kubernetes from outside the cluster can happen by providing access to API server port.
List of ports and protocols for each of these components are defined in Ports and Protocolswithin the Kubernetes documentation.
Special note: kube-scheduler and kube-controller-manager uses different ports than the ones mentioned in the guidance
Kubernetes by default does not guarantee strict workload isolation between pods running in the same node in a cluster. However, the guidance provides several techniques to enhance existing isolation and reduce the attack surface in case of a compromise.
Several best practices related to basic security principle of least privilege i.e. provide only the permissions are needed; no more, no less, are worth a second look.
The guide recommends setting non-root user at build time instead of relying on setting
runAsUser at runtime in your Pod spec. This is a good practice and provides some level of defense in depth. For example, if the container image is built with user
10001and the Pod spec misses adding the
runAsuser field in its
Deployment object. In this case there are certain edge cases that are worth exploring for awareness:
- Pods can fail to start, if the user defined at build time is different from the one defined in pod spec and some files are as a result inaccessible.
- Pods can end up sharing User IDs unintentionally. This can be problematic even if the User IDs are non-zero in a situation where a container escape to host file system is possible. Once the attacker has access to the host file system, they get access to all the file resources that are owned by other unrelated pods that share the same UID.
- Pods can end up sharing User IDs, with other node level processes not managed by Kubernetes e.g. node level daemons for auditing, vulnerability scanning, telemetry. The threat is similar to the one above where host file system access can give attacker full access to these node level daemons without needing to be root on the node.
However, none of these cases will have as severe an impact as a container running as root being able to escape as a root user on the host, which can provide an attacker with complete control of the worker node, further allowing lateral movement to other worker or control plane nodes.
Kubernetes 1.22 introduced an alpha featurethat specifically reduces the impact of such a control plane component running as root user to a non-root user through user namespaces.
That (alpha stage) support for user namespaces / rootless mode is available with the following container runtimes:
Some distributions support running in rootless mode, like the following:
The NSA/CISA Kubernetes Hardening Guidance highlights an often overlooked feature
readOnlyRootFileSystem, with a working example in Appendix B. This example limits execution and tampering of containers at runtime. Any read/write activity can then be limited to few directories by using
tmpfs volume mounts.
However, some applications that modify the container filesystem at runtime, like exploding a WAR or JAR file at container startup, could face issues when enabling this feature. To avoid this issue, consider making minimal changes to the filesystem at runtime when possible.
Kubernetes Hardening Guidance also recommends running a scanner at deploy time as an admission controller, to prevent vulnerable or misconfigured pods from running in the cluster. Theoretically, this sounds like a good approach but there are several caveats to consider before this can be implemented in practice:
- Depending on network bandwidth, available resources and scanner of choice, scanning for vulnerabilities for an image can take an indeterminate amount of time. This could lead to slower or unpredictable pod start up times, which could result in spikes of unavailability when apps are serving peak load.
- If the policy that allows or denies pod startup is made using incorrect or incomplete data it could result in several false positive or false negative outcomes like the following:
- inside a container image, the
opensslpackage is detected as vulnerable. However, the application is written in Golang and uses the Go
cryptopackage for TLS. Therefore, this vulnerability is not in the code execution path and as such has minimal impact if it remains unfixed.
- A vulnerability is detected in the
opensslpackage for a Debian base image. However, the upstream Debian community considers this as a Minor impact vulnerability and as a result does not release a patch fix for this vulnerability. The owner of this image is now stuck with a vulnerability that cannot be fixed and a cluster that does not allow the image to run because of predefined policy that does not take into account whether the fix for a vulnerability is available or not
- A Golang app is built on top of a distrolessimage, but it is compiled with a Golang version that uses a vulnerable standard library. The scanner has no visibility into golang version but only on OS level packages. So it allows the pod to run in the cluster in spite of the image containing an app binary built on vulnerable golang.
- inside a container image, the
To be clear, relying on vulnerability scanners is absolutely a good idea but policy definitions should be flexible enough to allow:
- Creation of exception lists for images or vulnerabilities through labelling
- Overriding the severity with a risk score based on impact of a vulnerability
- Applying the same policies at build time to catch vulnerable images with fixable vulnerabilities before they can be deployed into Kubernetes clusters
Special considerations like offline vulnerability database fetch, may also be needed, if the clusters run in an air-gapped environment and the scanners require internet access to update the vulnerability database.
Since Kubernetes v1.21, the PodSecurityPolicyAPI and related features are deprecated, but some of the guidance in this section will still apply for the next few years, until cluster operators upgrade their clusters to newer Kubernetes versions.
The Kubernetes project is working on a replacement for PodSecurityPolicy. Kubernetes v1.22 includes an alpha feature called Pod Security Admissionthat is intended to allow enforcing a minimum level of isolation between pods.
Information about migrating from PodSecurityPolicy to the Pod Security Admission feature is available inMigrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller.
One important behavior mentioned in the guidance that remains the same between Pod Security Policy and its replacement is that enforcing either of them does not affect pods that are already running. With both PodSecurityPolicy and Pod Security Admission, the enforcement happens during the pod creation stage.
Some container workloads are less trusted than others but may need to run in the same cluster. In those cases, running them on dedicated nodes that include hardened container runtimes that provide stricter pod isolation boundaries can act as a useful security control.
Kubernetes supports an API called RuntimeClass that is stable / GA (and, therefore, enabled by default) stage as of Kubernetes v1.20. RuntimeClass allows you to ensure that Pods requiring strong isolation are scheduled onto nodes that can offer it.
Some third-party projects that you can use in conjunction with RuntimeClass are:
As discussed here and in the guidance, many features and tooling exist in and around Kubernetes that can enhance the isolation boundaries between pods. Based on relevant threats and risk posture, you should pick and choose between them, instead of trying to apply all the recommendations. Having said that, cluster level isolation i.e. running workloads in dedicated clusters, remains the strictest workload isolation mechanism, in spite of improvements mentioned earlier here and in the guide.
Kubernetes Networking can be tricky and this section focuses on how to secure and harden the relevant configurations. The guide identifies the following as key takeaways:
- Using NetworkPolicies to create isolation between resources,
- Securing the control plane
- Encrypting traffic and sensitive data
Network policies can be created with the help of network plugins. In order to make the creation and visualization easier for users, Cilium supports a web GUI tool. That web GUI lets you create Kubernetes NetworkPolicies (a generic API that nevertheless requires a compatible CNI plugin), and / or Cilium network policies (CiliumClusterwideNetworkPolicy and CiliumNetworkPolicy, which only work in clusters that use the Cilium CNI plugin). You can use these APIs to restrict network traffic between pods, and therefore minimize the attack vector.
Another scenario that is worth exploring is the usage of external IPs. Some services, when misconfigured, can create random external IPs. An attacker can take advantage of this misconfiguration and easily intercept traffic. This vulnerability has been reported in CVE-2020-8554. Using externalip-webhookcan mitigate this vulnerability by preventing the services from using random external IPs. externalip-webhookonly allows creation of services that don't require external IPs or whose external IPs are within the range specified by the administrator.
CVE-2020-8554 - Kubernetes API server in all versions allow an attacker who is able to create a ClusterIP service and set the
spec.externalIPsfield, to intercept traffic to that IP address. Additionally, an attacker who is able to patch the
status(which is considered a privileged operation and should not typically be granted to users) of a LoadBalancer service can set the
status.loadBalancer.ingress.ipto similar effect.
In addition to configuring ResourceQuotas and limits, consider restricting how many process IDs (PIDs) a given Pod can use, and also to reserve some PIDs for node-level use to avoid resource exhaustion. More details to apply these limits can be found in Process ID Limits And Reservations.
In the next section, the guide covers control plane hardening. It is worth noting that from Kubernetes 1.20, insecure port from API server, has been removed.
As a general rule, the etcd server should be configured to only trust certificates assigned to the API server. It limits the attack surface and prevents a malicious attacker from gaining access to the cluster. It might be beneficial to use a separate CA for etcd, as it by default trusts all the certificates issued by the root CA.
In addition to specifying the token and certificates directly,
.kubeconfigsupports dynamic retrieval of temporary tokens using auth provider plugins. Beware of the possibility of malicious shell code execution in a
kubeconfig file. Once attackers gain access to the cluster, they can steal ssh keys/secrets or more.
Kubernetes Secrets is the native way of managing secrets as a Kubernetes API object. However, in some scenarios such as a desire to have a single source of truth for all app secrets, irrespective of whether they run on Kubernetes or not, secrets can be managed loosely coupled with Kubernetes and consumed by pods through side-cars or init-containers with minimal usage of Kubernetes Secrets API.
The NSA/CISA guidance stresses monitoring and alerting based on logs. The key points include logging at the host level, application level, and on the cloud. When running Kubernetes in production, it's important to understand who's responsible, and who's accountable, for each layer of logging.
One area that deserves more focus is what exactly should alert or be logged. The document outlines a sample policy in Appendix L: Audit Policy that logs all RequestResponse's including metadata and request / response bodies. While helpful for a demo, it may not be practical for production.
Each organization needs to evaluate their own threat model and build an audit policy that complements or helps troubleshooting incident response. Think about how someone would attack your organization and what audit trail could identify it. Review more advanced options for tuning audit logs in the official audit logging documentation. It's crucial to tune your audit logs to only include events that meet your threat model. A minimal audit policy that logs everything at
metadata level can also be a good starting point.
Audit logging configurations can also be tested with kind following these instructions.
Logging is important for threat and anomaly detection. As the document outlines, it's a best practice to scan and alert on logs as close to real time as possible and to protect logs from tampering if a compromise occurs. It's important to reflect on the various levels of logging and identify the critical areas such as API endpoints.
Kubernetes API audit logging can stream to a webhook and there's an example in Appendix N: Webhook configuration. Using a webhook could be a method that stores logs off cluster and/or centralizes all audit logs. Once logs are centrally managed, look to enable alerting based on critical events. Also ensure you understand what the baseline is for normal activities.
While the guide stressed the importance of notifications, there is not a blanket event list to alert from. The alerting requirements vary based on your own requirements and threat model. Examples include the following events:
- Changes to the
securityContextof a Pod
- Updates to admission controller configs
- Accessing certain files / URLs
- Seccomp Security Profiles and You: A Practical Guide - Duffie Cooley
- TGI Kubernetes 119: Gatekeeper and OPA
- Abusing The Lack of Kubernetes Auditing Policies
- Enable seccomp for all workloads with a new v1.22 alpha feature
- This Week in Cloud Native: Auditing / Pod Security
Kubernetes releases three times per year, so upgrade-related toil is a common problem for people running production clusters. In addition to this, operators must regularly upgrade the underlying node's operating system and running applications. This is a best practice to ensure continued support and to reduce the likelihood of bugs or vulnerabilities.
Kubernetes supports the three most recent stable releases. While each Kubernetes release goes through a large number of tests before being published, some teams aren't comfortable running the latest stable release until some time has passed. No matter what version you're running, ensure that patch upgrades happen frequently or automatically. More information can be found in the version skew policy pages.
When thinking about how you'll manage node OS upgrades, consider ephemeral nodes. Having the ability to destroy and add nodes allows your team to respond quicker to node issues. In addition, having deployments that tolerate node instability (and a culture that encourages frequent deployments) allows for easier cluster upgrades.
Additionally, it's worth reiterating from the guidance that periodic vulnerability scans and penetration tests can be performed on the various system components to proactively look for insecure configurations and vulnerabilities.
To find the most recent Kubernetes supported versions, refer tohttps://k8s.io/releases, which includes minor versions. It's good to stay up to date with your minor version patches.
If you're running a managed Kubernetes offering, look for their release documentation and find their various security channels.
Subscribe to the Kubernetes Announce mailing list. The Kubernetes Announce mailing list is searchable for terms such as "Security Advisories". You can set up alerts and email notifications as long as you know what key words to alert on.
In summary, it is fantastic to see security practitioners sharing this level of detailed guidance in public. This guidance further highlights Kubernetes going mainstream and how securing Kubernetes clusters and the application containers running on Kubernetes continues to need attention and focus of practitioners. Only a few weeks after the guidance was published, an open source tool kubescape to validate cluster against this guidance became available.
This tool can be a great starting point to check the current state of your clusters, after which you can use the information in this blog post and in the guidance to assess where improvements can be made.
Finally, it is worth reiterating that not all controls in this guidance will make sense for all practitioners. The best way to know which controls matter is to rely on the threat model of your own Kubernetes environment.
A special shout out and thanks to Rory McCune (@raesene) for his inputs to this blog post