How to Address Kubernetes Cluster Security
Building a working, reliable Kubernetes cluster is hard. Building it securely is even harder. Often the easiest way to get things working is to ignore security considerations, or even to bypass default security configurations. A typical game plan for a resource-constrained engineering team is: (1) get it working, (2) make it secure. Sadly, step two is often neglected.
If you’re running Kubernetes, there are a few different attack vectors you’ll want to consider. There’s no such thing as an invincible cluster, so understanding what the trade-offs are, and where the biggest vulnerabilities lie, is crucial to having a security profile that suits your organization’s needs.
Denial of Service
The most obvious threat you’ll face when running cloud software is a denial of service (DoS) attack. Keep in mind, not every DoS attack is intentional — sometimes all it takes is a runaway deployment to start hitting your API with orders of magnitude more load than it’s used to.
You might think that Kubernetes, with its extensive autoscaling features, is hardened against DoS attacks, and to a certain degree that’s true. If you’ve got capacity to spare, your workloads should be able to scale up in response to a burst in traffic. And if you’re using a project like cluster-autoscaler, you can provision new nodes if your capacity starts to fill up. But keep in mind that new nodes cost money, and make sure you’ve got some hard limits set to prevent runaway costs.
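As a concrete guardrail, a HorizontalPodAutoscaler lets you cap how far a workload can scale. A minimal sketch (the deployment name `my-app` and the thresholds here are placeholders, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10   # hard ceiling: a traffic burst can't scale past this
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

On the node side, cluster-autoscaler's `--max-nodes-total` flag serves the same cost-limiting purpose.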
There are a few steps you can take to make sure you’re prepared for an unexpected burst in traffic:
- Set up rate-limiting in your application, as well as in your service mesh or ingress controllers. This will prevent any single machine from using up too much bandwidth. For example, nginx-ingress can limit the requests per second or per minute, the payload size, and the number of concurrent connections from a single IP address.
- Run load testing to understand how well your application scales. You’ll want to set up your application on a staging cluster, and hit it with traffic. It’s a bit harder to test against distributed DoS (DDoS) attacks, where traffic could be coming from many different IPs at once, but there are a few services out there to help with this.
- Enlist a third-party service like Cloudflare for DoS/DDoS protection.
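To illustrate the first point, the nginx-ingress controller supports per-client rate limits via annotations. A sketch, with illustrative hostnames and limits:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"          # requests per second per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "300"         # requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-connections: "5"   # concurrent connections per client IP
spec:
  ingressClassName: nginx
  rules:
  - host: my-app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
```

Ingress-level limits protect your backends, but remember they only see traffic that has already reached your cluster; a third-party service like Cloudflare can absorb volumetric attacks further upstream.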
Compromised Applications
If you’ve got public-facing applications, there’s always a chance outsiders will find a way to exploit your application logic. Even if your applications are not public-facing, all it takes is a misconfigured network policy in your cloud to suddenly open up a whole new attack vector.
Unfortunately, Kubernetes can’t fix your code for you. But it can mitigate the damage caused by a security hole in your application. There are a few ways to make sure one compromised application doesn’t lead to a full-scale breach:
- Use network policies to restrict which pods can talk to one another. If an attacker gains access to a particular pod, but it’s isolated from the rest of your cluster, they won’t be able to advance any further.
- Audit the security configuration of each of your workloads. Tools like Fairwinds Polaris will check to make sure the workload is not running as root, that it doesn’t have access to the host network, and for other potentially insecure configurations.
- Build your images using the scratch base image. If an attacker is unable to make system calls, they’ll have very little to work with.
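As an example of the first point, a NetworkPolicy can restrict a pod’s ingress to a single trusted source. A sketch, assuming hypothetical labels `app: db` and `app: api`:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-api-only
spec:
  podSelector:
    matchLabels:
      app: db          # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api     # only pods with this label may connect
    ports:
    - protocol: TCP
      port: 5432
```

Note that network policies are only enforced if your cluster runs a CNI plugin that supports them, such as Calico or Cilium.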
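And to illustrate the scratch-based approach, here’s a multi-stage Dockerfile sketch (the binary name and build path are hypothetical) that ships nothing but a static binary:

```dockerfile
# Build stage: compile a statically linked Go binary
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Final stage: no shell, no package manager, no libc
FROM scratch
COPY --from=build /app /app
USER 65534   # run as a non-root numeric UID
ENTRYPOINT ["/app"]
```

With no shell in the image, an attacker who compromises the process can’t easily pivot to running arbitrary commands inside the container.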
Insider Threats
Not every threat comes from outside your organization. A disgruntled employee with the keys to your cluster can steal data or cause an extended outage. Without the right controls in place, former employees may still have access to your infrastructure. And even if everyone has good intentions, honest mistakes can have disastrous consequences. If you’re a small organization, taking measures to prevent insider threats might seem like overkill, but it’s important to establish good security practices before you grow.
The most important way to prevent insider threats is to avoid sharing credentials. If everyone is using the same credentials to interact with your cluster, an employee departure will disrupt the entire organization. Furthermore, you’ll have no ability to audit who did what, so every action in your cluster is effectively anonymous.
Once you do have separate credentials provisioned for each individual interacting with the cluster, it’s important to adhere to the principle of least privilege. Kubernetes has built-in role-based access control (RBAC) to help you manage who can do what. Only grant an individual access to the resources and operations they need to perform their job. This will help not only in the case of a nefarious employee — it will also help prevent honest mistakes.
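For example, a namespaced Role granting read-only access to pods, bound to a single user (the namespace, role name, and user are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: staging
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]   # read-only: no create, update, or delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: staging
  name: read-pods-jane
subjects:
- kind: User
  name: jane@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Because the Role is namespaced, this user can inspect pods in `staging` but has no access to any other namespace.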
Major managed Kubernetes services, like Google’s GKE and Amazon’s EKS, come with a built-in link between their IAM profiles and Kubernetes credentials. Open source projects like rbac-manager can also help you keep your RBAC configurations simple and manageable.
When implementing a security policy for Kubernetes, the most important thing is to adhere to the principle of least privilege. Make sure users are rate-limited to a reasonable degree, make sure Docker images are as isolated and pared down as possible, and make sure your employees only have access to the resources they need to do their jobs. With reasonable measures in place, even a gaping security hole in your application code can be confined to a small blast radius.
Looking for more? Check out this post, which goes deeper with some hands-on examples.
Written By: Robert Brennan
Fairwinds — The Kubernetes Enablement Company