After covering tools and processes in previous sections, we can now turn to containers in production systems. This includes which images are moved into production repositories, selecting and running containers, and the security of underlying host systems.

Runtime Security

  • The Control Plane: Our first order of business is ensuring the security of the control plane: tools for managing host operating systems, the scheduler, the container client, engine(s), the repository, and any additional deployment tools. As we advised for container build environment security, we recommend limiting access to specific administrative accounts: one with responsibility for operating and orchestrating containers, and another for system administration (including patching and configuration management). On-premise we recommend network and physical segregation; for cloud and virtual systems we prefer logical segregation. The good news is that several third-party tools offer full identity and access management, LDAP/AD integration, and token-based SSO (e.g., SAML) across systems.
  • Resource Usage Analysis: Many readers are familiar with this for performance, but it can also offer insight into basic code security. Does the container allow port 22 (administration) access? Does the container try to update itself? What external systems and utilities does it depend upon? Any external resource is a potential attack point, so it’s good hygiene to limit both ingress and egress points. To manage the scope of what containers can access, third-party tools can monitor runtime access to environment resources – both inside and outside the container. Usage analysis is essentially automated review of resource requirements. This is useful in a number of ways – especially for firms moving from a monolithic architecture to microservices. Analysis can help developers understand which references they can remove from their code, and help operations narrow down roles and access privileges. A minimal usage-analysis sketch appears after this list.
  • Selecting the Right Image: We recommend establishing a trusted image repository and ensuring that your production environment can only pull containers from that trusted source. Ad hoc container management makes it entirely too easy for engineers to bypass security controls, so we recommend establishing trusted central repositories for production images. We also recommend scripting deployment to avoid manual intervention and ensure the latest certified container is always selected. This means checking image signatures in your scripts before putting containers into production, avoiding manual verification overhead and delay; see the signature-enforcement sketch below. Trusted repository and registry services can help by rejecting containers which are not properly signed. Fortunately many options are available, so pick one you like. Keep in mind that if you build many containers each day, a manual process will quickly break down. It is okay to have more than one image repository – if you are running across multiple cloud environments there are advantages to leveraging the native registry in each one.
  • Immutable Images: Developers often leave shell access enabled in container images so they can log into containers running in production. Their motivation is usually debugging and on-the-fly code changes, both bad for consistency and security. Immutable containers – which do not allow ssh connections – prevent interactive real-time manipulation. They force developers to fix code in the development pipeline, and remove a principal attack path: attackers routinely scan for ssh access to take over containers, then leverage them to attack underlying hosts and other containers. We strongly suggest using immutable containers without ‘port 22’ access, and making sure all container changes take place (with logging) in the build process rather than in production; a deploy-time check is sketched below.
  • Input Validation: At startup containers accept parameters, configuration files, credentials, JSON, and scripts. In more aggressive scenarios ‘agile’ teams shove new code segments into containers as input variables, making existing containers behave in fun new ways. Validate that all input data is suitable and complies with policy, either manually or using a third-party security tool. You must also ensure that each container receives the correct user and group IDs, which map to the assigned view at the host layer. This can prevent someone from forcing a container to misbehave, or simply catch dumb developer mistakes; see the validation sketch following this list.
  • Blast Radius: The cloud enables you to run different containers under different cloud user accounts, limiting the resources available to any given container. If an account or container set is compromised, the same cloud service restrictions which prevent tenants from interfering with each other will limit damage between your different accounts and projects. For more information see our reference material on limiting blast radius with user accounts.
  • Container Group Segmentation: One of the principal benefits of container management systems is that they help scale tasks across pools of shared servers. Each management platform offers a modular architecture, with scaling performed on node/minion/slave sub-groups, which in turn include a set of containers. Each node forms its own logical subnet, limiting network access between sets of containers. This segregation limits ‘blast radius’ by restricting which resources any container can access. It is up to application architects and security teams to leverage this construct to improve security. You can enforce this with network policies on the container manager service, or with network security controls provided by your cloud vendor. Over and above this orchestration manager feature, third-party container security tools – whether running as agents inside containers or as part of the underlying operating system – can provide a type of logical network segmentation which further limits network connections between groups of containers. Together these offer fine-grained isolation of containers and container groups from one another.
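
To illustrate usage analysis, below is a minimal sketch – assuming a Docker host and a hypothetical container name – which pulls a container’s exposed ports, host mounts, and network mode from docker inspect. Commercial tools perform far deeper runtime analysis; this simply shows the kind of inventory you are after.

```python
import json
import subprocess

def resource_profile(container: str) -> dict:
    """Summarize a container's external touch points via `docker inspect`."""
    raw = subprocess.run(
        ["docker", "inspect", container],
        capture_output=True, text=True, check=True,
    ).stdout
    info = json.loads(raw)[0]
    return {
        "exposed_ports": list((info["Config"].get("ExposedPorts") or {}).keys()),
        "host_mounts": [m["Source"] for m in info.get("Mounts", [])],
        "network_mode": info["HostConfig"].get("NetworkMode"),
    }

profile = resource_profile("billing-service")  # hypothetical container name
if "22/tcp" in profile["exposed_ports"]:
    print("FLAG: administrative (ssh) port exposed")
print(json.dumps(profile, indent=2))
```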
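
For signature enforcement, Docker’s built-in Content Trust offers a simple gate: with it enabled, docker pull fails for images which are not signed by a trusted publisher. The registry and image names below are hypothetical.

```python
import os
import subprocess

TRUSTED_REGISTRY = "registry.example.com/prod"  # hypothetical trusted repository

def pull_signed(image: str) -> None:
    """Pull an image with Docker Content Trust enabled; unsigned images fail."""
    if not image.startswith(TRUSTED_REGISTRY):
        raise ValueError(f"{image} is not from the trusted repository")
    env = dict(os.environ, DOCKER_CONTENT_TRUST="1")
    subprocess.run(["docker", "pull", image], env=env, check=True)

pull_signed("registry.example.com/prod/billing-service:1.4.2")
```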
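
A deploy-time immutability check can be as simple as rejecting any image which exposes the ssh port or starts an ssh daemon. The checks below are illustrative, not exhaustive.

```python
import json
import subprocess

def assert_immutable(image: str) -> None:
    """Reject images that expose ssh or launch sshd (illustrative checks)."""
    raw = subprocess.run(["docker", "inspect", image],
                         capture_output=True, text=True, check=True).stdout
    config = json.loads(raw)[0]["Config"]
    ports = (config.get("ExposedPorts") or {}).keys()
    command = " ".join((config.get("Entrypoint") or []) + (config.get("Cmd") or []))
    if "22/tcp" in ports or "sshd" in command:
        raise RuntimeError(f"{image} permits interactive ssh access")

assert_immutable("registry.example.com/prod/billing-service:1.4.2")
```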
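
Finally, a minimal input-validation sketch. The parameter policy and UID/GID range here are illustrative; the point is to check every startup input against explicit policy before launch.

```python
ALLOWED_PARAMS = {"LOG_LEVEL": {"info", "warn", "error"}}  # illustrative policy
ALLOWED_ID_RANGE = range(10000, 20000)                     # non-root service IDs

def validate_startup(params: dict, uid: int, gid: int) -> None:
    """Check container startup inputs and identity mapping against policy."""
    for key, value in params.items():
        if key not in ALLOWED_PARAMS or value not in ALLOWED_PARAMS[key]:
            raise ValueError(f"parameter {key}={value} violates policy")
    if uid not in ALLOWED_ID_RANGE or gid not in ALLOWED_ID_RANGE:
        raise ValueError(f"uid/gid {uid}:{gid} outside the assigned range")

validate_startup({"LOG_LEVEL": "info"}, uid=10007, gid=10007)
```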

Platform Security

Until recently, when someone talked about container security, they were really talking about how to secure the hypervisor and underlying operating system. So most articles and presentations on container security focus on this single – admittedly important – facet. But we believe runtime security needs to encompass more than that, and we break the challenge into three areas: host OS hardening, isolation of namespaces, and segregation of workloads by trust level.

  • Host OS/Kernel Hardening: Hardening is how we protect a host operating system from attacks and misuse. It typically starts with selection of a hardened variant of the operating system you will use. But while these versions come with secure variants of both libraries and features, you will still have work to do: apply your baseline configuration and remove unneeded features. At minimum you’ll want to ensure user authentication and access roles are set, permissions for binary file access are properly restricted, logging of audit data is enabled, and the base OS bundle is fully patched. Also review the patch and configuration status of the virtualization libraries (such as libcontainer, libvirt, and LXC) your container engine relies on to protect itself. A daemon hardening sketch follows this list.
  • Resource Isolation and Allocation: A critical element of container security is limiting container access to underlying operating system resources, particularly to prevent a container from snooping on – or stealing – data from other containers. The first step is making sure container privileges are assigned to a role. The container engine must run as the host operating system’s root user, but your containers must not, so set up user roles for your container groups: create specific user IDs for containers and/or group IDs for different classes of containers, then assign those IDs to containers at runtime. Next up is the resource isolation model for containers, which is built on two concepts: namespaces and cgroups. A namespace creates a virtual map of the resources any given task will be provided, mapping specific users and groups to subsets of resources (e.g., networks, files, and IPC) within their namespace. We recommend default deny on inbound requests, opening network channels only for containers with a legitimate need to communicate, and do not mix container and non-container services on a single machine. A Control Group (cgroup) then limits how much of a resource a container is allocated: cgroups partition tasks into hierarchical groups and control how much of any particular resource (such as memory or CPU cycles) a task can use. This helps protect one group of containers from being starved of resources by another. The launch sketch after this list shows these controls in practice.
  • Segregate Workloads: We discussed resource isolation at the kernel level, but you should also isolate container engine/OS groups and their containers at the network layer. For container isolation we recommend mapping groups of mutually trusted containers to separate machines and/or network security groups. For containers running critical services or management tools, consider running a limited number of containers per VM and grouping them by trust level/workload or into a dedicated cloud VPC, to limit attack surface and minimize an attacker’s ability to pivot in case of service or container compromise. As we mentioned above under Container Group Segmentation, orchestration manager features and third-party products can help with segmentation. In extreme cases you can consider one container per VM or physical server for on-premise applications, but this sacrifices some benefits of using containers on virtual infrastructure.
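
One concrete slice of host hardening is locking down the container engine daemon itself. This sketch writes a few of Docker’s documented daemon.json settings; the values are a starting point rather than a complete baseline, and the daemon must be restarted (as root) for them to take effect.

```python
import json

daemon_config = {
    "userns-remap": "default",   # map container root to an unprivileged host user
    "icc": False,                # disable unrestricted inter-container traffic
    "no-new-privileges": True,   # block privilege escalation via setuid binaries
    "live-restore": True,        # keep containers running while patching the daemon
}

# Requires root; restart the Docker daemon afterwards.
with open("/etc/docker/daemon.json", "w") as f:
    json.dump(daemon_config, f, indent=2)
```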
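
The launch sketch below combines the isolation controls described above: a dedicated unprivileged user ID, cgroup caps on memory, CPU, and process count, and dropped kernel capabilities. The image name and limits are illustrative; tune them per workload.

```python
import subprocess

subprocess.run([
    "docker", "run", "-d",
    "--user", "10007:10007",  # run as an assigned uid:gid, never root
    "--memory", "512m",       # cgroup memory cap
    "--cpus", "1.0",          # cgroup CPU cap
    "--pids-limit", "200",    # cap task count to blunt fork bombs
    "--cap-drop", "ALL",      # drop all Linux capabilities not explicitly needed
    "--read-only",            # immutable root filesystem
    "registry.example.com/prod/billing-service:1.4.2",
], check=True)
```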

Platform security and container isolation are both huge fields of study, and we have only scratched the surface. Operating system providers, Docker, and third-party security providers offer best practices, research papers, and blogs with great detail, often detailing issues with specific operating systems.

Orchestration Manager Security

This research effort is focused on container security, but any discussion of container security now must address securing containers within a specific orchestration management framework. There are many orchestration managers in active use: Kubernetes, Mesos, Swarm; as well as cloud-native container management systems offered as part of AWS, Azure, and GCP. Kubernetes is the dominant tool for managing clusters of containers, and with its rapid surge in popularity came many additional concerns for container security programs – both thanks to added environmental complexity and because its default security can generously be described as ‘poor’. There are public demonstrations of how to gain unauthorized root access on nodes; escalate privileges; bypass identity checks; and exfiltrate code, keys, and credentials. Our point is that many container managers need considerable tuning to be secure.

We have already discussed security aspects such as OS hardening, image safety, namespaces, and network isolation to limit potential ‘blast radius’. And we addressed hardening container code, trusted image repositories to keep administrators from accidentally running malicious containers, and immutable container images to prevent direct shell access. Now we can address specific orchestration manager areas you should secure.

  • Management Plane Security: Cluster management – whether you’re using Swarm, Kubernetes, or Apache Mesos – is handled via command line and APIs. For example the etcd key-value store and the kubectl command-line tool are fundamental to managing a Kubernetes cluster, but these tools can be misused in a variety of ways by an attacker. In fact the graphical user interface on some platforms does not require user authentication, so disabling it is a common best practice. You’ll want to limit who can access administrative features, but developers or attackers can import their own command-line tools, so simple access controls are insufficient. Network isolation can help protect the master management server and control where administrative commands can run. Use network isolation, leverage the more recent IAM services built into your cluster manager (RBAC for Kubernetes), and set up least privilege for node service accounts; a minimal RBAC sketch follows this list.
  • Segregate Workloads: We have already discussed namespaces and network isolation, but there are other, more basic controls which should be in place. With both on-premise Kubernetes and cloud container deployments, we often find a flat network architecture. We also find developers and QA personnel with direct access to production accounts and servers. We strongly recommend segregating development and production resources in general, but particularly within production orchestration systems: partition sensitive workloads onto their own nodes or even dedicated cluster instances. Additionally, set network security policies (“security groups” in AWS) to ‘default deny’ inbound connections as a starting point, and only add specific exceptions as applications require. This is the default network policy for most cloud services because it effectively reduces attack surface. Default deny also reduces the likelihood of containers automatically updating themselves from external sources, and can prevent attackers from uploading new attack tools if they gain a foothold in your environment. A default-deny policy example appears below.
  • Limit Discovery: Cluster management tools collect a broad assortment of metadata on cluster configuration, containers, and nodes. This data is essential for cluster management, but can also offer a map to attackers probing your system. Limiting which services and users can access metadata, and ensuring requesting parties are fully authorized, helps reduce attack surface. Many platforms offer metadata proxies to filter and validate requests.
  • Upgrade and Patch: The engineers behind most container managers have responded well to known security issues, so newer versions tend to be much more secure. Virtualization is key to any container cluster, and these platforms build redundancy in, so you can leverage cluster management features to quickly patch and replace both cluster services and containers.
  • Logging: We recommend collecting logs from all containers and nodes. Many attacks focus on privilege escalation and obtaining certificates, so we also recommend monitoring all identity modification API calls and failures, to highlight attacks.
  • Test Yourself: Security checkers and CIS security benchmarks are available for containers and container orchestration managers, which you can use to gauge how well your baseline security stacks up. These provide a good first step toward validating cluster security. Unfortunately default container manager configurations tend to be insecure, and most administrators are not fully aware of all the features of any given cluster manager, so these checkers are a great way to get up to speed on appropriate security controls; a sample benchmark run is sketched after this list.
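
As a minimal RBAC example, the sketch below uses kubectl to create a namespaced read-only role and bind it to a single user. The namespace, role, and user names are hypothetical.

```python
import subprocess

# Read-only access to pods in one namespace, granted to one user.
subprocess.run(["kubectl", "create", "role", "pod-reader",
                "--verb=get", "--verb=list", "--verb=watch",
                "--resource=pods", "--namespace=production"], check=True)
subprocess.run(["kubectl", "create", "rolebinding", "pod-reader-binding",
                "--role=pod-reader", "--user=ops-oncall",
                "--namespace=production"], check=True)
```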
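
The canonical default-deny-ingress NetworkPolicy looks like the following: the empty podSelector matches every pod in the namespace, and listing Ingress with no allow rules blocks all inbound traffic. Layer narrowly scoped allow policies for each application on top of it.

```python
import subprocess

POLICY = """
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
"""

# Apply the manifest by piping it to kubectl.
subprocess.run(["kubectl", "apply", "-f", "-"],
               input=POLICY, text=True, check=True)
```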
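
For benchmarks, the open source docker-bench-security script runs the CIS Docker checks against a host, and aquasecurity’s kube-bench plays the same role for Kubernetes clusters. Treat the findings as a triage list rather than a pass/fail gate.

```python
import subprocess

# Fetch and run the CIS Docker benchmark checker (requires root on the host).
subprocess.run(["git", "clone",
                "https://github.com/docker/docker-bench-security.git"],
               check=True)
subprocess.run(["sudo", "sh", "docker-bench-security.sh"],
               cwd="docker-bench-security", check=True)
```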

Keep in mind that these are very basic recommendations – we cannot do this topic justice within the scope of this paper. That said, we really want to raise reader awareness of proven attacks on all existing open source container management systems, and the considerable amount of work needed to secure a cluster. Beyond the basics, each container manager has its own security issues and nuances to learn to protect a cluster from specific types of attack.

Secrets Management

When you start up a container or orchestration manager it needs permissions to communicate with other containers, nodes, databases, and other network resources. In a highly modular service-oriented architecture a container without credentials for APIs, data encryption, and identity cannot get real work done. But we don’t want engineers hard-coding secrets into containers, nor do we want secrets sitting in files on servers. So provisioning machine identities is tricky: we need to securely pass sensitive data to ephemeral instances during startup.

The new products to address this issue are called ‘Secrets Management’ platforms. These products securely store encryption keys, API keys, identity tokens, SSL certificates, and passwords. They can share secrets across groups of trusted services and users, leveraging existing directory services to determine who has access to what. Solutions are widely available, including commercial tools, and many orchestration manager and container ecosystem providers (most notably Docker) offer built-in secrets management. A short example of consuming a managed secret follows.
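
On the consuming side, orchestrator-managed secrets typically arrive as in-memory files rather than environment variables – Docker Swarm, for example, mounts each secret at /run/secrets/<name>. A minimal sketch, with an illustrative secret name:

```python
from pathlib import Path

def read_secret(name: str) -> str:
    """Read a secret the orchestrator mounted at /run/secrets/<name>.
    Keep it in memory only; never write or log the value."""
    return Path("/run/secrets", name).read_text().strip()

db_password = read_secret("db_password")  # created via `docker secret create`
```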

We cannot fully address this topic within this paper either, so if you’d like more information, please see our paper on Secrets Management.

Our final post will discuss logging and monitoring.
