This is part five of a series. You can read part one, part two, part three, or part four; or track the project on GitHub.
Real World Examples
Cloud computing covers such a wide range of different technologies that there are no shortage of examples to draw from. Here are a few generic examples from real-world deployments. These get slightly technical because we want to highlight practical, tactical techniques to prove we aren’t just making all this up:
Embedding and Validating a Security Agent Automatically
In a traditional environment we embed security agents by building them into standard images or requiring server administrators to install and register them. Both options are very prone to error and omission, and hard to validate because you often need to rely on manual scanning. Both issues become much easier to manage in cloud computing.
To embed the agent:
- The first option is to build the agent into images. Instead of using generic operating system images you build your own, then require users to only launch approved images. In a private cloud you can enforce this with absolute control of what they run. In public clouds it is a bit tougher to enforce, but you can quickly catch exceptions using our validation process.
- The second option, and our favorite, is to inject the agent when instances launch. Some operating systems support initialization scripts which are passed to the launching instance by the cloud controller. Depending again on your cloud platform, you can inject these scrips automatically when autoscaling, via a management portal, or manually at other times. The scripts install and configure software in the instance before it is accessible on the network.
- Either way you need an agent that understands how to work within cloud infrastructure and is capable of self-registering to the management server. The agent pulls system information and cloud metadata, then connects with its management server, which pushes configuration policies back to the agent so it can self-configure. This process is entirely automated the first time the agent runs.
- Configuration may be based on detected services running on the instance, metadata tags applied to the instance (in the cloud management plane), or other characteristics such as where it is on the network.
- We provide a detailed technical example of agent injection and self-configuration in our Software Defined Security paper.
The process is simple. Build the agent into images or inject it into launching instances, then have it connect to a management server to configure itself. The capabilities of these agents vary widely. Some replicate standard endpoint protection but others handle system configuration, administrative user management, log collection, network security, host hardening, and more.
Validating that all your instances are protected can be quite easy, especially if your tool supports API:
- Obtain a list of all running instances from the cloud controller. This is a simple API call.
- Obtain a list of all instances with the security agent. This should be an API call to your security management platform, but might require pulling a report if that isn’t supported.
- Compare the lists. You cannot hide in the cloud, so you know every single instance. Compare active instances against managed instances, and find the exceptions.
We also show how to do this in the paper linked above.
Controlling SaaS with SAML
Pretty much everyone uses some form of Software as a Service, but controlling access and managing users can be a headache. Unless you link up using federated identity, you need to manage user accounts on the SaaS platform manually. Adding, configuring, and removing users on yet another system, and one that is always Internet accessible, is daunting. Federated identity solves this problem:
- Enable federated identity extensions on your directory server. This is an option for Active Directory and most LDAP servers.
- Contact your cloud provider to obtain their SAML configuration and management requirements. SAML (Security Assertion Markup Language) is a semi-standard way for a relying party to allow access and activities based on approval from an identity provider.
- Configure SAML yourself or use a third-party tool compatible with your cloud provider(s) which does this for you. If you use several SaaS providers a tool will save a lot of effort.
- With SAML users don’t have a username and password with the cloud provider. The only way to log in is to first authenticate to your directory server, which then provides (invisible to the user) a token to allow access to the cloud provider. Users need to be in the office or on a VPN.
- If you want to enable remote users without VPN you can set up a cloud proxy and issue them a special URL to use instead of the SaaS provider’s standard address. This address redirects to your proxy, which then handles connecting back to your directory server for authentication and authorization. This is something you typically buy rather than build.
Why do this? Instead of creating users on the SaaS platform it enables you to use existing user accounts in your directory server and authorize access using standard roles and groups, just like you do for internal servers. You also now get to track logins, disable accounts from a single source (your directory server), and otherwise maintain control. It also means people can’t steal a user’s password and then access Salesforce from anywhere on the Internet
Compartmentalizing Cloud Management with IAM
One of the largest new risks in cloud computing is Internet-accessible management of your entire infrastructure. Most cloud administrators use cloud APIs and command line interfaces to manage the infrastructure (or PaaS, and even sometimes SaaS). This means access credentials are accessed through environment variables or even the registry. If they use a web interface that opens up browser-based attacks. Either way, without capability compartmentalization an attacker could take complete control over their infrastructure by merely hacking a laptop. With a few API calls or a script they could copy or destroy everything in minutes.
All cloud platforms support internal identity and access management to varying degrees – this is something you should look for during your selection process. You can use this to limit security risks – not just to break out development and operations teams. The following isn’t supported on all platforms yet but it gives you an idea of the options:
- Create a Security Group and assign it IAM rights, and restrict these rights from all other groups. “IAM rights” means the security team manages new users, changes user and group rights, and prevents privilege escalation. They can even revoke administrative access to running instances by modifying the associated rights.
- Use separate cloud development and production groups and accounts. Even if you use DevOps require users to switch accounts for different tasks.
- The development group can have complete control over a development environment, which is segregated from the operations environment. Restrict them to building and launching in cloud segments that are isolated from the Internet and only route back to your organization. Developers can have free access to create, destroy, and otherwise manage development instances.
- In your production environment break out administrative tasks. Restrict all snapshotting and termination of instances to separate roles. This prevents attackers from copying data or destroying servers unless they manage to get into one of those accounts.
- Security Group changes should be restricted to the security team (or another designated group). Cloud administrators can move instances into and out of different Security Groups if needed (although ideally you would also restrict this), but only a small team should set the rules for production.
- You will still need super-admin accounts, but these can be highly restricted and used as infrequently as possible.
- In general use different groups, with different credentials, for different parts of your infrastructure. For example in production you could break out management by application stack.
- If you need auditing on API calls, and your cloud platform doesn’t support it, require administrators to connect through a proxy server that logs activity.
Under these guidelines an attacker needs to break into multiple accounts to cause the worst damage. Notice that what we just described isn’t necessarily easy to manage at scale – this is an area where you would allocate the resources freed by reducing other risks such as patching.
Hypersegregation with Security Groups
Our last example is also one of the simplest and most powerful.
As mentioned earlier, a Security Group is essentially a basic stateless firewall implemented by the cloud platform. It’s like having a small cheap firewall in front of every server. When first using Security Groups, many users think of them like subnet firewalls, but that isn’t quite how they work. In a subnet the firewall in front of the group protects access to the systems in the group. A Security Group is more like a firewall policy applied on a per-system level. Instances in a Security Group can’t communicate with other instances in the same group unless you create an explicit rule to allow that.
Every single instance is, by default, firewalled off from every other one. This enables an incredible level of compartmentalization we like to call hypersegregation. (Because we are analysts and we tend to make up our own words).
For example, within an application stack you will likely have multiple instances of your web servers, application servers, and database servers. Each of those should be in a Security Group that allows it to only talk to the layers immediately above and below in the stack. The instances in the Security Group shouldn’t be allowed to talk to each other so cracking a server only allows very limited communications, over approved ports and protocols, to the servers directly above and below.
Security Groups also should not allow any public Internet access (except from the web server group). Administrative access is restricted to known addresses from either a jump server or your internal IP range.
Better yet, instead of always leaving every server open to administrative access, keep that closed unless needed. Then adjust the individual server’s Security Group to make the change. You do this through the cloud management plane, so an attacker would need to crack the management plane, obtain server credentials, and finally obtain access to the server itself.
This setup is nearly impossible to create with traditional infrastructure. We cannot afford all those physical firewalls, and creating that many switch-based rules is a non-starter at scale. We could do it using host firewall rules, but managing those across multiple platforms in a dynamic environment is insanely complex.
In this case the cloud offers substantially better security by default.
Where to Go from Here
This paper can only offer a high-level overview to highlight how cloud computing is different for security, and to give you ideas on how to adjust your security controls to leverage its advantages while accounting for the different risks. The real devil is in the details, and we always worry that we are over-simplifying with these overviews.
But every single thing we described is being used, today, in the real world. These aren’t cases of “maybe it will work”, but examples of what leading cloud users are implementing on a daily basis. Our examples are generally far more basic than what we have seen in practice.
The problem is that most security professionals don’t have the time or resources to become cloud security experts. Their days are filled with the ongoing minutiae of stopping attacks, meeting compliance requirements, and fighting fires. It becomes easy to dismiss cloud computing as yet another fad or trend we can manage as we always have, especially in light of vendors’ deluge of announcements that their products work just the same in the cloud.
But cloud computing is far not business as usual. It is an entirely new technology and operations model that fundamentally disrupts existing practices. One with a staggering rate of change, as well as entirely new platforms and capabilities emerging constantly. Two years ago big data was accessible only to those with top-line resources and massive datacenters. Now anyone can rent petabyte-scale data warehouses for a few hours of analysis. Using a web browser.
Adoption of the cloud will only accelerate, and it is vital that security professionals come up to speed on the technologies and adjust to meet new demands. The opportunities to improve security over existing practices are powerful and practical.
We will continue to cover this in depth in future research, digging into the specifics of how to handle cloud security and what it means to existing practices. Hopefully you will find it useful.
Reader interactions
One Reply to “The CISO’s Guide to the Cloud: Real World Examples and Where to Go from Here”
Is it possible that you use ‘Security Group’ in two different meanings here? First: ‘Create a Security Group and assign it IAM rights’ and secondly: ‘Security Group is essentially a basic stateless firewall implemented by the cloud platform’.
Is the first one rather a ‘group of security users’ or is it the same ‘security group’ as the firewall one?