This is the fourth post in a new series I’m posting for public feedback, licensed by Algosec. Well, that is if they like it – we are sticking to our Totally Transparent Research policy. I’m also live-writing the content on GitHub if you want to provide any feedback or suggestions. Click here for the first post in the series, here for post two.
There is no single ‘best’ way to secure a cloud or hybrid network. Cloud computing is moving faster than any other technology in decades, with providers constantly struggling to out-innovate each other with new capabilities. You cannot lock yourself into any single architecture, but instead need to build out a program capable of handling diverse and dynamic needs.
There are four major focus areas when building out this program.
- Start by understanding the key considerations for the cloud platform and application you are working with.
- Design the network and application architecture for security.
- Design your network security architecture including additional security tools (if needed) and management components.
- Manage security operations for your cloud deployments – including everything from staffing to automation.
Understand Key Considerations
Building applications in the cloud is decidedly not the same as building them on traditional infrastructure. Sure, you can do it, but the odds are high something will break. Badly. As in “update that resume” breakage. To really see the benefits of cloud computing, applications must be designed specifically for the cloud – including security controls.
For network security this means you need to keep a few key things in mind before you start mapping out security controls.
- Provider-specific limitations or advantages: All providers are different. Nothing is standard, and don’t expect it to ever become standard. One provider’s security group is another’s ACL. Some allow more granular management. There may be limits on the number of security rules available. A provider might offer both allow and deny rules, or allow only. Take the time to learn the ins and outs of your provider’s capabilities. They all offer plenty of documentation and training, and in our experience most organizations limit themselves to no more than one to three infrastructure providers, keeping the problem manageable.
- Application needs: Applications, especially those using the newer architectures we will mention in a moment, often have different needs than applications deployed on traditional infrastructure. For example application components in your private network segment may still need Internet access to connect to a cloud component – such as storage, a message bus, or a database. These needs directly affect architectural decisions – both security and otherwise.
- New architectures: Cloud applications use different design patterns than apps on traditional infrastructure. For example, as previously mentioned, components are typically distributed across diverse network locations for resiliency, and tied tightly to cloud-based load balancers. Early cloud applications often emulated traditional architectures but modern cloud applications make extensive use of advanced cloud features, particularly Platform as a Service, which may be deeply integrated into a particular cloud provider. Cloud-based databases, message queues, notification systems, storage, containers, and application platforms are all now common due to cost, performance, and agility benefits. You often cannot even control the network security of these services, which are instead fully managed by the cloud provider. Continuous deployment, DevOps, and immutable servers are the norm rather than exceptions. On the upside, used properly these architectures and patterns are far more secure, cost effective, resilient, and agile than building everything yourself, but you do need to understand how they work.
Data Analytics Design Pattern Example
A common data analytics design pattern highlights these differences (see the last section for a detailed example). Instead of keeping a running analytics pool and sending it data via SFTP, you start by loading data into cloud storage directly using an (encrypted) API call. This, using a feature of the cloud, triggers the launch of a pool of analytics servers and passes the job on to a message queue in the cloud. The message queue distributes the jobs to the analytics servers, which use a cloud-based notification service to signal when they are done, and the queue automatically redistributes failed jobs. Once it’s all done the results are stored in a cloud-based NoSQL database and the source files are archived. It’s similar to ‘normal’ data analytics except everything is event-driven, using features and components of the cloud service. This model can handle as many concurrent jobs as you need, but you don’t have anything running or racking up charges until a job enters the system.
- Elasticity and a high rate of change are standard in the cloud: Beyond auto scaling, cloud applications tend to alter the infrastructure around them to maximize the benefits of cloud computing. For example one of the best ways to update a cloud application is not to patch servers, but instead to create an entirely new installation of the app, based on a template, running in parallel; and then to switch traffic over from the current version. This breaks familiar security approaches, including relying on IP addresses for: server identification, vulnerability scanning, and logging. Server names and addresses are largely meaningless, and controls that aren’t adapted for cloud are liable to be useless.
- Managing and monitoring security changes: You either need to learn how to manage cloud security using the provider’s console and APIs, or choose security tools that integrate directly. This may become especially complex if you need to normalize security between your data center and cloud provider when building a hybrid cloud. Additionally, few cloud providers offer good tools to track security changes over time, so you will need to track them yourself or use a third-party tool.
Design the Network Architecture
Unlike traditional networks, security is built into cloud networks by default. Go to any major cloud provider, spin up a virtual network, launch a server, and the odds are very high it is already well-defended – with most or all access blocked by default.
Because security and core networking are so intertwined, and every cloud application has its own virtual network (or networks), the first step toward security is to work with the application team and design it into the architecture.
Here are some specific guidelines and recommendations:
- Accounts provide your first layer of segregation. With each cloud provider you establish multiple accounts, each for a different environment (e.g., dev, test, production, logging). This enables you to tailor cloud security controls and minimize administrator access. This isn’t a purely network security feature, but will affect network security because you can, for example, have tighter controls for environments closer to production data. The rule of thumb for accounts is to consider separate accounts for separate applications, and then separate accounts for a given application when you want to restrict how many people have administrator access. For example a dev account is more open with more administrators, while production is a different account with a much smaller group of admins. Within accounts, don’t forget about the physical architecture:
- Regions/locations are often used for resiliency, but may also be incorporated into the architecture for data residency requirements, or to reduce network latency to customers. Unlike accounts, we don’t normally use locations for security, but you do need to build network security within each location.
- Zones are the cornerstone of cloud application resiliency, especially when tied to auto scaling. You won’t use them as a security control, but again they affect security, as they often map directly to subnets. An auto scale group might keep multiple instances of a server in different zones, which are different subnets, so you cannot necessarily rely on subnets and addresses when designing your security.
- Virtual Networks (Virtual Private Clouds) are your next layer of security segregation. You can (and will) create and dedicate separate virtual networks for each application (potentially in different accounts), each with its own set of network security controls. This compartmentalization offers tremendous security advantages, but seriously complicates security management. It forces you to rely much more heavily on automation, because manually replicating security controls across accounts and virtual networks within each account takes tremendous discipline and effort. In our experience the security benefits of compartmentalization outweigh the risks created by management complexity – especially because development and operations teams already tend to rely on automation to create, manage, and update environments and applications in the first place. There are a few additional non-security-specific aspects to keep in mind when you design the architecture:
- Within a given virtual network, you can include public and private facing subnets, and connect them together. This is similar to DMZ topologies, except public-facing assets can still be fully restricted from the Internet, and private network assets are all by default walled off from each other. Even more interesting, you can spin up totally isolated private network segments that only connect to other application components through an internal cloud service such as a message queue, and prohibit all server-to-server traffic over the network.
- There is no additional cost to spin up new virtual networks (or at least if your provider charges for this, it’s time to move on), and you can create another with a few clicks or API calls. Some providers even allow you to bridge across virtual networks, assuming they aren’t running on the same IP address range. Instead of trying to lump everything into one account and one virtual network, it makes far more sense to use multiple networks for different applications, and even within a given application architecture.
- Within a virtual network you also have complete control over subnets. While they may play a role in your security design, especially as you map out public and private network segments, make sure you also design them to support zones for availability.
- Flat networks aren’t flat in the cloud. Everything you deploy in the virtual network is surrounded by its own policy-based firewall which blocks all connections by default, so you don’t need to rely on subnets themselves as much for segregation between application components. Public vs. private subnets are one thing, but creating a bunch of smaller subnets to isolate application components quickly leads to diminishing returns.
You may need enterprise datacenter connections for hybrid clouds. These VPN or direct connections route traffic directly from your data center to the cloud, and vice-versa. You simply set your routing tables to send traffic to the appropriate destination, and SDN-based virtual networks allow you to set distinct subnet ranges to avoid address conflicts with existing assets.
Whenever possible, we actually recommend avoiding hybrid cloud deployments. It isn’t that there is anything wrong with them, but they make it much more difficult to support account and virtual network segregation. For example if you use separate accounts or virtual networks for your different dev/test/prod environments, you will tend to do so using templates to automatically build out your architecture, and they will perfectly mimic each other – down to individual IP addresses. But if you connect them directly to your data center you need to shift to non-overlapping address ranges to avoid conflicts, and they can’t be as automated or consistent. (This consistency is a cornerstone of continuous deployment and DevOps).
Additionally, hybrid clouds complicate security. We have actually seen them, not infrequently, reduce the overall security level of the cloud, because assets in the datacenter aren’t as segregated as on the cloud network, and cloud providers tend to be more secure than most organizations can achieve in their own infrastructure. Instead of cracking your cloud provider, someone only needs to crack a system on your corporate network, and use that to directly bridge to the cloud.
So when should you consider a hybrid deployment? Any time your application architecture requires direct address-based access to an internal asset that isn’t Internet-accessible. Alternatively, sometimes you need a cloud asset on a static, non-Internet-routable address – such as an email server or other service that isn’t designed to work with auto scaling – which internal things need to connect to. (We strongly recommend you minimize these – they don’t benefit from cloud computing, so there usually isn’t a good reason to deploy them there). And yes, this means hybrid deployments are extremely common unless you are building everything from scratch. We try to minimize their use – but that doesn’t mean they don’t play a very important role.
For security there are a few things to keep in mind when building a hybrid deployment:
- VPN traffic will traverse the Internet. VPNs are very secure, but you do need to keep them up-to-date with the latest patches and make sure you use strong, up-to-date certificates.
- Direct connections may reduce latency, but decide whether you trust your network provider, and whether you need to encrypt traffic.
- Don’t let your infrastructure reduce the security of your cloud. If you mandate multi-factor authentication in the cloud but not on your LAN, that’s a loophole. Is your entire LAN connected to the cloud? Could someone compromise a single workstation and then start attacking your cloud through your direct connection? Do you have security group or other firewall rules to keep your cloud assets as segregated from datacenter assets as they are from each other? Remember, cloud providers tend to be exceptionally good at security, and everything you deploy in the cloud is isolated by default. Don’t allow hybrid connection to become the weak link and reduce this compartmentalization.
- You may still be able to use multiple accounts and virtual networks for segregation, by routing different datacenter traffic to different accounts and/or virtual networks. But your on-premise VPN hardware or your cloud provider might not support this, so check before building it into your architecture.
- Cloud and on-premise network security controls may look similar on the surface, but they have deep implementation differences. If you want unified management you need to understand these differences, and be able to harmonize based on security goals – not by trying to force a standard implementation across very different technologies.
- Cloud computing offers many more ways to integrate into your existing operations than you might think. For example instead of using SFTP and setting up public servers to receive data dumps, consider installing your cloud provider’s command-line tools and directly transferring data to their object storage service (fully locked down, of course). Now you don’t need to maintain the burden of either an Internet-accessible FTP server or a hybrid cloud connection.
It’s hard to fully convey the breadth and depth of options for building security into your architectures, even without additional security tools. This isn’t mere theory – we have a lot of real-world experience with different architectures creating much higher security levels than can be achieved on traditional infrastructure at any reasonable cost.
Design the Network Security Architecture
At this point you should have a well-segregated environment where effectively every application, and every environment (e.g., dev/test) for every application, is running on its own virtual network. These assets are mostly either in auto scale groups which spread them around zones and subnets for resiliency; or connect to secure cloud services such as databases, message queues, and storage. These architectures alone, in our experience, are materially more secure than your typical starting point on traditional infrastructure.
Now it’s time to layer on the additional security controls we covered earlier under Cloud Networking 101. Instead of repeating the pros and cons, here are some direct recommendations about when to use each option:
- Security groups: These should be used by default, and set to
denyby default. Only open up the absolute minimum access needed. Cloud services allow you to right-size resources far more easily than on your own hardware, so we find most organizations tend to deploy far fewer number services on each instance, which directly translates to opening up fewer network ports per instance. A large number of cloud deployments we have evaluated use only a good base architecture and security groups for network security.
- ACLs: These mostly make sense in hybrid deployments, where you need to closely match or restrict communications between the data center and the cloud. Security groups are usually a better choice, and we only recommend falling back to ACLs or subnet-level firewalling when you cannot achieve your security objectives otherwise.
- Virtual Appliances: Whenever you need capabilities beyond basic firewalls, this is where you are likely to end up. But we find host agents often make more sense when they offer the same capabilities, because virtual appliances become costly bottlenecks which restrict your cloud architecture options. Don’t deploy one merely because you have a checkbox requirement for a particular tool – ensure it makes sense first. Over time we do see them becoming more “cloud friendly”, but when we rip into requirements on projects, we often find there are better, more cloud-appropriate ways to meet the same security objectives.
- Host security agents are often a better option than a virtual appliance because they don’t restrict virtual networking architectural options. But you need to ensure you have a way to deploy them consistently. Also, make sure you pick cloud-specific tools designed to work with features such as auto scaling. These tools are particularly useful to cover network monitoring gaps, meet IDS/IPS requirements, and satisfy all your normal host security needs.
Of course you will need some way of managing these controls, even if you stick to only capabilities and features offered by your cloud provider.
Security groups and ACLs are managed via API or your cloud provider’s console. They use the same management plane as the rest of the cloud, but this won’t necessarily integrate out of the box with the way you manage things internally. You can’t track these across multiple accounts and virtual networks unless you use a purpose-built tool or write your own code. We will talk about specific techniques for management in the next section, but make sure you plan out how to manage these controls when you design your architecture.
Platform as a Service introduces its own set of security differences. For example in some situations you still define security groups and/or ACLs for the platform (as with a cloud load balancer); but in other cases access to the platform is only via API, and may require an outbound public Internet connection, even from a private network segment. PaaS also tends to rely more on DNS rather than IP addresses, to help the cloud provider maintain flexibility. We can’t give you any hard and fast rules here. Understand what’s required to connect to the platform, and then ensure your architecture allows those connections. When you can manage security treat it like any other cluster of servers, and stick with the minimum privileges possible.
We cannot cover anything near every option for every cloud in a relatively short (believe it or not) paper like this, but for the most part once you understand these fundamentals and the core differences of working in software-defined environments, it gets much easier to adapt to new tools and technologies.
Especially once you realize that you start by integrating security into the architecture, instead of trying to layer it on after the fact.
Manage Cloud (and Hybrid) Network Security Operations
Building in security is one thing, but keeping it up to date over time is an entirely different – and harder – problem. Not only do applications and deployments change over time, but cloud providers have this pesky habit of “innovating” for “competitive advantage”. Someday things might slow down, but it definitely won’t be within the lifespan of this particular research.
Here are some suggestions on managing cloud network security for the long haul.
Organization and Staffing
It’s a good idea to make sure you have cloud experts on your network security team, people trained for the platforms you support. They don’t need to be new people, and depending on your scale this doesn’t need to be their full-time focus, but you definitely need the skills. We suggest you build your team with both security architects (to help in design) and operators (to implement and fix).
Cloud projects occur outside the constraints of your data center, including normal operations, which means you might need to make some organizational changes so security is engaged in projects. A security representative should be assigned and integrated into each cloud project. Think about how things normally work – someone starts a new project and security gets called when they need access or firewall rule changes. With cloud computing network security isn’t blocking anything (unless they need access to an on-premise resource) and entire projects can happen without security or ops every being directly involved. You need to adapt policies and organizational structure to minimize this risk. For example, work with procurement to require a security evaluation and consultation before any new cloud account is opened.
Because so much of cloud network security relies on architecture, it isn’t just important to have a security architect on the team – it is essential they be engaged in projects early. It goes without saying that this needs to be a collaborative role. Don’t merely write up some pre-approved architectures, and then try to force everyone to work within those constraints. You’ll lose that fight before you even know it started.
We hinted at this in the section above: one of the first challenges is to find all the cloud projects, and then keep finding new ones as they pop up over time. You need to enumerate the existing cloud network security controls. Here are a couple ways we have seen clients successfully keep tabs on cloud computing:
- If your critical assets (such as the customer database) are well locked down, you can use this to control cloud projects. If they want access to the data/application/whatever, they need to meet your security requirements.
- Procurement and Accounting are your next best options. At some point someone needs to pay the (cloud) piper, and you can work with Accounting to identify payments to cloud providers and tie them back to the teams involved. Just make sure you differentiate between those credit card charges to Amazon for office supplies, and the one to replicate your entire datacenter up into AWS.
- Hybrid connections to your data center are pretty easy to track using established process. Unless you let random employees plug in VPN routers.
- Lastly, we suppose you could try setting a policy that says “don’t cloud without telling us”. I mean, if you trust your people and all. It could work. Maybe. It’s probably good to have to keep the auditors happy anyway.
The next discovery challenge is to figure out how the cloud networks are architected and secured:
- First, always start with the project team. Sit down with them and perform an architecture and implementation review.
- It’s a young market, but there are some assessment tools that can help. Especially to analyze security groups and network security, and compare against best practices.
- You can use your cloud provider’s console in many cases, but most of them don’t provide a good overall network view. If you don’t have a tool to help, you can use scripts and API calls to pull down the raw configuration and manually analyze it.
Integrating with Development
In the broadest, sense there are two kinds of cloud deployments: applications you build and run in the cloud (or hybrid), and core infrastructure (like file and mail servers) you transition to the cloud. Developers play the central role in the former, but they are also often involved in the latter.
The cloud is essentially software defined everything. We build and manage all kinds of cloud deployments using code. Even if you start by merely transitioning a few servers into virtual machines at a cloud provider, you will always end up defining and managing much of your environment in code.
This is an incredible opportunity for security. Instead of sitting outside the organization and trying to protect things by building external walls, we gain much greater ability to manage security using the exact same tools development and operations use to define, build, and run the infrastructure and services. Here are a few key ways to integrate with development and ensure security is integrated:
- Create a handbook of design patterns for the cloud providers you support, including security controls and general requirements. Keep adding new patterns as you work on new projects. Then make this library available to business units and development teams so they know which architectures already have general approval from security.
- A cloud security architect is essential, and this person or team should engage early with development teams to help build security into every initial design. We hate to have to say it, but their role really needs to be collaborative. Lay down the law with a bunch of requirements that interfere with the project’s execution, and you definitely won’t be invited back to the table.
- A lot of security can be automated and templated by working with development. For example monitoring and automation code can be deployed on projects without the project team having to develop them from scratch. Even integrating third party tools can often be managed programmatically.
Change is constant in cloud computing. The foundational concept dynamic adjustment of capacity (and configuration) to meet changing demands. When we say “enforce policies” we mean that, for a given project, once you design the security you are able to keep it consistent. Just because clouds change all the time doesn’t mean it’s okay to let a developer drop all the firewalls by mistake.
The key policy enforcement difference between traditional networking and the cloud is that in traditional infrastructure security has exclusive control over firewalls and other security tools. In the cloud, anyone with sufficient authorization in the cloud platform (management plane) can make those changes. Even applications can potentially change their own infrastructure around them. That’s why you need to rely on automation to detect and manage change.
You lose the single point of control. Heck, your own developers can create entire networks from their desktops. Remember when someone occasionally plugged in their own wireless router or file server? It’s a bit like that, but more like building their own datacenter over lunch. Here are some techniques for managing these changes:
- Use access controls to limit who can change what on a given cloud project. It is typical to allow developers a lot of freedom in the dev environment, but lock down any network security changes in production, using your cloud provider’s IAM features.
- To the greatest extent possible, try to use cloud provider specific templates to define your infrastructure. These files contain a programmatic description of your environment, including complete network and network security configurations. You load them into the cloud platform and it builds the environment for you. This is a very common way to deploy cloud applications, and essential in organizations using DevOps to enforce consistency.
- When this isn’t possible you will need to use a tool or manually pull the network architecture and configuration (including security) and document them. This is your baseline.
- Then you need to automate change monitoring using a tool or the features of your cloud and/or network security provider:
- Cloud platforms are slowly adding monitoring and alerting on security changes, but these capabilities are still new and often manual. This is where cloud-specific training and staffing can really pay off, and there are also third-party tools to monitor these changes for you.
- When you use virtual appliances or host security, you don’t rely on your cloud provider, so you may be able to hook change management and policy enforcement into your existing approaches. These are security-specific tools, so unlike cloud provider features the security team will often have exclusive access and be responsible for making changes themselves.
- Did we mention automation? We will talk about it more in a minute, because it’s the only way to maintain cloud security.
Normalizing On-Premise and Cloud Security
Organizations have a lot of security requirements for very good reasons, and need to ensure those controls are consistently applied. We all have developed a tremendous amount of network security experience over decades running our own networks, which is still relevant when moving into the cloud. The challenge is to carry over the requirements and experience, without assuming everything is the same in the cloud, or letting old patterns prevent us from taking full advantage of cloud computing.
- Start by translating whatever rules sets you have on-premise into a comparable version for the cloud. This takes a few steps:
- Figure out which rules should still apply, and what new rules you need. For example a policy to deny all
sshtraffic from the Internet won’t work if that’s how you manage public cloud servers. Instead a policy that limits
sshaccess to your corporate CIDR block makes more sense. Another example is the common restriction that back-end servers shouldn’t have any Internet access at all, which may need updating if they need to connect to PaaS components of their own architecture.
- Then adjust your policies into enforceable rulesets. For example security groups and ACLs work differently, so how you enforce them changes. Instead of setting subnet-based policies with a ton of rules, tie security group policies to instances by function. We once encountered a client who tried to recreate very complex firewall rulesets into security groups, exceeding their provider’s rule count limit. Instead we recommended a set of policies for different categories of instances.
- Watch out for policies like “deny all traffic from this IP range”. Those can be very difficult to enforce using cloud-native tools, and if you really have those requirements you will likely need a network security virtual appliance or host security agent. In many projects we find you can resolve the same level of risk with smarter architectural decisions (e.g., using immutable servers, which we will describe in a moment).
- Don’t just drop in a virtual appliance because you are used to it and know how to build its rules. Always start with what your cloud provider offers, then layer on additional tools as needed.
- If you migrate existing applications to the cloud the process is a bit more complex. You need to evaluate existing security controls, discover and analyze application dependencies and network requirements, and then translate them for a cloud deployment, taking into account all the differences we have been discussing.
- Figure out which rules should still apply, and what new rules you need. For example a policy to deny all
- Once you translate the rules, normalize operations. This means having a consistent process to deploy, manage, and monitor your network security over time. Fully covering this is beyond to scope of this research, as it depends on how you manage network security operations today. Just remember that you are trying to blend what you do now with what the cloud project’s requirements, not simply enforce your existing processes under an entirely new operating model.
We hate to say it, but we will – this is a process of transition. We find customers who start on a project-by-project basis are more successful, because they can learn as they go, and build up a repository of knowledge and experience.
Automation and Immutable Network Security
Cloud security automation isn’t merely fodder for another paper – it’s an entirely new body of knowledge we are all only just beginning to build.
Any organization that moves to the cloud in any significant way learns quickly that automation is the only way to survive. How else can you manage multiple copies of a single project in different environments – never mind dozens or hundreds of different projects, each running in their own sets of cloud accounts across multiple providers?
Then, keep all those projects compliant with regulatory requirements and your internal security policies.
Yeah, it’s like that.
Fortunately this isn’t an insoluble problem. Every day we see more examples of companies successfully using the cloud at scale, and staying secure and compliant. Today they largely build their own libraries of tools and scripts to continually monitor and enforce changes. We also see some emerging tools to help with this management, and expect to see many more in the near future.
A core developing concept tied to automation is immutable security, and we have used it ourselves.
One of the core problems in security is managing change. We design something, build in security, deploy it, validate that security, and lock everything down. This inevitably drifts as it’s patched, updated, improved, and otherwise modified. Immutable security leverages automation, DevOps techniques, and inherent cloud characteristics to break this cycle. To be honest, it’s really just DevOps applied to security, and all the principles are in wide use already.
For example an immutable server is one that is never logged into or changed in production. If you go back to about auto scaling, we deploy servers based on standard images. Changing one of those servers after deployment doesn’t make sense, because those changes wouldn’t be in the image, so new versions launched by auto scaling wouldn’t include them. Instead DevOps creates a new image with all the changes, then alters the auto scale group rules to deploy new instances based on the new image, and man optionally prune off the older versions.
In other words no more patching, and no more logging into servers. You take a new known-good state, and completely replace what is in production.
Think about how this applies to network security. We can build templates to automatically deploy entire environments at our cloud providers. We can write network security policies, then override any changes automatically, even across multiple cloud accounts. This pushes the security effort earlier into design and development, and enables much more consistent enforcement in operations. And we use the exact same toolchain as Development and Operations to deploy our security controls, rather than trying to build our own on the side and overlay enforcement afterwards.
This might seem like an aside, but these automation principles are the cornerstone of real-world cloud security, especially at scale. This is a capability we never have in traditional infrastructure, where we cannot simply stamp out new environments automatically, and need to hand-configure everything.