Resilient Cloud Network Architectures: Fundamentals
By Mike Rothman
As much as we like to believe we have evolved as a species, people continue to be scared of things they don’t understand. Yes, many organizations have embraced the cloud whole hog and are rushing headlong into the cloud age. But it’s a big world, and millions of others remain paralyzed – not really understanding cloud computing, and taking the general approach that it can’t be secure because, well, it just can’t. Or it’s too new. Or for some other unfounded and incorrect reason. Kind of like when folks insisted that the Earth was the center of the universe.
This blog series builds on our recent Pragmatic Security for Cloud and Hybrid Networks paper, focusing on cloud-native network architectures that provide security and availability in ways you cannot accomplish in a traditional data center. This evolution will take place over the next decade, and organizations will need to support hybrid networks for some time.
But for those ready, willing, and able to step forward into the future today, the cloud is waiting to break the traditional rules of how technology has been developed, deployed, scaled, and managed. We have been aggressive in proselytizing our belief that the move towards the cloud is the single biggest disruption in technology for the next few decades. Yes, even bigger than the move from mainframes to client/server (we’re old – we know). So our Resilient Cloud Network Architectures series will provide the basics of cloud network security, with a few design patterns to illustrate.
We would like to thank Resilient Systems for provisionally agreeing to license the content in this paper. As always, we’ll build the content using our Totally Transparent Research methodology, meaning we will post everything to the blog first, and allow you (our readers) to poke holes in it. Once it has been sufficiently prodded, we will publish a paper for your reference.
If we bust out the old dictionary to define resilient, we get:
- able to become strong, healthy, or successful again after something bad happens
- able to return to an original shape after being pulled, stretched, pressed, bent, etc.
In the context of computing, you want to deploy technology that can not just become strong again, but resist attack in the first place. Recoverability is also key: if something bad happens you want to return service quickly, if it causes an outage at all. For network architecture we always fall back on the cloud computing credo: Design for failure. A resilient network architecture both makes it harder to compromise an application and minimizes downtime in case of an issue.
Key aspects of cloud computing which provide security and availability include:
- Network Isolation: Using the inherent ability of the cloud to restrict connections (via software firewalls, which are called security groups and described below), you can build a network architecture that fully isolates the different tiers of an application stack. That prevents a compromise in one application (or database) from exposing, or being used to attack, information stored in another.
- Account Isolation: Another important feature of the cloud is the ability to use multiple accounts per application. Each of your different environments (Dev, Test, Production, Logging, etc.) can use different accounts, which provides valuable isolation because you cannot access cloud infrastructure across accounts without explicit authorization.
- Immutability: An immutable server is one that is never logged into or changed in production. In cloud-native DevOps environments servers are deployed in auto-scale groups based on standard images. This prevents human error and configuration drift from creating exploitation paths. You take a new known-good state, and completely replace older images in production. No more patching and no more logging into servers.
- Regions: You could build multiple data centers around the world to provide redundancy. But that’s not a cheap option, and rarely feasible. To do the same thing in the cloud, you basically just replicate an entire environment in a different region via an API call or a couple of clicks in a cloud console. Regions are available all over the world, with multiple availability zones within each, to further minimize single points of failure. You can load balance between zones and regions, leveraging auto-scaling to keep your infrastructure running the same images in real time. We will explain this design pattern in our next post.
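To make the immutability idea concrete, here is a minimal Python sketch of replacing servers rather than patching them. It is a toy in-memory model, not a cloud provider API: the Instance class, the roll_out function, and the image and instance names are all hypothetical and exist only for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an instance record is never modified in place
class Instance:
    instance_id: str
    image_id: str


def roll_out(fleet, new_image_id, make_id):
    """Return a new fleet in which every instance runs new_image_id.

    Instances already on the new image are kept as they are. Every other
    instance is replaced by a fresh one built from the new image, never patched.
    """
    new_fleet = []
    for inst in fleet:
        if inst.image_id == new_image_id:
            new_fleet.append(inst)
        else:
            new_fleet.append(Instance(make_id(), new_image_id))
    return new_fleet


# Hypothetical fleet: two servers on an old image, one already on the new one.
fleet = [
    Instance("i-001", "image-v1"),
    Instance("i-002", "image-v1"),
    Instance("i-003", "image-v2"),
]
ids = itertools.count(100)
new_fleet = roll_out(fleet, "image-v2", lambda: f"i-{next(ids)}")

for inst in new_fleet:
    print(inst.instance_id, inst.image_id)
```

In a real environment an auto-scale group would do the replacing; the sketch only shows the shape of the operation, where the output is a fleet of new servers on a known-good image and no server is edited.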
The key takeaway is that cloud computing provides architectural options which are either impossible or economically infeasible in a traditional data center, to provide greater protection and availability. In this series we will describe the fundamentals of cloud networking for context, and then dig into design patterns which provide both security and availability – which we define as resilience.
Understanding Cloud Networks
The key difference between a network in your data center and one in the cloud is that cloud customers never access the ‘real’ network or hardware. Cloud computing uses virtual networks to abstract the networks you see and manage from the (invisible) underlying physical resources. When your server gets IP address 10.0.1.12, that IP address does not exist on routing hardware – it’s a virtual address on a virtual network. Everything is handled in software.
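A quick way to see that 10.0.1.12 is an address on a private, virtual network is to inspect it with Python's standard ipaddress module. This is a minimal sketch; the 10.0.1.0/24 subnet is an assumption chosen for illustration, since only the single address is given above.

```python
import ipaddress

# The instance address from the text; the /24 subnet is an assumed example.
instance_ip = ipaddress.ip_address("10.0.1.12")
assumed_subnet = ipaddress.ip_network("10.0.1.0/24")


def describe(ip, subnet):
    """Summarize where an address sits and whether it is Internet-routable."""
    return {
        "in_subnet": ip in subnet,
        "is_private": ip.is_private,  # True for RFC 1918 space such as 10.0.0.0/8
        "is_global": ip.is_global,    # True only for publicly routable addresses
    }


summary = describe(instance_ip, assumed_subnet)
print(summary)
```

The address falls in private (RFC 1918) space, so it only has meaning inside the virtual network the provider builds for you.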
Cloud networking varies across cloud providers, but differs from traditional networks in visibility, management, and velocity of change. You cannot tap into a cloud provider’s virtual network, so you’ll need to think differently to monitor your networks. Additionally, cloud networks are typically managed via scripts or programs making Application Programming Interface (API) calls, rather than a graphical console or command line. That enables developers to do pretty much anything, including standing up networks and reconfiguring them – instantly via code.
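As a rough illustration of defining a network in code, the Python sketch below carves an address block into one subnet per application tier using the standard ipaddress module. It does not call any cloud provider API. The function name plan_subnets, the 10.0.0.0/16 block, and the tier names are assumptions for illustration; in practice a provider's SDK would take CIDR values like these as inputs.

```python
import ipaddress


def plan_subnets(vpc_cidr, tiers, new_prefix=24):
    """Assign one subnet per tier from the network's address block, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    available = vpc.subnets(new_prefix=new_prefix)  # generator of /24 blocks
    return {tier: str(next(available)) for tier in tiers}


# Hypothetical address block and tier names.
plan = plan_subnets("10.0.0.0/16", ["web", "app", "db"])
for tier, cidr in plan.items():
    print(f"{tier}: {cidr}")
```

Because the layout is just data produced by a function, it can be reviewed, versioned, and regenerated as quickly as the rest of the environment changes.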
Finally, cloud networks change much faster than physical networks because cloud environments change faster, including spinning up and shutting down servers via automation. So traditional workflows to govern network change don’t really map to your cloud network. It can be confusing because cloud networks look like traditional networks, with their own routing tables and firewalls. But looks are deceiving – although familiar constructs have been carried over, there are fundamental differences.
Cloud Network Architectures
In order to choose the right solution to address your requirements, you need to understand the types of cloud network architectures and the different technologies that enable them. There are two basic types of cloud network architectures:
- Public Cloud Networks: These are entirely Internet-facing. You connect to your instances (servers) via the public Internet with no special routing; every instance has a public IP address.
- Private Cloud Networks: Also called “virtual private clouds” or VPCs, these look like internal LANs using private IP addresses. You access these networks via some kind of non-public connection – typically a VPN.
Cloud networks are enabled and supported by the following technologies:
- Internet Gateways: A gateway connects your cloud network to the Internet. You don’t normally manage it directly – your cloud provider does it for you because their tools move packets from ‘your’ internal network to the Internet.
- Internal Gateways: These devices connect existing datacenters to your private cloud network. You access networks via a VPN provided by the cloud provider or a direct connection, which looks a lot like a traditional point-to-point connection from days gone by.
- Virtual Private Networks: You can also set up your own overlay network to bridge your private and public cloud networks within your cloud provider. This provides a private segment with access for users, developers, and administrators.
These terms will come into play when we present design patterns in our next post.
Network Security Controls
Your network is different in the cloud, so your network security controls will be different as well. But you can take some comfort from the familiar categories. Cloud network controls fall generally into five buckets:
- Perimeter Security: These controls generally provide coarse protection from very common network-based attacks, including Denial of Service. Your cloud provider provides and manages these controls; you have no visibility or control.
- Software Firewalls: These firewalls are built into the cloud platform (they are called security groups in AWS) and protect cloud assets such as instances. They offer basic access control via ports/protocols and sources/destinations, and are designed to handle auto-scaling and cloud environments. They combine the best of network and host firewalls, allowing you to deploy policies on individual servers (or even network interfaces) like a host firewall, but manage them like network firewalls. They will be your main tool to provide virtual network isolation, described above.
- Access Control Lists: While a software firewall works at a per-instance (or per-object) level, ACLs restrict communications between subnets of your virtual network. Old-school networking folks will be familiar with using ACLs to control access into and out of the subnets in a (virtual) cloud network.
- Virtual Appliances: A number of traditional network security tools, including IDS/IPS, WAF, and NGFW, are available as virtual appliances to improve network security, but they require you to route cloud traffic through these devices.
- Host Security Agents: These agents are built into immutable server images, and provide visibility and protection to each server/instance in a cloud environment.
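The software firewall idea can be sketched as a small default-deny model in Python. This is a simplified illustration, not a provider API: the Rule class, the INBOUND table, the group names, and the port numbers are all hypothetical. It assumes inbound traffic is denied unless a rule allows it, and that a rule can name another group as its permitted source, which is how AWS security groups behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source_group: str  # name of the group allowed to connect
    port: int          # destination TCP port


# Hypothetical inbound rules per group; anything not listed is denied.
INBOUND = {
    "web": [Rule("internet", 443)],
    "app": [Rule("web", 8080)],
    "db": [Rule("app", 5432)],
}


def is_allowed(source_group, dest_group, port):
    """Default-deny check: allow only if an inbound rule matches exactly."""
    return Rule(source_group, port) in INBOUND.get(dest_group, [])


print(is_allowed("web", "app", 8080))      # web tier reaching the app tier
print(is_allowed("web", "db", 5432))       # web tier trying to skip to the database
print(is_allowed("internet", "db", 5432))  # direct Internet access to the database
```

Each tier accepts traffic only from the tier directly in front of it, so compromising the web tier does not by itself open a path to the database.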
The thing about cloud networking is that you don’t need to apply the same controls, or even configurations, to an entire network. You can make architectural and security control decisions per project. You might decide an entirely cloud-based VPC is best for one application, while for another you choose to build an overlay VPN to connect a totally different VPC to your datacenter to support a hybrid environment. You might need to route one application’s traffic through an inspection point to prevent data leakage, while for another you rely exclusively on security groups to provide full isolation between different layers of your cloud stack. The permutations are infinite, and provide flexibility you cannot have in your data center.
These fundamentals should provide the context you’ll need to understand the design patterns we will present in our next post.