Between teaching classes and working with clients, I spend a fair bit of time talking about particular cloud providers. The analyst in me never wants to be biased, but the reality is there are big differences in terms of capabilities, and some of them matter.
Throwing out all the non-security differentiators, when you look at cloud providers for enterprises there are some critical security capabilities you need for security and compliance. Practically speaking, these quickly narrow down your options.
My criteria are more IaaS-focused, but it should be obvious which also apply to PaaS and SaaS:
- API/admin logging: This is the single most important compliance control, a critical security control, and the single biggest feature gap for even many major providers. If there isn’t a log of all management activity, ideally including that by the cloud provider itself, you never really know what’s happening with your assets. Your only other options are to constantly snapshot your environment and look for changes, or run all activity through a portal and still figure out a way to watch for activity outside that portal (yes, people really do that sometimes).
- Elasticity and autoscaling: If it’s an IaaS provider and it doesn’t have autoscaling, run away. That isn’t the cloud. If it’s a PaaS or SaaS provider who lacks elasticity (can’t scale cleanly up or down to what you need), keep looking. For IaaS this is a critical capability because it enables immutable servers, which are one of the cloud’s best security benefits. For IaaS and PaaS it’s more of a non-security advantage.
- APIs for all security features: Everything in the cloud should be programmatically manageable. Cloud security can’t scale without automation, and you can’t automate without APIs.
- Granular entitlements: An entitlement is an access right/grant. The provider should offer more than just ‘admin’. Ideally down to each feature or API call, especially for IaaS and PaaS.
- Good, easy, SAML support that maps to the granular entitlements: Federated identity is the only reasonable way to manage all your users in the cloud. Fortunately, we nearly always see this one available. Unfortunately, some cloud providers make it a pain in the ass to set up.
- Multiple accounts and cross-account access: One of the best ways to compartmentalize cloud deployments is to use entirely different accounts for different projects and environments, then connect them together with granular entitlements when needed. This limits the blast radius if someone gets into the account and does something bad. I frequently recommend multiple accounts for a single cloud project, and this is considered normal. It does, however, require security automation, which ties into my API requirement.
- Software Defined Networking: Most major IaaS providers give you near complete control over your virtual networks. But some legacy providers lack an SDN, leaving you are stuck with VLANs or other technologies that don’t provide the customization you need to really make things work. Read my paper on cloud network security if you want to understand more.
- Regions/locations in different countries: Unless the cloud provider only want business in their country of origin, this is required for legal and jurisdictional reasons. Thanks to Brian Honan for catching my omission.
This list probably looks a hell of a lot different than any of the other ones you’ve seen. That’s because these are the foundational building blocks you realize you need once you start working on real cloud projects.
I’m probably missing some, but if I break this out all I’m really talking about are:
- Good audit logs.
- Decent compartmentalization/segregation at different levels.
- Granular rights to enforce least privilege.
- A way to manage everything and integrate it into operations.
Please let me know in the comments or via Twitter if you think I’m missing anything. I’m trying to keep it relatively concise.
Posted at Thursday 12th November 2015 10:28 pm
(4) Comments •
In my recent paper on cloud network security I came down pretty hard on hybrid networks. I have been saying similar things in many presentations, including my most recent RSA session. Enough that I got a request for clarification. Here is some additional detail I will add to the paper; feedback or criticism is appreciated.
Hybrid deployments often play an essential, yet complex, role in an organization’s transition to cloud computing. On the one hand they allow an organization to extend its existing resources directly into the cloud, with fully compatible network addressing and routing. They allow the cloud to access internal assets directly, and internal assets to access cloud assets, without having to reconfigure everything from scratch.
But that also means hybrid deployments bridge risks across environments. Internal problems can extend to the cloud provider, and compromise of something on the cloud side extends to the data center. It’s a situation ripe for error, especially in organizations which already struggle with network compartmentalization. Also, you are bridging two completely different environments – one software defined, the other still managed with boxes and wires.
That’s why we recommend trying to avoid hybrid deployments, to retain the single greatest security advantage of cloud computing: compartmentalization. Modern cloud deployments typically use multiple cloud provider accounts for a single project. If anything goes wrong you can blow away the entire account, and start over. Control failures in any account are isolated to that account, and attacks at the network and management levels are also isolated. Those are typically impossible to replicate with hybrid.
All that said, nearly every large enterprise we work with still needs some hybrid deployments. There are too many existing internal resources and requirements to drop ship them all to a cloud provider. Applications, assets, and services designed for traditional infrastructure which would all need to be completely re-architected to operate correctly, with acceptable resilience, in the cloud.
Yes, someday hybrid clouds will be rare. And for any new project we highly recommend designing to work in an isolated, dedicated set of cloud accounts. But until we all finish this massive 20-year project of moving nearly everything into the public cloud, hybrid is a practical reality.
Thinking about the associated risks, bridging networks and reducing compartmentalization, focuses your security requirements. You need to understand those connections, and the network security controls across them. They are two different systems using a common vocabulary, with important implementation differences. Management planes for non-network functions won’t integrate (traditional environments don’t have one). Host, application, and data security are specific to the assets involved and where they are hosted; risks extend whenever they are connected, regardless of deployment model. A hybrid cloud doesn’t change SQL injection detection or file integrity monitoring – you implement them as needed in each environment.
The definition of hybrid is connection and extension via networking; understanding those connections, how the security rules are set up on each side, and how to ensure the security of two totally different environments works together, is the focus.
Posted at Monday 26th October 2015 9:39 pm
(2) Comments •
This is one of those papers I’ve been wanting to write for a while. When I’m out working with clients, or teaching classes, we end up spending a ton of time on just how different networking is in the cloud, and how to manage it. On the surface we still see things like subnets and routing tables, but now everything is wired together in software, with layers of abstraction meant to look the same, but not really work the same.
This paper covers the basics and even includes some sample diagrams for Microsoft Azure and Amazon Web Services, although the bulk of the paper is cloud-agnostic.
From the report:
Over the last few decades we have been refining our approach to network security. Find the boxes, find the wires connecting them, drop a few security boxes between them in the right spots, and move on. Sure, we continue to advance the state of the art in exactly what those security boxes do, and we constantly improve how we design networks and plug everything together, but overall change has been incremental. How we think about network security doesn’t change – just some of the particulars.
Until you move to the cloud.
While many of the fundamentals still apply, cloud computing releases us from the physical limitations of those boxes and wires by fully abstracting the network from the underlying resources. We move into entirely virtual networks, controlled by software and APIs, with very different rules. Things may look the same on the surface, but dig a little deeper and you quickly realize that network security for cloud computing requires a different mindset, different tools, and new fundamentals.
Many of which change every time you switch cloud providers.
Special thanks to Algosec for licensing the research. As usual everything was written completely independently using our Totally Transparent Research process. It’s only due to these licenses that we are able to give this research away for free.
The landing page for the paper is here.
Direct download: Pragmatic Security for Cloud and Hybrid Networks (pdf)
Posted at Monday 5th October 2015 5:43 pm
(0) Comments •
This is the fourth post in a new series I’m posting for public feedback, licensed by Algosec. Well, that is if they like it – we are sticking to our Totally Transparent Research policy. I’m also live-writing the content on GitHub if you want to provide any feedback or suggestions. Click here for the first post in the series, [here for post two](https://securosis.com/blog/pragmatic-security-for-cloud-and-hybrid-networks-cloud-networking-101, post 3, post 4.
To finish off this research it’s time to show what some of this looks like. Here are some practical design patterns based on projects we have worked on. The examples are specific to Amazon Web Services and Microsoft Azure, rather than generic templates. Generic patterns are less detailed and harder to explain, and we would rather you understand what these look like in the real world.
Basic Public Network on Microsoft Azure
This is a simplified example of a public network on Azure. All the components run on Azure, with nothing in the enterprise data center, and no VPN connections. Management of all assets is over the Internet. We can’t show all the pieces and configuration settings in this diagram, so here are some specifics:
- The Internet Gateway is set in Azure by default (you don’t need to do anything). Azure also sets up default service endpoints for the management ports to manage your instances. These connections are direct to each instance and don’t run through the load balancer. They will (should) be limited to only your current IP address, and the ports are closed to the rest of the world. In this example we have a single public facing subnet.
- Each instance gets a public IP address and domain name, but you can’t access anything that isn’t opened up with a defined service endpoint. Think of the endpoint as port forwarding, which it pretty much is.
- The service endpoint can point to the load balancer, which in turn is tied to the auto scale group. You set rules on instance health, performance, and availability; the load balancer and auto scale group provision and deprovision servers as needed, and handle routing. The IP addresses of the instances change as these updates take place.
- Network Security Groups (NSGs) restrict access to each instance. In Azure you can also apply them to subnets. In this case we would apply them on a per-server basis. Traffic would be restricted to whatever services are being provided by the application, and would
deny traffic between instances on the same subnet. Azure allows such internal traffic by default, unlike Amazon.
- NSGs can also restrict traffic to the instances, locking it down to only from the load balancer and thus disabling direct Internet access. Ideally you never need to log into the servers because they are in an auto scale group, so you can also disable all the management/administration ports.
There is more, but this pattern produces a hardened server, with no administrative traffic, protected with both Azure’s default protections and Network Security Groups. Note that on Azure you are often much better off using their PaaS offerings such as web servers, instead of manually building infrastructure like this.
Basic Private Network on Amazon Web Services
Amazon works a bit differently than Azure (okay – much differently). This example is a Virtual Private Cloud (VPC, their name for a virtual network) that is completely private, without any Internet routing, connected to a data center through a VPN connection.
- This shows a class B network with two smaller subnets. In AWS you would place each subnet in a different Availability Zone (what we called a ‘zone’) for resilience in case one goes down – they are separate physical data centers.
- You configure the VPN gateway through the AWS console or API, and then configure the client side of the VPN connection on your own hardware. Amazon maintains the VPN gateway in AWS; you don’t directly touch or maintain it, but you do need to maintain everything on your side of the connection (and it needs to be a hardware VPN).
- You adjust the routing table on your internal network to send all traffic for the 10.0.0.0/16 network over the VPN connection to AWS. This is why it’s called a ‘virtual’ private cloud. Instances can’t see the Internet, but you have that gateway that’s Internet accessible.
- You also need to set your virtual routing table in AWS to send Internet traffic back through your corporate network if you want any of your assets to access the Internet for things like software updates. Sometimes you do, sometimes you don’t – we don’t judge.
- By default instances are protected with a Security Group that denies all inbound traffic and allows all outbound traffic. Unlike in Azure, instances on the same subnet can’t talk to each other. You cannot connect to them through the corporate network until you open them up. AWS Security Groups offer
allow rules only. You cannot explicitly deny traffic – only open up allowed traffic. In Azure you create Service Endpoints to explicitly route traffic, then use network security groups to allow or deny on top of that (within the virtual network). AWS uses security groups for both functions – opening a security group allows traffic through the private IP (or public IP if it is public facing).
- Our example uses no ACLs but you could put an ACL in place to block the two subnets from talking to each other. ACLs in AWS are there by default, but allow all traffic. An ACL in AWS is not stateful, so you need to create rules for all bidrectional traffic. ACLs in AWS work better as a
- A public network on AWS looks relatively similar to our Azure sample (which we designed to look similar). The key differences are how security groups and service endpoints function.
Hybrid Cloud on Azure
This builds on our previous examples. In this case the web servers and app servers are separated, with app servers on a private subnet. We already explained the components in our other examples, so there is only a little to add:
- The key security control here is a Network Security Group to restrict access to the app servers from ONLY the web servers, and only to the specific port and protocol required.
- The NSG should be applied to each instance, not to the subnets, to prevent a “flat network” and block peer traffic that could be used in an attack.
- The app servers can connect to your datacenter, and that is where you route all Internet traffic. That gives you just as much control over Internet traffic as with virtual machines in your own data center.
- You will want to restrict traffic from your organization’s network to the instances (via the NSGs) so you don’t become the weak link for an attack.
A Cloud Native Data Analytics Architecture
Our last example shows how to use some of the latest features of Amazon Web Services to create a new cloud-native design for big data transfers and analytics.
- In this example there is a private subnet in AWS, without either Internet access or a connection to the enterprise data center. Images will be created in either another account or a VPC, and nothing will be manually logged into.
- When an analytics job is triggered, a server in the data center takes the data and sends it to Amazon S3, their object storage service, using command line tools or custom code. This is an encrypted connection by default, but you could also encrypt the data using the AWS Key Management Service (or any encryption tool you want). We have clients using both options.
- The S3 bucket in AWS is tightly restricted to either only the IP address of the sending server, or a set of AWS IAM credentials – or both. AWS manages S3 security so you don’t worry about network attacks, merely enable access. S3 isn’t like a public FTP server – if you lock it down (easy to do) it isn’t visible except from authorized sources.
- A service called AWS Lambda monitors the S3 bucket. Lambda is a container for event-driven code running inside Amazon that can trigger based on internal things, including a new file appearing in an S3 bucket. You only pay for Lambda when your code is executing, so there is no cost to have it wait for events.
- When a new file appears the Lambda function triggers and launches analysis instances based on a standard image. The analysis instances run in a private subnet, with security group settings that block all inbound access.
- When the analysis instances launch the Lambda code sends them the location of the data in S3 to analyze. The instances connect to S3 through something known as a VPC Endpoint, which is totally different from an Azure service endpoint. A VPC endpoint allows instances in a totally private subnet to talk to S3 without Internet access (which was required until recently). As of this writing only S3 has a VPC endpoint, but we know Amazon is working on endpoints for additional services such as their Simple Queue Service (we suspect AWS hasn’t confirmed exactly which services are next on the list).
- The instances boot, grab the data, then do their work. When they are done they go through the S3 VPC Endpoint to drop their results into a second S3 bucket.
- The first bucket only allows writes from the data center, and reads from the private subnet. The second bucket reverses that and only allows reads from the data center and writes from the subnet. Everything is a one-way closed loop.
- The instance can then trigger another Lambda function to send a notification back to your on-premise data center or application that the job is complete, and code in the data center can grab the results. There are several ways to do this – for example the results could go into a database, instead.
- Once everything is complete Lambda moves the original data into Glacier, Amazon’s super-cheap long-term archival storage. In this scenario it is of course encrypted. (For this network-focused research we are skipping over most of the encryption options for this architecture, but they aren’t overly difficult).
Think about what we have described: the analysis servers have no Internet access, spin up only as needed, and can only read in new data and write out results. They automatically terminate when finished, so there is no persistent data sitting unused on a server or in memory. All Internet-facing components are native Amazon services, so we don’t need to maintain their network security. Everything is extremely cost-effective, even for very large data sets, because we only process when we need it; big data sets are always stored in the cheapest option possible, and automatically shifted around to minimize storage costs. The system is event-driven so if you load 5 jobs at once, it runs all 5 at the same time without any waiting or slowdown, and if there are no jobs the components are just programmatic templates, in the absolute most cost-effective state.
This example does skip some options that would improve resiliency in exchange for better network security. For example we would normally recommend using Simple Queue Service to manage the jobs (Lambda would send them over), because SQS handles situations such as an instance failing partway through processing. But this is security research, not availability focused.
This research isn’t the tip of the iceberg; it’s more like the first itty bitty little ice crystal on top of an iceberg, which stretches to the depths of the deepest ocean trench. But if you remember the following principles you will be fine as you dig into securing your own cloud and hybrid deployments:
- The biggest differences between cloud and traditional networks is the combination of abstraction (virtualization) and automation. Things look the same but don’t function the same.
- Everything is managed by software, providing tremendous flexibility, and enabling you to manage network security using the exact same tools that Development and Operations use to manage their pieces of the puzzle.
- You can achieve tremendous security through architecture. Virtual networks (and multiple cloud accounts) support incredible degrees of compartmentalization, where every project has its own dedicated network or networks.
- Security groups enhance that by providing the granularity of host firewalls, without the risks of relying on operating systems. They provide better manageability than even most network firewalls.
- Platform as a Service and cloud-provider-specific services open up entirely new architectural options. Don’t try to build things the way you always have. Actually, if you find yourself doing that, you should probably rethink your decision to use the cloud.
Don’t be intimidated by cloud computing, but don’t think you can or should implement network security the way you always have. Your skills and experiences are still important, and provide a base to build on as you learn all the new options available within the cloud.
Posted at Tuesday 29th September 2015 2:03 pm
(0) Comments •
This is the fourth post in a new series I’m posting for public feedback, licensed by Algosec. Well, that is if they like it – we are sticking to our Totally Transparent Research policy. I’m also live-writing the content on GitHub if you want to provide any feedback or suggestions. Click here for the first post in the series, [here for post two](https://securosis.com/blog/pragmatic-security-for-cloud-and-hybrid-networks-cloud-networking-101, post 3).
There is no single ‘best’ way to secure a cloud or hybrid network. Cloud computing is moving faster than any other technology in decades, with providers constantly struggling to out-innovate each other with new capabilities. You cannot lock yourself into any single architecture, but instead need to build out a program capable of handling diverse and dynamic needs.
There are four major focus areas when building out this program.
- Start by understanding the key considerations for the cloud platform and application you are working with.
- Design the network and application architecture for security.
- Design your network security architecture including additional security tools (if needed) and management components.
- Manage security operations for your cloud deployments – including everything from staffing to automation.
Understand Key Considerations
Building applications in the cloud is decidedly not the same as building them on traditional infrastructure. Sure, you can do it, but the odds are high something will break. Badly. As in “update that resume” breakage. To really see the benefits of cloud computing, applications must be designed specifically for the cloud – including security controls.
For network security this means you need to keep a few key things in mind before you start mapping out security controls.
- Provider-specific limitations or advantages: All providers are different. Nothing is standard, and don’t expect it to ever become standard. One provider’s security group is another’s ACL. Some allow more granular management. There may be limits on the number of security rules available. A provider might offer both allow and deny rules, or allow only. Take the time to learn the ins and outs of your provider’s capabilities. They all offer plenty of documentation and training, and in our experience most organizations limit themselves to no more than one to three infrastructure providers, keeping the problem manageable.
- Application needs: Applications, especially those using the newer architectures we will mention in a moment, often have different needs than applications deployed on traditional infrastructure. For example application components in your private network segment may still need Internet access to connect to a cloud component – such as storage, a message bus, or a database. These needs directly affect architectural decisions – both security and otherwise.
- New architectures: Cloud applications use different design patterns than apps on traditional infrastructure. For example, as previously mentioned, components are typically distributed across diverse network locations for resiliency, and tied tightly to cloud-based load balancers. Early cloud applications often emulated traditional architectures but modern cloud applications make extensive use of advanced cloud features, particularly Platform as a Service, which may be deeply integrated into a particular cloud provider. Cloud-based databases, message queues, notification systems, storage, containers, and application platforms are all now common due to cost, performance, and agility benefits. You often cannot even control the network security of these services, which are instead fully managed by the cloud provider. Continuous deployment, DevOps, and immutable servers are the norm rather than exceptions. On the upside, used properly these architectures and patterns are far more secure, cost effective, resilient, and agile than building everything yourself, but you do need to understand how they work.
Data Analytics Design Pattern Example
A common data analytics design pattern highlights these differences (see the last section for a detailed example). Instead of keeping a running analytics pool and sending it data via SFTP, you start by loading data into cloud storage directly using an (encrypted) API call. This, using a feature of the cloud, triggers the launch of a pool of analytics servers and passes the job on to a message queue in the cloud. The message queue distributes the jobs to the analytics servers, which use a cloud-based notification service to signal when they are done, and the queue automatically redistributes failed jobs. Once it’s all done the results are stored in a cloud-based NoSQL database and the source files are archived. It’s similar to ‘normal’ data analytics except everything is event-driven, using features and components of the cloud service. This model can handle as many concurrent jobs as you need, but you don’t have anything running or racking up charges until a job enters the system.
- Elasticity and a high rate of change are standard in the cloud: Beyond auto scaling, cloud applications tend to alter the infrastructure around them to maximize the benefits of cloud computing. For example one of the best ways to update a cloud application is not to patch servers, but instead to create an entirely new installation of the app, based on a template, running in parallel; and then to switch traffic over from the current version. This breaks familiar security approaches, including relying on IP addresses for: server identification, vulnerability scanning, and logging. Server names and addresses are largely meaningless, and controls that aren’t adapted for cloud are liable to be useless.
- Managing and monitoring security changes: You either need to learn how to manage cloud security using the provider’s console and APIs, or choose security tools that integrate directly. This may become especially complex if you need to normalize security between your data center and cloud provider when building a hybrid cloud. Additionally, few cloud providers offer good tools to track security changes over time, so you will need to track them yourself or use a third-party tool.
Design the Network Architecture
Unlike traditional networks, security is built into cloud networks by default. Go to any major cloud provider, spin up a virtual network, launch a server, and the odds are very high it is already well-defended – with most or all access blocked by default.
Because security and core networking are so intertwined, and every cloud application has its own virtual network (or networks), the first step toward security is to work with the application team and design it into the architecture.
Here are some specific guidelines and recommendations:
- Accounts provide your first layer of segregation. With each cloud provider you establish multiple accounts, each for a different environment (e.g., dev, test, production, logging). This enables you to tailor cloud security controls and minimize administrator access. This isn’t a purely network security feature, but will affect network security because you can, for example, have tighter controls for environments closer to production data. The rule of thumb for accounts is to consider separate accounts for separate applications, and then separate accounts for a given application when you want to restrict how many people have administrator access. For example a dev account is more open with more administrators, while production is a different account with a much smaller group of admins. Within accounts, don’t forget about the physical architecture:
- Regions/locations are often used for resiliency, but may also be incorporated into the architecture for data residency requirements, or to reduce network latency to customers. Unlike accounts, we don’t normally use locations for security, but you do need to build network security within each location.
- Zones are the cornerstone of cloud application resiliency, especially when tied to auto scaling. You won’t use them as a security control, but again they affect security, as they often map directly to subnets. An auto scale group might keep multiple instances of a server in different zones, which are different subnets, so you cannot necessarily rely on subnets and addresses when designing your security.
- Virtual Networks (Virtual Private Clouds) are your next layer of security segregation. You can (and will) create and dedicate separate virtual networks for each application (potentially in different accounts), each with its own set of network security controls. This compartmentalization offers tremendous security advantages, but seriously complicates security management. It forces you to rely much more heavily on automation, because manually replicating security controls across accounts and virtual networks within each account takes tremendous discipline and effort. In our experience the security benefits of compartmentalization outweigh the risks created by management complexity – especially because development and operations teams already tend to rely on automation to create, manage, and update environments and applications in the first place. There are a few additional non-security-specific aspects to keep in mind when you design the architecture:
- Within a given virtual network, you can include public and private facing subnets, and connect them together. This is similar to DMZ topologies, except public-facing assets can still be fully restricted from the Internet, and private network assets are all by default walled off from each other. Even more interesting, you can spin up totally isolated private network segments that only connect to other application components through an internal cloud service such as a message queue, and prohibit all server-to-server traffic over the network.
- There is no additional cost to spin up new virtual networks (or at least if your provider charges for this, it’s time to move on), and you can create another with a few clicks or API calls. Some providers even allow you to bridge across virtual networks, assuming they aren’t running on the same IP address range. Instead of trying to lump everything into one account and one virtual network, it makes far more sense to use multiple networks for different applications, and even within a given application architecture.
- Within a virtual network you also have complete control over subnets. While they may play a role in your security design, especially as you map out public and private network segments, make sure you also design them to support zones for availability.
- Flat networks aren’t flat in the cloud. Everything you deploy in the virtual network is surrounded by its own policy-based firewall which blocks all connections by default, so you don’t need to rely on subnets themselves as much for segregation between application components. Public vs. private subnets are one thing, but creating a bunch of smaller subnets to isolate application components quickly leads to diminishing returns.
You may need enterprise datacenter connections for hybrid clouds. These VPN or direct connections route traffic directly from your data center to the cloud, and vice-versa. You simply set your routing tables to send traffic to the appropriate destination, and SDN-based virtual networks allow you to set distinct subnet ranges to avoid address conflicts with existing assets.
Whenever possible, we actually recommend avoiding hybrid cloud deployments. It isn’t that there is anything wrong with them, but they make it much more difficult to support account and virtual network segregation. For example if you use separate accounts or virtual networks for your different dev/test/prod environments, you will tend to do so using templates to automatically build out your architecture, and they will perfectly mimic each other – down to individual IP addresses. But if you connect them directly to your data center you need to shift to non-overlapping address ranges to avoid conflicts, and they can’t be as automated or consistent. (This consistency is a cornerstone of continuous deployment and DevOps).
Additionally, hybrid clouds complicate security. We have actually seen them, not infrequently, reduce the overall security level of the cloud, because assets in the datacenter aren’t as segregated as on the cloud network, and cloud providers tend to be more secure than most organizations can achieve in their own infrastructure. Instead of cracking your cloud provider, someone only needs to crack a system on your corporate network, and use that to directly bridge to the cloud.
So when should you consider a hybrid deployment? Any time your application architecture requires direct address-based access to an internal asset that isn’t Internet-accessible. Alternatively, sometimes you need a cloud asset on a static, non-Internet-routable address – such as an email server or other service that isn’t designed to work with auto scaling – which internal things need to connect to. (We strongly recommend you minimize these – they don’t benefit from cloud computing, so there usually isn’t a good reason to deploy them there). And yes, this means hybrid deployments are extremely common unless you are building everything from scratch. We try to minimize their use – but that doesn’t mean they don’t play a very important role.
For security there are a few things to keep in mind when building a hybrid deployment:
- VPN traffic will traverse the Internet. VPNs are very secure, but you do need to keep them up-to-date with the latest patches and make sure you use strong, up-to-date certificates.
- Direct connections may reduce latency, but decide whether you trust your network provider, and whether you need to encrypt traffic.
- Don’t let your infrastructure reduce the security of your cloud. If you mandate multi-factor authentication in the cloud but not on your LAN, that’s a loophole. Is your entire LAN connected to the cloud? Could someone compromise a single workstation and then start attacking your cloud through your direct connection? Do you have security group or other firewall rules to keep your cloud assets as segregated from datacenter assets as they are from each other? Remember, cloud providers tend to be exceptionally good at security, and everything you deploy in the cloud is isolated by default. Don’t allow hybrid connection to become the weak link and reduce this compartmentalization.
- You may still be able to use multiple accounts and virtual networks for segregation, by routing different datacenter traffic to different accounts and/or virtual networks. But your on-premise VPN hardware or your cloud provider might not support this, so check before building it into your architecture.
- Cloud and on-premise network security controls may look similar on the surface, but they have deep implementation differences. If you want unified management you need to understand these differences, and be able to harmonize based on security goals – not by trying to force a standard implementation across very different technologies.
- Cloud computing offers many more ways to integrate into your existing operations than you might think. For example instead of using SFTP and setting up public servers to receive data dumps, consider installing your cloud provider’s command-line tools and directly transferring data to their object storage service (fully locked down, of course). Now you don’t need to maintain the burden of either an Internet-accessible FTP server or a hybrid cloud connection.
It’s hard to fully convey the breadth and depth of options for building security into your architectures, even without additional security tools. This isn’t mere theory – we have a lot of real-world experience with different architectures creating much higher security levels than can be achieved on traditional infrastructure at any reasonable cost.
Design the Network Security Architecture
At this point you should have a well-segregated environment where effectively every application, and every environment (e.g., dev/test) for every application, is running on its own virtual network. These assets are mostly either in auto scale groups which spread them around zones and subnets for resiliency; or connect to secure cloud services such as databases, message queues, and storage. These architectures alone, in our experience, are materially more secure than your typical starting point on traditional infrastructure.
Now it’s time to layer on the additional security controls we covered earlier under Cloud Networking 101. Instead of repeating the pros and cons, here are some direct recommendations about when to use each option:
- Security groups: These should be used by default, and set to
deny by default. Only open up the absolute minimum access needed. Cloud services allow you to right-size resources far more easily than on your own hardware, so we find most organizations tend to deploy far fewer number services on each instance, which directly translates to opening up fewer network ports per instance. A large number of cloud deployments we have evaluated use only a good base architecture and security groups for network security.
- ACLs: These mostly make sense in hybrid deployments, where you need to closely match or restrict communications between the data center and the cloud. Security groups are usually a better choice, and we only recommend falling back to ACLs or subnet-level firewalling when you cannot achieve your security objectives otherwise.
- Virtual Appliances: Whenever you need capabilities beyond basic firewalls, this is where you are likely to end up. But we find host agents often make more sense when they offer the same capabilities, because virtual appliances become costly bottlenecks which restrict your cloud architecture options. Don’t deploy one merely because you have a checkbox requirement for a particular tool – ensure it makes sense first. Over time we do see them becoming more “cloud friendly”, but when we rip into requirements on projects, we often find there are better, more cloud-appropriate ways to meet the same security objectives.
- Host security agents are often a better option than a virtual appliance because they don’t restrict virtual networking architectural options. But you need to ensure you have a way to deploy them consistently. Also, make sure you pick cloud-specific tools designed to work with features such as auto scaling. These tools are particularly useful to cover network monitoring gaps, meet IDS/IPS requirements, and satisfy all your normal host security needs.
Of course you will need some way of managing these controls, even if you stick to only capabilities and features offered by your cloud provider.
Security groups and ACLs are managed via API or your cloud provider’s console. They use the same management plane as the rest of the cloud, but this won’t necessarily integrate out of the box with the way you manage things internally. You can’t track these across multiple accounts and virtual networks unless you use a purpose-built tool or write your own code. We will talk about specific techniques for management in the next section, but make sure you plan out how to manage these controls when you design your architecture.
Platform as a Service introduces its own set of security differences. For example in some situations you still define security groups and/or ACLs for the platform (as with a cloud load balancer); but in other cases access to the platform is only via API, and may require an outbound public Internet connection, even from a private network segment. PaaS also tends to rely more on DNS rather than IP addresses, to help the cloud provider maintain flexibility. We can’t give you any hard and fast rules here. Understand what’s required to connect to the platform, and then ensure your architecture allows those connections. When you can manage security treat it like any other cluster of servers, and stick with the minimum privileges possible.
We cannot cover anything near every option for every cloud in a relatively short (believe it or not) paper like this, but for the most part once you understand these fundamentals and the core differences of working in software-defined environments, it gets much easier to adapt to new tools and technologies.
Especially once you realize that you start by integrating security into the architecture, instead of trying to layer it on after the fact.
Manage Cloud (and Hybrid) Network Security Operations
Building in security is one thing, but keeping it up to date over time is an entirely different – and harder – problem. Not only do applications and deployments change over time, but cloud providers have this pesky habit of “innovating” for “competitive advantage”. Someday things might slow down, but it definitely won’t be within the lifespan of this particular research.
Here are some suggestions on managing cloud network security for the long haul.
Organization and Staffing
It’s a good idea to make sure you have cloud experts on your network security team, people trained for the platforms you support. They don’t need to be new people, and depending on your scale this doesn’t need to be their full-time focus, but you definitely need the skills. We suggest you build your team with both security architects (to help in design) and operators (to implement and fix).
Cloud projects occur outside the constraints of your data center, including normal operations, which means you might need to make some organizational changes so security is engaged in projects. A security representative should be assigned and integrated into each cloud project. Think about how things normally work – someone starts a new project and security gets called when they need access or firewall rule changes. With cloud computing network security isn’t blocking anything (unless they need access to an on-premise resource) and entire projects can happen without security or ops every being directly involved. You need to adapt policies and organizational structure to minimize this risk. For example, work with procurement to require a security evaluation and consultation before any new cloud account is opened.
Because so much of cloud network security relies on architecture, it isn’t just important to have a security architect on the team – it is essential they be engaged in projects early. It goes without saying that this needs to be a collaborative role. Don’t merely write up some pre-approved architectures, and then try to force everyone to work within those constraints. You’ll lose that fight before you even know it started.
We hinted at this in the section above: one of the first challenges is to find all the cloud projects, and then keep finding new ones as they pop up over time. You need to enumerate the existing cloud network security controls. Here are a couple ways we have seen clients successfully keep tabs on cloud computing:
- If your critical assets (such as the customer database) are well locked down, you can use this to control cloud projects. If they want access to the data/application/whatever, they need to meet your security requirements.
- Procurement and Accounting are your next best options. At some point someone needs to pay the (cloud) piper, and you can work with Accounting to identify payments to cloud providers and tie them back to the teams involved. Just make sure you differentiate between those credit card charges to Amazon for office supplies, and the one to replicate your entire datacenter up into AWS.
- Hybrid connections to your data center are pretty easy to track using established process. Unless you let random employees plug in VPN routers.
- Lastly, we suppose you could try setting a policy that says “don’t cloud without telling us”. I mean, if you trust your people and all. It could work. Maybe. It’s probably good to have to keep the auditors happy anyway.
The next discovery challenge is to figure out how the cloud networks are architected and secured:
- First, always start with the project team. Sit down with them and perform an architecture and implementation review.
- It’s a young market, but there are some assessment tools that can help. Especially to analyze security groups and network security, and compare against best practices.
- You can use your cloud provider’s console in many cases, but most of them don’t provide a good overall network view. If you don’t have a tool to help, you can use scripts and API calls to pull down the raw configuration and manually analyze it.
Integrating with Development
In the broadest, sense there are two kinds of cloud deployments: applications you build and run in the cloud (or hybrid), and core infrastructure (like file and mail servers) you transition to the cloud. Developers play the central role in the former, but they are also often involved in the latter.
The cloud is essentially software defined everything. We build and manage all kinds of cloud deployments using code. Even if you start by merely transitioning a few servers into virtual machines at a cloud provider, you will always end up defining and managing much of your environment in code.
This is an incredible opportunity for security. Instead of sitting outside the organization and trying to protect things by building external walls, we gain much greater ability to manage security using the exact same tools development and operations use to define, build, and run the infrastructure and services. Here are a few key ways to integrate with development and ensure security is integrated:
- Create a handbook of design patterns for the cloud providers you support, including security controls and general requirements. Keep adding new patterns as you work on new projects. Then make this library available to business units and development teams so they know which architectures already have general approval from security.
- A cloud security architect is essential, and this person or team should engage early with development teams to help build security into every initial design. We hate to have to say it, but their role really needs to be collaborative. Lay down the law with a bunch of requirements that interfere with the project’s execution, and you definitely won’t be invited back to the table.
- A lot of security can be automated and templated by working with development. For example monitoring and automation code can be deployed on projects without the project team having to develop them from scratch. Even integrating third party tools can often be managed programmatically.
Change is constant in cloud computing. The foundational concept dynamic adjustment of capacity (and configuration) to meet changing demands. When we say “enforce policies” we mean that, for a given project, once you design the security you are able to keep it consistent. Just because clouds change all the time doesn’t mean it’s okay to let a developer drop all the firewalls by mistake.
The key policy enforcement difference between traditional networking and the cloud is that in traditional infrastructure security has exclusive control over firewalls and other security tools. In the cloud, anyone with sufficient authorization in the cloud platform (management plane) can make those changes. Even applications can potentially change their own infrastructure around them. That’s why you need to rely on automation to detect and manage change.
You lose the single point of control. Heck, your own developers can create entire networks from their desktops. Remember when someone occasionally plugged in their own wireless router or file server? It’s a bit like that, but more like building their own datacenter over lunch. Here are some techniques for managing these changes:
- Use access controls to limit who can change what on a given cloud project. It is typical to allow developers a lot of freedom in the dev environment, but lock down any network security changes in production, using your cloud provider’s IAM features.
- To the greatest extent possible, try to use cloud provider specific templates to define your infrastructure. These files contain a programmatic description of your environment, including complete network and network security configurations. You load them into the cloud platform and it builds the environment for you. This is a very common way to deploy cloud applications, and essential in organizations using DevOps to enforce consistency.
- When this isn’t possible you will need to use a tool or manually pull the network architecture and configuration (including security) and document them. This is your baseline.
- Then you need to automate change monitoring using a tool or the features of your cloud and/or network security provider:
- Cloud platforms are slowly adding monitoring and alerting on security changes, but these capabilities are still new and often manual. This is where cloud-specific training and staffing can really pay off, and there are also third-party tools to monitor these changes for you.
- When you use virtual appliances or host security, you don’t rely on your cloud provider, so you may be able to hook change management and policy enforcement into your existing approaches. These are security-specific tools, so unlike cloud provider features the security team will often have exclusive access and be responsible for making changes themselves.
- Did we mention automation? We will talk about it more in a minute, because it’s the only way to maintain cloud security.
Normalizing On-Premise and Cloud Security
Organizations have a lot of security requirements for very good reasons, and need to ensure those controls are consistently applied. We all have developed a tremendous amount of network security experience over decades running our own networks, which is still relevant when moving into the cloud. The challenge is to carry over the requirements and experience, without assuming everything is the same in the cloud, or letting old patterns prevent us from taking full advantage of cloud computing.
- Start by translating whatever rules sets you have on-premise into a comparable version for the cloud. This takes a few steps:
- Figure out which rules should still apply, and what new rules you need. For example a policy to deny all
ssh traffic from the Internet won’t work if that’s how you manage public cloud servers. Instead a policy that limits
ssh access to your corporate CIDR block makes more sense. Another example is the common restriction that back-end servers shouldn’t have any Internet access at all, which may need updating if they need to connect to PaaS components of their own architecture.
- Then adjust your policies into enforceable rulesets. For example security groups and ACLs work differently, so how you enforce them changes. Instead of setting subnet-based policies with a ton of rules, tie security group policies to instances by function. We once encountered a client who tried to recreate very complex firewall rulesets into security groups, exceeding their provider’s rule count limit. Instead we recommended a set of policies for different categories of instances.
- Watch out for policies like “deny all traffic from this IP range”. Those can be very difficult to enforce using cloud-native tools, and if you really have those requirements you will likely need a network security virtual appliance or host security agent. In many projects we find you can resolve the same level of risk with smarter architectural decisions (e.g., using immutable servers, which we will describe in a moment).
- Don’t just drop in a virtual appliance because you are used to it and know how to build its rules. Always start with what your cloud provider offers, then layer on additional tools as needed.
- If you migrate existing applications to the cloud the process is a bit more complex. You need to evaluate existing security controls, discover and analyze application dependencies and network requirements, and then translate them for a cloud deployment, taking into account all the differences we have been discussing.
- Once you translate the rules, normalize operations. This means having a consistent process to deploy, manage, and monitor your network security over time. Fully covering this is beyond to scope of this research, as it depends on how you manage network security operations today. Just remember that you are trying to blend what you do now with what the cloud project’s requirements, not simply enforce your existing processes under an entirely new operating model.
We hate to say it, but we will – this is a process of transition. We find customers who start on a project-by-project basis are more successful, because they can learn as they go, and build up a repository of knowledge and experience.
Automation and Immutable Network Security
Cloud security automation isn’t merely fodder for another paper – it’s an entirely new body of knowledge we are all only just beginning to build.
Any organization that moves to the cloud in any significant way learns quickly that automation is the only way to survive. How else can you manage multiple copies of a single project in different environments – never mind dozens or hundreds of different projects, each running in their own sets of cloud accounts across multiple providers?
Then, keep all those projects compliant with regulatory requirements and your internal security policies.
Yeah, it’s like that.
Fortunately this isn’t an insoluble problem. Every day we see more examples of companies successfully using the cloud at scale, and staying secure and compliant. Today they largely build their own libraries of tools and scripts to continually monitor and enforce changes. We also see some emerging tools to help with this management, and expect to see many more in the near future.
A core developing concept tied to automation is immutable security, and we have used it ourselves.
One of the core problems in security is managing change. We design something, build in security, deploy it, validate that security, and lock everything down. This inevitably drifts as it’s patched, updated, improved, and otherwise modified. Immutable security leverages automation, DevOps techniques, and inherent cloud characteristics to break this cycle. To be honest, it’s really just DevOps applied to security, and all the principles are in wide use already.
For example an immutable server is one that is never logged into or changed in production. If you go back to about auto scaling, we deploy servers based on standard images. Changing one of those servers after deployment doesn’t make sense, because those changes wouldn’t be in the image, so new versions launched by auto scaling wouldn’t include them. Instead DevOps creates a new image with all the changes, then alters the auto scale group rules to deploy new instances based on the new image, and man optionally prune off the older versions.
In other words no more patching, and no more logging into servers. You take a new known-good state, and completely replace what is in production.
Think about how this applies to network security. We can build templates to automatically deploy entire environments at our cloud providers. We can write network security policies, then override any changes automatically, even across multiple cloud accounts. This pushes the security effort earlier into design and development, and enables much more consistent enforcement in operations. And we use the exact same toolchain as Development and Operations to deploy our security controls, rather than trying to build our own on the side and overlay enforcement afterwards.
This might seem like an aside, but these automation principles are the cornerstone of real-world cloud security, especially at scale. This is a capability we never have in traditional infrastructure, where we cannot simply stamp out new environments automatically, and need to hand-configure everything.
Posted at Monday 28th September 2015 6:57 pm
(1) Comments •
This is the second post in a new series I’m posting for public feedback, licensed by Algosec. Well, that is if they like it – we are sticking to our Totally Transparent Research policy. I’m also live-writing the content on GitHub if you want to provide any feedback or suggestions. Click here for the first post in the series.
There isn’t one canonical cloud networking stack out there; each cloud service provider uses their own mix of technologies to wire everything up. Some of these might use known standards, tech, and frameworks, while others might be completely proprietary and so secret that you, as the customer, don’t ever know exactly what is going on under the hood.
Building cloud scale networks is insanely complex, and the different providers clearly see networking capabilities as a competitive differentiator.
So instead of trying to describe all the possible options, we’ll keep things at a relatively high level and focus on common building blocks we see relatively consistently on the different platforms.
Types of Cloud Networks
When you shop providers, cloud networks roughly fit into two buckets:
- Software Defined Networks (SDN) that fully decouple the virtual network from the underlying physical networking and routing.
- VLAN-based Networks that still rely on the underlying network for routing, lacking the full customization of an SDN.
Most providers today offer full SDNs of different flavors, so we’ll focus more on those, but we do still encounter some VLAN architectures and need to cover them at a high level.
Software Defined Networks
As we mentioned, Software Defined Networks are a form of virtual networking that (usually) takes advantage of special features in routing hardware to fully abstract the virtual network you see from the underlying physical network. To your instance (virtual server) everything looks like a normal network. But instead of connecting to a normal network interface it connects to a virtual network interface which handles everything in software.
SDNs don’t work the same as a physical network (or even an older virtual network). For example, in an SDN you can create two networks that use the same address spaces and run on the same physical hardware but never see each other. You can create an entirely new subnet not by adding hardware but with a single API call that “creates” the subnet in software.
How do they work? Ask your cloud provider. Amazon Web Services, for example, intercepts every packet, wraps it and tags it, and uses a custom mapping service to figure out where to actually send the packet over the physical network with multiple security checks to ensure no customer ever sees someone else’s packet. (You can watch a video with great details at this link). Your instance never sees the real network and AWS skips a lot of the normal networking (like ARP requests/caching) within the SDN itself.
SDN allows you to take all your networking hardware, abstract it, pool it together, and then allocate it however you want. On some cloud providers, for example, you can allocate an entire class B network with multiple subnets, routed to the Internet behind NAT, in just a few minutes or less. Different cloud providers use different underlying technologies and further complicate things since they all offer different ways of managing the network.
Why make things so complicated? Actually, it makes management of your cloud network much easier, while allowing cloud providers to give customers a ton of flexibility to craft the virtual networks they need for different situations. The providers do the heavy lifting, and you, as the consumer, work in a simplified environment. Plus, it handles issues unique to cloud, like provisioning network resources faster than existing hardware can handle configuration changes (a very real problem), or multiple customers needing the same private IP address ranges to better integrate with their existing applications.
Virtual LANs (VLANs)
Although they do not offer the same flexibility as SDNs, a few providers still rely on VLANS. Customers must evaluate their own needs, but VLAN-based cloud services should be considered outdated compared to SDN-based cloud services.
VLANs let you create segmentation on the network and can isolate and filter traffic, in effect just cutting off your own slice of the existing network rather than creating your own virtual environment. This means you can’t do SDN-level things like creating two networks on the same hardware with the same address range.
- VLANs don’t offer the same flexibility. You can create segmentation on the network and isolate and filter traffic, but can’t do SDN-level things like create two networks on the same hardware with the same address range.
- VLANs are built into standard networking hardware, which is why that’s where many people used to start. No special software needed.
- Customers don’t get to control their addresses and routing very well
- They can’t be trusted for security segmentation.
Because VLANs are built into standard networking hardware, they used to be where most people started when creating cloud computing as no special software was required. But customers on VLANs don’t get to control their addresses and routing very well, and they scale and perform terribly when you plop a cloud on top of them. They are mostly being phased out of cloud computing due to these limitations.
Defining and Managing Cloud Networks
While we like to think of one big cloud out there, there is more than one kind of cloud network and several technologies that support them. Each provides different features and presents different customization options. Management can also vary between vendors, but there are certain basic characteristics that they exhibit. Different providers use different terminology, so we’ve tried out best to pick ones that will make sense once you look at particular offerings.
Cloud Network Architectures
An understanding of the types of cloud network architectures and the different technologies that enable them is essential to fitting your needs with the right solution.
There are two basic types of cloud network architectures.
- Public cloud networks are Internet facing. You connect to your instances/servers via the public Internet and no special routing needed; every instance has a public IP address.
- Private cloud networks (sometimes called “virtual private cloud”) use private IP addresses like you would use on a LAN. You have to have a back-end connection — like a VPN — to connect to your instances. Most providers allow you to pick your address ranges so you can use these private networks as an extension of your existing network. If you need to bridge traffic to the Internet, you route it back through your data center or you use Network Address Translation to a public network segment, similarly to how home networks use NAT to bridge to the Internet.
These are enabled and supported by the following technologies.
- Internet connectivity (Internet Gateway) which hooks your cloud network to the Internet. You don’t tend to directly manage it, your cloud provider does it for you.
- Internal Gateways/connectivity connect your existing datacenter to your private network in the cloud. These are often VPN based, but instead of managing the VPN server yourself, the cloud provider handles it (you just manage the configuration). Some providers also support direct connections through partner broadband network providers that will route directly between your data center and the private cloud network, instead of using a VPN (which are on leased lines).
- Virtual Private Networks - Instead of using the cloud provider’s, you can always set up your own, assuming you can bridge the private and public networks in the cloud provider. This kind of setup is very common, especially if you don’t want to directly connect your data center and cloud, but still want a private segment and allow access to it for your users, developers and administrators.
Cloud providers all break up their physical infrastructure differently. Typically they have different data centers (which might be a collection of multiple data centers clumped together) in different regions. A region or location is the physical location of the data center(s), while a zone is a sub-section of that region used for designing availability. These are for:
- Performance - By allowing you to take advantage of physical proximity, you can improve performance of applications that conduct high levels of traffic.
- Regulatory requirements - Flexibility in the geographic location of your data stores can help meet local legal and regulatory requirements around data residency.
- Disaster recovery and maintaining availability - Most providers charge for some or all network traffic if you communicate across regions and locations, which would make disaster recovery expensive. That’s why they provide local “zones” that break out an individual region into isolated pieces with their own network, power, and so forth. A problem might take out one zone in a region, but shouldn’t take out any others, giving customers a way to build for resiliency without having to span continents or oceans. Plus, you don’t tend to pay for the local network traffic between zones.
Managing Cloud Networks
Managing these networks depends on all of the components listed above. Each vendor will have its own set of tools based on certain general principles.
- Everything is managed via APIs, which are typically REST (representational state transfer)-based.
- You can fully define and change everything remotely via these APIs and it happens nearly instantly in most cases.
- Cloud platforms also have web UIs, which are simply front ends for the same APIs you might code to but tend to automate a lot of the heavy lifting for you.
- Key for security is protecting these management interfaces since someone can otherwise completely reconfigure your network while sitting at a hipster coffee shop, making them, by definition, evil (you can usually spot them by the ski masks, according to our clip art library).
Hybrid Cloud Architectures
As mentioned, your data center may be connected to the cloud. Why? Sometimes you need more resources and you don’t want them on the public Internet. This is a common practice for established companies that aren’t starting from scratch and need to mix and match resources.
There are two ways to accomplish this.
- VPN connections - You connect to the cloud via a dedicated VPN, which is nearly always hardware-based and hooked into your local routers to span traffic to the cloud. The cloud provider, as mentioned, handles their side of the VPN, but you still have to configure some of it. All traffic goes over the Internet but is isolated.
- Direct network connections - These are typically set up over leased lines. They aren’t necessarily more secure and are much more expensive but they can reduce latency, or make your router-hugging network manager feel happy.
While cloud services can provide remarkable flexibility, they also require plenty of customization and present their own challenges for security.
Nearly every Infrastructure as a Service provider supports auto scaling, which is one of the single most important features at the core of the benefits of cloud computing. You can define your own rules in your cloud for when to add or remove instances of a server. For example, you can set a rule that says to add servers when you hit 80 percent CPU load. It can then terminate those instances when load drops (clearly you need to architect appropriately for this kind of behavior).
This creates application elasticity since your resources can automatically adapt based on demand instead of having to leave servers running all the time just in case demand increases. Your consumption now aligns with demand, instead of traditional architectures, which leave a lot of hardware sitting around, unused, until demand is high enough. This is the heart of IaaS. This is what you’re paying for.
Such flexibility creates complexity. If you think about it, you won’t necessarily know the exact IP address of all your servers since they may appear and disappear within minutes. You may even design in complexity when you design for availability — by creating rules to keep multiple instances in multiple subnets across multiple zones available in case one of them drops out. Within those virtual subnets, you might have multiple different types of instances with different security requirements. This is pretty common in cloud computing.
Fewer static routes, highly dynamic addressing and servers that might only “live” for less than an hour… all this challenges security. It requires new ways of thinking, which is what the rest of this paper will focus on.
Our goal here is to start getting you comfortable with how different cloud networks can be. On the surface, depending on your provider, you may still be managing subnets, routing tables, and ACLs. But underneath, these are now (probably) database entries implemented in software, not the hardware you might be used to.
Posted at Tuesday 22nd September 2015 5:13 pm
(1) Comments •
This is the start in a new series I’m posting for public feedback, licensed by Algosec. Well, that is if they like it – we are sticking to our Totally Transparent Research policy. I’m also live-writing the content on GitHub if you want to provide any feedback or suggestions. With that, here’s the content…
For a few decades we have been refining our approach to network security. Find the boxes, find the wires connecting them, drop a few security boxes between them in the right spots, and move on. Sure, we continue to advance the state of the art in exactly what those security boxes do, and we constantly improve how we design networks and plug everything together, but overall change has been incremental. How we think about network security doesn’t change – just some of the particulars.
Until you move to the cloud.
While many of the fundamentals still apply, cloud computing releases us from the physical limitations of those boxes and wires by fully abstracting the network from the underlying resources. We move into entirely virtual networks, controlled by software and APIs, with very different rules. Things may look the same on the surface, but dig a little deeper and you quickly realize that network security for cloud computing requires a different mindset, different tools, and new fundamentals.
Many of which change every time you switch cloud providers.
The challenge of cloud computing and network security
Cloud networks don’t run magically on pixie dust, rainbows, and unicorns – they rely on the same old physical network components we are used to. The key difference is that cloud customers never access the ‘real’ network or hardware. Instead they work inside virtual constructs – that’s the nature of the cloud.
Cloud computing uses virtual networks by default. The network your servers and resources see is abstracted from the underlying physical resources. When you server gets IP address 10.0.0.12, that isn’t really that IP address on the routing hardware – it’s a virtual IP address on a virtual network. Everything is handled in software, and most of these virtual networks are Software Defined Networks (SDN). We will go over SDN in more depth in the next section.
These networks vary across cloud providers, but they are all fundamentally different from traditional networks in a few key ways:
- Virtual networks don’t provide the same visibility as physical networks because packets don’t move around the same way. We can’t plug a wire into the network to grab all the traffic – there is no location all traffic traverses, and much of the traffic is wrapped and encrypted anyway.
- Cloud networks are managed via Application Programming Interfaces – not by logging in and provisioning hardware the old-fashioned way. A developer has the power to stand up an entire class B network, completely destroy an entire subnet, or add a network interface to a server and bridge to an entirely different subnet on a different cloud account, all within minutes with a few API calls.
- Cloud networks change faster than physical networks, and constantly. It isn’t unusual for a cloud application to launch and destroy dozens of servers in under an hour – faster than traditional security and network tools can track – or even build and destroy entire networks just for testing.
- Cloud networks look like traditional networks, but aren’t. Cloud providers tend to give you things that look like routing tables and firewalls, but don’t work quite like your normal routing tables and firewalls. It is important to know the differences.
Don’t worry – the differences make a lot of sense once you start digging in, and most of them provide better security that’s more accessible than on a physical network, so long as you know how to manage them.
The role of hybrid networks
A hybrid network bridges your existing network into your cloud provider. If, for example, you want to connect a cloud application to your existing database, you can connect your physical network to the virtual network in your cloud.
Hybrid networks are extremely common, especially as traditional enterprises begin migrating to cloud computing and need to mix and match resources instead of building everything from scratch. One popular example is setting up big data analytics in your cloud provider, where you only pay for processing and storage time, so you don’t need to buy a bunch of servers you will only use once a quarter.
But hybrid networks complicate management, both in your data center and in the cloud. Each side uses a different basic configuration and security controls, so the challenge is to maintain consistency across both, even though the tools you use – such as your nifty next generation firewall – might not work the same (if at all) in both environments.
This paper will explain how cloud network security is different, and how to pragmatically manage it for both pure cloud and hybrid cloud networks. We will start with some background material and cloud networking 101, then move into cloud network security controls, and specific recommendations on how to use them. It is written for readers with a basic background in networking, but if you made it this far you’ll be fine.
Posted at Wednesday 16th September 2015 7:42 pm
(0) Comments •
This is our third post on AWS security best practices, to be compiled into a short paper. See also our first post, on defending the management plane and our second post, on using built-in AWS tools.
Finish with Additional Security Tools
AWS provides an excellent security foundation but most deployments require a common set of additional tools:
- Amazon’s monitoring tools (CloudTrail, CloudWatch, and Config) offer incomplete coverage, and no correlation or analysis. Integrate their feeds into existing log management, SIEM, monitoring, and alerting tools that natively support and correlate AWS logs and feeds, so they can fill gaps by tracking activity AWS currently misses.
- Use a host configuration management tool designed to work in the cloud to automatically configure and update instances.
- Embed agents into approved AMIs or bootstrap through installation scripts.
- Insert baseline security policies so all instances meet security configuration requirements. This is also a good way to insert security agents.
- Enhance host security in key areas using tools and packages designed to work in highly dynamic cloud deployments:
- Agents should be lightweight, communicate with the AWS metadata service for important information, and configure themselves on installation.
- Host Integrity Monitoring can detect unauthorized changes to instances.
- Logging and alerting collect local audit activity and alerts on policy violations.
- Host firewalls fill gaps left by security group limitations, such as rule set sizes.
- Some tools can additionally secure administrator access to hosts without relying solely on
- For web applications use a cloud-based Web Application Firewall.
- Some services also provide DDoS protection. Although AWS can support high levels of traffic, DDoS protection stops traffic before it hits your instances… and your AWS bill.
- Choose security assessments and scanning tools that tie into AWS APIs and comply with Amazon’s scanning requirements.
- Look for tools that not only scan instances, but can assess the AWS environment.
Where to Go from Here
These fundamentals are just the surface of what is possible with cloud security. Explore advanced techniques like Software Defined Security, DevOps integration, and secure cloud architectures.
Posted at Wednesday 17th December 2014 6:19 pm
(0) Comments •
This is a short series on where to start with AWS security. We plan to release it as a concise white paper soon. It doesn’t cover everything but is designed to kickstart and prioritize your cloud security program on Amazon. We do plan to write a much deeper paper next year, but we received several requests for something covering the fundamentals, so here you go…
Building on a Secure Foundation
Amazon Web Services is one of the most secure public cloud platforms available, with deep datacenter security and many user-accessible security features. Building your own secure services on AWS requires properly using what AWS offers, and adding additional controls to fill the gaps.
Amazon’s datacenter security is extensive – better than many organizations achieve for their in-house datacenters. Do your homework, but unless you have special requirements you can feel comfortable with their physical, network, server, and services security. AWS datacenters currently hold over a dozen security and compliance certifications, including SOC 1/2/3, PCI-DSS, HIPAA, FedRAMP, ISO 27001, and ISO 9001.
Never forget that you are still responsible for everything you deploy on top of AWS, and for properly configuring AWS security features. AWS is fundamentally different than even a classical-style virtual datacenter, and understanding these differences is key for effective cloud security. This paper covers the foundational best practices to get you started and help focus your efforts, but these are just the beginning of comprehensive cloud security.
Defend the Management Plane
One of the biggest risks in cloud computing is an attacker gaining access to the cloud management plane: the web interface and APIs to configure and control your cloud. Fail to lock down this access and you might as well just hand over your datacenter to the bad guys.
Fortunately Amazon provides an extensive suite of capabilities to protect the management plane at multiple levels, including both preventative and monitoring controls. Unfortunately the best way to integrate these into existing security operations isn’t always clear; it can also be difficult to identify any gaps. Here are our start-to-finish recommendations.
Control access and compartmentalize
- The most important step is to enable Multifactor Authentication (MFA) for your
root account. For
root accounts we recommend using a hardware token which is physically secured in a known location which key administrators can access in case of emergency.
- Also configure your Security Challenge Questions with random answers which aren’t specific to any individual. Write down the answers and also store them in a secure but accessible location.
- Then create separate administrator accounts using Amazon’s Identity and Access Management (IAM) for super-admins, and also turn on MFA for each of those accounts. These are the admin accounts you will use from day to day, saving your
root account for emergencies.
- Create separate AWS accounts for development, testing, and production, and other cases where you need separation of duties. Then tie the accounts together using Amazon’s consolidated billing. This is a very common best practice.
Locking down your
root account means you always keep control of your AWS management, even in case an administrator account is compromised. Using MFA on all administrator accounts means you won’t be compromised even if an attacker manages to steal a password. Using different AWS accounts for different environments and projects compartmentalizes risks while supporting cross-account access when necessary.
Amazon’s IAM policies are incredibly granular, down to individual API calls. They also support basic logic, such as tying a policy to resources with a particular tag. It can get complicated quickly, so aside from ‘super-admin’ accounts there are several other IAM best practices:
- Use the concept of least privilege and assign different credentials based on job role or function. Even if someone needs full administrative access sometimes, that shouldn’t be what they use day to day.
- Use IAM Roles when connecting instances and other AWS components together. This establishes temporary credentials which AWS rotates automatically.
- Also use roles for cross account access. This allows a user or service in one AWS account to access resources in another, without having to create another account, and ties access to policies.
- Apply object-level restrictions using IAM policies with tags. Tag objects and the assigned IAM policies are automatically enforced.
- For administrative functions use different accounts and credentials for each AWS region and service.
- If you have a user directory you can integrate it with AWS using SAML 2.0 for single sign-on. But be careful; this is most suitable for accounts that don’t need deep access to AWS resources, because you lose the ability to compartmentalize access using different accounts and credentials.
- Never embed Access Keys and Secret Keys in application code. Use IAM Roles, the Security Token Service, and other tools to eliminate static credentials. Many attackers are now scanning the Internet for credentials embedded in applications, virtual images, and even posted on code-sharing sites.
These are only a starting point, focused on
root and key administrator accounts. Tying them to multifactor authentication is your best defense against most management plane attacks.
Amazon provides three tools to monitor management activity within AWS. Enable all of them:
- CloudTrail logs all management (API) activity on AWS services, including Amazon’s own connections to your assets. Where available it provides complete transparency for both your organization’s and Amazon’s access.
- CloudWatch monitors the performance and utilization of your AWS assets, and ties tightly into billing. Set billing alarms to detect unusually high levels of activity. You can also send system logs to CloudWatch but this isn’t recommended as a security control.
- Config is a new service that discovers services and configurations, and tracks changes over time. It is a much cleaner way to track configuration activity than CloudTrail.
CloudTrail and Config don’t cover all regions and services, so understand where the gaps are. As of this writing Config is still in preview, with minimal coverage, but both services expand capabilities regularly. These features provide important data feeds but most organizations use additional tools for overall collection and analysis, including log management and SIEM.
- As a next step many organizations use a management portal (open source or commercial) instead of allowing direct access to AWS. This gives much tighter control over access and monitoring. Proxy administrator access with Privileged User Management, “jump boxes” or similar tools.
Posted at Thursday 4th December 2014 5:00 pm
(0) Comments •
Last week I wrote up my near epic fail on Amazon Web Services where I ‘let’ someone launch a bunch of Litecoin mining instances in my account.
Since then I received some questions on my forensics process, so I figure this is a good time to write up the process in more detail. Specifically, how to take a snapshot and use it for forensic analysis.
I won’t cover all the steps at the AWS account layer – this post focuses on what you should do for a specific instance, not your entire management plane.
The first step, which I skipped, is to collect all the metadata associated with the instance. There is an easy way, a hard way (walk through the web UI and take notes manually), and the way I’m building into my nifty tool for all this that I will release at RSA (or sooner, if you know where to look).
The best way is to use the AWS command line tools for your operating system. Then run the command
aws ec2 describe-instances --instance-ids i-5203422c (inserting your instance ID). Note that you need to follow the instructions linked above to properly configure the tool and your credentials.
I suggest piping the output to a file (e.g.,
aws ec2 describe-instances --instance-ids i-5203422c > forensic-metadata.log) for later examination.
You should also get the console output, which is stored by AWS for a short period on boot/reboot/termination.
aws ec2 get-console-output --instance-id i-5203422c. This might include a bit more information if the attacker mucked with logs inside the instance, but won’t be useful for a hacked instance because it is only a boot log. This is a good reason to use a tool that collects instance logs outside AWS.
That is the basics of the metadata for an instance. Those two pieces collect the most important bits. The best option would be CloudTrail logs, but that is fodder for a future post.
Now on to the instance itself. While you might log into it and poke around, I focused on classical storage forensics. There are four steps:
- Take a snapshot of all storage volumes.
- Launch an instance to examine the volumes.
- Attach the volumes.
- Perform your investigation.
If you want to test any of this, feel free to use the snapshot of the hacked instance that was running in my account (well, one of 10 instances). The snapshot ID you will need is
Snapshot the storage volumes
I will show all this using the web interface, but you can also manage all of it using the command line or API (which is how I now do it, but that code wasn’t ready when I had my incident).
There is a slightly shorter way to do this in the web UI by going straight to volumes, but that way is easier to botch, so I will show the long way and you can figure out the shorter alternative yourself.
- Click Instances in your EC2 management console, then check the instance to examine.
- Look at the details on the bottom, click the Block Devices, then each entry. Pull the Volume ID for every attached volume.
- Switch to Volumes and then snapshot each volume you identified in the steps above.
- Label each snapshot so you remember it. I suggest date and time, “Forensics”, and perhaps the instance ID.
You can also add a name to your instance, then skip direct to Volumes and search for volumes attached to it.
Remember, once you take a snapshot, it is read-only – you can create as many copies as you like to work on without destroying the original. When you create a volume from an instance it doesn’t overwrite the snapshot, it gets another copy injected into storage.
Snapshots don’t capture volatile memory, so if you need RAM you need to either play with the instance itself or create a new image from that instance and launch it – perhaps the memory will provide more clues. That is a different process for another day.
Launch a forensics instance
Launch the operating system of your choice, in the same region as your snapshot. Load it with whatever tools you want. I did just a basic analysis by poking around.
Attach the storage volumes
- Go to Snapshots in the management console.
- Click the one you want, right-click, and then “Create Volume from Snapshot”. Make sure you choose the same Availability Zone as your forensics instance.
- Seriously, make sure you choose the same Availability Zone as your instance. People always mess this up. (By ‘people’, I of course mean ‘I’).
- Go back to Volumes.
- Select the new volume when it is ready, and right click/attach.
- Select your forensics instance. (Mine is stopped in the screenshot – ignore that).
- Set a mount point you will remember.
Perform your investigation
- Create a mount point for the new storage volumes, which are effectively external hard drives. For example,
sudo mkdir /forensics.
- Mount the new drive, e.g.,
sudo mount /dev/xvdf1 /forensics. Amazon may change the device mapping when you attach the drive (technically your operating system does that, not AWS, and you get a warning when you attach).
- Remember, use
sudo bash (or the appropriate equivalent for your OS) if you want to peek into a user account in the attached volume.
And that’s it. Remember you can mess with the volume all you want, then attach a new one from the snapshot again for another pristine copy. If you need a legal trail my process probably isn’t rigorous enough, but there should be enough here that you can easily adapt.
Again, try it with my snapshot if you want some practice on something with interesting data inside. And after RSA check back for a tool which automates nearly all of this.
Posted at Monday 13th January 2014 9:35 pm
(0) Comments •
Over the past few years I have spent a lot of time traveling the world, talking and teaching about cloud security. To back that up I have probably spent more time researching the technologies than any other topic since I moved from being a developer and consultant into the analyst role. Something seemed different at such a fundamental level that I was driven to put my hands on a keyboard and see what it looked and felt like. To be honest, even after spending a couple years at this, I still feel I am barely scratching the surface.
But along the way I have learned a heck of a lot. I realized that many of my initial assumptions were wrong, and the cloud required a different lens to tease out the security implications.
This paper is the culmination of that work. It attempts to break down the security implications of cloud computing, both positive and negative, and change how we approach the problem. In my travels I have found that security professionals are incredibly receptive to the differences between cloud and traditional infrastructure, but they simply don’t have the time to spend 3-4 years researching and playing with cloud platforms and services.
I hope this work helps people think about cloud computing differently, providing practical examples of how to leverage and secure it today.
I would like to thank CloudPassage for licensing the paper. This is something I have wanted to write for a long time, but it was hard to find a security company ready to take the plunge. Their financial support enables us to release this work for free. As always, the content was developed completely independently using our Totally Transparent Research process – this time it was actually developed on GitHub to facilitate public response.
I would also like to thank the Cloud Security Alliance for reviewing and co-branding and co-hosting the paper.
There are two versions. The Executive Summary is 2 pages of highlights from the main report. The Full Version includes an executive summary (formatted differently), as well as the full report.
As always, if you have any feedback please leave it here or on the report’s permanent home page.
Posted at Friday 10th January 2014 4:09 pm
(1) Comments •
Update: Amazon reached out to me and reversed the charges, without me asking or complaining (or in any way contacting them). I accept full responsibility and didn’t post this to get a refund, but I’m sure not going to complain – neither is Mike.
This is a bit embarrassing to write.
I take security pretty seriously. Okay, that seems silly to say, but we all know a lot of people who speak publicly on security don’t practice what they preach. I know I’m not perfect – far from it – but I really try to ensure that when I’m hacked, whoever gets me will have earned it. That said, I’m also human, and sometimes make sacrifices for convenience. But when I do so, I try to make darn sure they are deliberate, if misguided, decisions. And there is the list of things I know I need to fix but haven’t had time to get to.
Last night, I managed to screw both those up.
It’s important to fess up, and I learned (the hard way) some interesting conclusions about a new attack trend that probably needs its own post. And, as is often the case, I made three moderately small errors that combined to an epic FAIL.
I was on the couch, finishing up an episode of Marvel’s Agents of S.H.I.E.L.D. (no, it isn’t very good, but I can’t help myself; if they kill off 90% of the cast and replace them with Buffy vets it could totally rock, though). Anyway… after the show I checked my email before heading to bed. This is what I saw:
Dear AWS Customer,
Your security is important to us. We recently became aware that your AWS Access Key (ending with 3KFA) along with your Secret Key are publicly available on github.com . This poses a security risk to you, could lead to excessive charges from unauthorized activity or abuse, and violates the AWS Customer Agreement.
We also believe that this credential exposure led to unauthorized EC2 instances launched in your account. Please log into your account and check that all EC2 instances are legitimate (please check all regions - to switch between regions use the drop-down in the top-right corner of the management console screen). Delete all unauthorized resources and then delete or rotate the access keys. We strongly suggest that you take steps to prevent any new credentials from being published in this manner again.
Please ensure the exposed credentials are deleted or rotated and the unauthorized instances are stopped in all regions before 11-Jan-2014.
NOTE: If the exposed credentials have not been deleted or rotated by the date specified, in accordance with the AWS Customer Agreement, we will suspend your AWS account.
Detailed instructions are included below for your convenience.
CHECK FOR UNAUTHORIZED USAGE
To check the usage, please log into your AWS Management Console and go to each service page to see what resources are being used. Please pay special attention to the running EC2 instances and IAM users, roles, and groups. You can also check “This Month’s Activity” section on the “Account Activity” page. You can use the dropdown in the top-right corner of the console screen to switch between regions (unauthorized resources can be running in any region).
DELETE THE KEY
If are not using the access key, you can simply delete it. To delete the exposed key, visit the “Security Credentials” page. Your keys will be listed in the “Access Credentials” section. To delete a key, you must first make it inactive, and then delete it.
ROTATE THE KEY
If your application uses the access key, you need to replace the exposed key with a new one. To do this, first create a second key (at that point both keys will be active) and modify your application to use the new key. Then disable (but not delete) the first key. If there are any problems with your application, you can make the first key active again. When your application is fully functional with the first key inactive, you can delete the first key. This last step is necessary - leaving the exposed key disabled is not acceptable.
I bolted off the couch, mumbling to my wife, “my Amazon’s been hacked”, and disappeared into my office. I immediately logged into AWS and GitHub to see what happened.
Lately I have been expanding the technical work I did for my Black Hat presentation, I am building a proof of concept tool to show some DevOps-style Software Defined Security techniques. Yes, I’m an industry analyst, and we aren’t supposed to touch anything other than PowerPoint, but I realized a while ago that no one was actually demonstrating how to leverage the cloud and DevOps for defensive security. Talking about it wasn’t enough – I needed to show people.
The code is still super basic but evolving nicely, and will be done in plenty of time for RSA. I put it up on GitHub to keep track of it, and because I plan to release it after the talk. It’s actually public now because I don’t really care if anyone sees it early.
The Ruby program currently connects to AWS and a Chef server I have running, and thus needs credentials. Stop smirking – I’m not that stupid, and the creds are in a separate configuration file that I keep locally. My first thought was that I screwed up the
.gitignore and somehow accidentally published that file.
Nope, all good. But it took all of 15 seconds to realize that a second
test.rb file I used to test smaller code blocks still had my Access Key and Secret Key in a line I commented out. When I validated my code before checking it in, I saw the section for pulling from the configuration file, but missed the commented code containing my keys.
Back to AWS.
I first jumped into my Security Credentials section and revoked the key. Fortunately I didn’t see any other keys or access changes, and that key isn’t embedded in anything other than my dev code so deleting it wouldn’t break anything. If this was in a production system it would have been very problematic.
Then I checked my running instances. Nothing in
us-east-1 where I do most of my work, so I started from the top of the list and worked my way down.
There is was. 5 extra large instances in
us-west-1. 5 more in Ireland. All had been running for 72 hours, which, working from my initial checkin of the code on GitHub, means the bad guys found the credentials within about 36 hours of creating the project and loading the files.
Time for incident response mode. I terminated all the instances and ran through every region in AWS to make sure I didn’t miss anything. The list was:
- Lock down or revoke the Access Key.
- Check all regions for running instances, and terminate anything unexpected (just the 10 I found).
- Snapshot one of the instances for forensics. I should have also collected metadata but missed that (mainly which AMI they used).
- Review IAM settings. No new users/groups/roles, and no changes to my running policies.
- Check host keys. No changes to my existing ones, and only a generic one in each region where they launched the new stuff.
- Check security groups. No changes to any of my running ones.
- Check volumes and snapshots. Clean, except for the boot volumes of those instances.
- Check CloudTrail. The jerks only launched instances in regions not supported, so even though I have CloudTrail running I didn’t collect any activity.
- Check all other services. Fortunately, being my dev/test/teaching account, I know exactly what I’m running. So I jumped into the Billing section of my account and confirmed only the services I am actively using were running charges. This would be a serious pain if you were actively using a bunch of different services.
- Notice the $500 in accumulated EC2 charges. Facepalm. Say naughty words.
I was lucky. The attackers didn’t mess with anything active I was running. That got me curious, because 10 extra large instances racking up $500 in 3 days initially made me think they were out to hurt me. Maybe even embarrass the stupid analyst. Then I went into forensics mode.
Without CloudTrail I had no way to look for the origin of the API calls using that Access Key. Aside from the basic metadata (some of which I lost because I forgot to record it when I terminated the instances – a note to my future self). What I did have was a snapshot of their server. Forensics isn’t my expertise but I know a few basics.
I launched a new (micro) instance, and then created and attached a new volume based on the snapshot. I mounted the volume and started poking around. Since I still had the original snapshot, I didn’t need to worry about altering the data on the volume. I can always create and attach a new one.
The first thing I did was check the logs. I saw a bunch of references to CUDA 5.5. Uh oh, that’s GPU stuff, and explains why they launched a bunch of extra larges. The bad guys didn’t care about me, and weren’t out to get me specifically, as you will see in a sec. Then I checked the user accounts. There was only one, ec2-user, which is standard on Amazon Linux instances. The home directory had everything I needed:
cpuminer CudaMiner tor-0.2.4.20.tar.gz
I didn’t need to check DuckDuckGo to figure that one out (but I did to be sure). Looks like some Litecoin/Bitcoin mining. That explains the large instances. Poking around the logs I also found the IP address ‘18.104.22.168’, which shows as Latvia. Not that I can be certain of anything because they also use Tor.
I could dig in more but that told me all I needed to know. (If you want a copy of the snapshot, let me know). Here’s my conclusion:
Attackers are scraping GitHub for AWS credentials embedded in code (and probably other cloud services). I highly doubt I was specifically targeted. They then use these to launch extra large instances for Bitcoin mining, and mask the traffic with Tor.
Someone mentioned this in our internal chat room the other day, so I know I’m not the first to write it up.
Here is where I screwed up:
- I did not have billing alerts enabled. This is the one error I knew I should have dealt with earlier but didn’t get around to. I paid the price for complacency.
- I did not completely scrub my code before posting to GitHub. This was a real mistake – I tried and made an error. I blame my extreme sleep deprivation, and a lot of this work was done over the holidays, with a lot of distractions. I was sloppy.
- I used an Access key in my dev code without enough (any) restrictions. This was also on my to-do list, but after I finished up more of the code because a lot of what I’m building needs extensive permissions and I can’t predict them all ahead of time. Managing AWS IAM is a real pain so I was saving it for the end.
Here is how I am fixing things:
- Billing alerts are now enabled, with a threshold just above my monthly average.
- I am creating an IAM policy and Access Key that restricts the application to my main development region, and removes the ability to make IAM changes or access/adjust CloudTrail.
- All dev work using that Access Key will be in a region that supports CloudTrail.
- When I am done with development I will create a more tailored IAM policy that only grants required operations. If I actually designed this thing ahead of time instead of hacking on it a little every day I could have done this ahead of time.
In the end I was lucky. I am only out $500 (well, Securosis is), 45 minutes of investigation and containment effort, and my ego (which has plenty of deflation room). They could have much mucked with my account more deeply. They couldn’t lock me out with an Access Key, but still could have cost me many more hours, days, and cash.
Lesson learned. Not that I didn’t know it already, but I suppose a hammer to the head every now and then helps keep you frosty.
I’d like to thank “Alex R.”, who I assume is on the AWS security team. I am also impressed that AWS monitors GitHub for this, and then has a process to validate whether the keys were potentially used, to help point compromised customers in the right direction.
Posted at Tuesday 7th January 2014 6:00 pm
(39) Comments •
This is part five of a series. You can read part one, part two, part three, or part four; or track the project on GitHub.
Real World Examples
Cloud computing covers such a wide range of different technologies that there are no shortage of examples to draw from. Here are a few generic examples from real-world deployments. These get slightly technical because we want to highlight practical, tactical techniques to prove we aren’t just making all this up:
Embedding and Validating a Security Agent Automatically
In a traditional environment we embed security agents by building them into standard images or requiring server administrators to install and register them. Both options are very prone to error and omission, and hard to validate because you often need to rely on manual scanning. Both issues become much easier to manage in cloud computing.
To embed the agent:
- The first option is to build the agent into images. Instead of using generic operating system images you build your own, then require users to only launch approved images. In a private cloud you can enforce this with absolute control of what they run. In public clouds it is a bit tougher to enforce, but you can quickly catch exceptions using our validation process.
- The second option, and our favorite, is to inject the agent when instances launch. Some operating systems support initialization scripts which are passed to the launching instance by the cloud controller. Depending again on your cloud platform, you can inject these scrips automatically when autoscaling, via a management portal, or manually at other times. The scripts install and configure software in the instance before it is accessible on the network.
- Either way you need an agent that understands how to work within cloud infrastructure and is capable of self-registering to the management server. The agent pulls system information and cloud metadata, then connects with its management server, which pushes configuration policies back to the agent so it can self-configure. This process is entirely automated the first time the agent runs.
- Configuration may be based on detected services running on the instance, metadata tags applied to the instance (in the cloud management plane), or other characteristics such as where it is on the network.
- We provide a detailed technical example of agent injection and self-configuration in our Software Defined Security paper.
The process is simple. Build the agent into images or inject it into launching instances, then have it connect to a management server to configure itself. The capabilities of these agents vary widely. Some replicate standard endpoint protection but others handle system configuration, administrative user management, log collection, network security, host hardening, and more.
Validating that all your instances are protected can be quite easy, especially if your tool supports API:
- Obtain a list of all running instances from the cloud controller. This is a simple API call.
- Obtain a list of all instances with the security agent. This should be an API call to your security management platform, but might require pulling a report if that isn’t supported.
- Compare the lists. You cannot hide in the cloud, so you know every single instance. Compare active instances against managed instances, and find the exceptions.
We also show how to do this in the paper linked above.
Controlling SaaS with SAML
Pretty much everyone uses some form of Software as a Service, but controlling access and managing users can be a headache. Unless you link up using federated identity, you need to manage user accounts on the SaaS platform manually. Adding, configuring, and removing users on yet another system, and one that is always Internet accessible, is daunting. Federated identity solves this problem:
- Enable federated identity extensions on your directory server. This is an option for Active Directory and most LDAP servers.
- Contact your cloud provider to obtain their SAML configuration and management requirements. SAML (Security Assertion Markup Language) is a semi-standard way for a relying party to allow access and activities based on approval from an identity provider.
- Configure SAML yourself or use a third-party tool compatible with your cloud provider(s) which does this for you. If you use several SaaS providers a tool will save a lot of effort.
- With SAML users don’t have a username and password with the cloud provider. The only way to log in is to first authenticate to your directory server, which then provides (invisible to the user) a token to allow access to the cloud provider. Users need to be in the office or on a VPN.
- If you want to enable remote users without VPN you can set up a cloud proxy and issue them a special URL to use instead of the SaaS provider’s standard address. This address redirects to your proxy, which then handles connecting back to your directory server for authentication and authorization. This is something you typically buy rather than build.
Why do this? Instead of creating users on the SaaS platform it enables you to use existing user accounts in your directory server and authorize access using standard roles and groups, just like you do for internal servers. You also now get to track logins, disable accounts from a single source (your directory server), and otherwise maintain control. It also means people can’t steal a user’s password and then access Salesforce from anywhere on the Internet
Compartmentalizing Cloud Management with IAM
One of the largest new risks in cloud computing is Internet-accessible management of your entire infrastructure. Most cloud administrators use cloud APIs and command line interfaces to manage the infrastructure (or PaaS, and even sometimes SaaS). This means access credentials are accessed through environment variables or even the registry. If they use a web interface that opens up browser-based attacks. Either way, without capability compartmentalization an attacker could take complete control over their infrastructure by merely hacking a laptop. With a few API calls or a script they could copy or destroy everything in minutes.
All cloud platforms support internal identity and access management to varying degrees – this is something you should look for during your selection process. You can use this to limit security risks – not just to break out development and operations teams. The following isn’t supported on all platforms yet but it gives you an idea of the options:
- Create a Security Group and assign it IAM rights, and restrict these rights from all other groups. “IAM rights” means the security team manages new users, changes user and group rights, and prevents privilege escalation. They can even revoke administrative access to running instances by modifying the associated rights.
- Use separate cloud development and production groups and accounts. Even if you use DevOps require users to switch accounts for different tasks.
- The development group can have complete control over a development environment, which is segregated from the operations environment. Restrict them to building and launching in cloud segments that are isolated from the Internet and only route back to your organization. Developers can have free access to create, destroy, and otherwise manage development instances.
- In your production environment break out administrative tasks. Restrict all snapshotting and termination of instances to separate roles. This prevents attackers from copying data or destroying servers unless they manage to get into one of those accounts.
- Security Group changes should be restricted to the security team (or another designated group). Cloud administrators can move instances into and out of different Security Groups if needed (although ideally you would also restrict this), but only a small team should set the rules for production.
- You will still need super-admin accounts, but these can be highly restricted and used as infrequently as possible.
- In general use different groups, with different credentials, for different parts of your infrastructure. For example in production you could break out management by application stack.
- If you need auditing on API calls, and your cloud platform doesn’t support it, require administrators to connect through a proxy server that logs activity.
Under these guidelines an attacker needs to break into multiple accounts to cause the worst damage. Notice that what we just described isn’t necessarily easy to manage at scale – this is an area where you would allocate the resources freed by reducing other risks such as patching.
Hypersegregation with Security Groups
Our last example is also one of the simplest and most powerful.
As mentioned earlier, a Security Group is essentially a basic stateless firewall implemented by the cloud platform. It’s like having a small cheap firewall in front of every server. When first using Security Groups, many users think of them like subnet firewalls, but that isn’t quite how they work. In a subnet the firewall in front of the group protects access to the systems in the group. A Security Group is more like a firewall policy applied on a per-system level. Instances in a Security Group can’t communicate with other instances in the same group unless you create an explicit rule to allow that.
Every single instance is, by default, firewalled off from every other one. This enables an incredible level of compartmentalization we like to call hypersegregation. (Because we are analysts and we tend to make up our own words).
For example, within an application stack you will likely have multiple instances of your web servers, application servers, and database servers. Each of those should be in a Security Group that allows it to only talk to the layers immediately above and below in the stack. The instances in the Security Group shouldn’t be allowed to talk to each other so cracking a server only allows very limited communications, over approved ports and protocols, to the servers directly above and below.
Security Groups also should not allow any public Internet access (except from the web server group). Administrative access is restricted to known addresses from either a jump server or your internal IP range.
Better yet, instead of always leaving every server open to administrative access, keep that closed unless needed. Then adjust the individual server’s Security Group to make the change. You do this through the cloud management plane, so an attacker would need to crack the management plane, obtain server credentials, and finally obtain access to the server itself.
This setup is nearly impossible to create with traditional infrastructure. We cannot afford all those physical firewalls, and creating that many switch-based rules is a non-starter at scale. We could do it using host firewall rules, but managing those across multiple platforms in a dynamic environment is insanely complex.
In this case the cloud offers substantially better security by default.
Where to Go from Here
This paper can only offer a high-level overview to highlight how cloud computing is different for security, and to give you ideas on how to adjust your security controls to leverage its advantages while accounting for the different risks. The real devil is in the details, and we always worry that we are over-simplifying with these overviews.
But every single thing we described is being used, today, in the real world. These aren’t cases of “maybe it will work”, but examples of what leading cloud users are implementing on a daily basis. Our examples are generally far more basic than what we have seen in practice.
The problem is that most security professionals don’t have the time or resources to become cloud security experts. Their days are filled with the ongoing minutiae of stopping attacks, meeting compliance requirements, and fighting fires. It becomes easy to dismiss cloud computing as yet another fad or trend we can manage as we always have, especially in light of vendors’ deluge of announcements that their products work just the same in the cloud.
But cloud computing is far not business as usual. It is an entirely new technology and operations model that fundamentally disrupts existing practices. One with a staggering rate of change, as well as entirely new platforms and capabilities emerging constantly. Two years ago big data was accessible only to those with top-line resources and massive datacenters. Now anyone can rent petabyte-scale data warehouses for a few hours of analysis. Using a web browser.
Adoption of the cloud will only accelerate, and it is vital that security professionals come up to speed on the technologies and adjust to meet new demands. The opportunities to improve security over existing practices are powerful and practical.
We will continue to cover this in depth in future research, digging into the specifics of how to handle cloud security and what it means to existing practices. Hopefully you will find it useful.
Posted at Wednesday 20th November 2013 9:17 pm
(1) Comments •
This is part four of a series. You can read part one, part two, or part three; or track the project on GitHub.
As a reminder, this is the second half of our section on examples for adapting security to cloud computing. As before this isn’t an exhaustive list – just ideas to get you started.
There are three reasons to encrypt data in the cloud, in order of their importance:
- To protect data in backups, snapshots, and other portable copies or extracts.
- To protect data from cloud administrators.
How you encrypt varies greatly, depending on where the data resides and which particular risks most concern you. For example many cloud providers encrypt object file storage or SaaS by default, but they manage the keys. This is often acceptable for compliance but doesn’t protect against a management plane breach.
We wrote a paper on infrastructure encryption for cloud, from which we extracted some requirements which apply across encryption scenarios:
- If you are encrypting for security (as opposed to a compliance checkbox) you need to manage your own keys. If the vendor manages your keys your data may still be exposed in the event of a management plane compromise.
- Separate key management from cloud administration. Sure, we are all into DevOps and flattening management, but this is one situation where security should manage outside the cloud management plane.
- Use key managers that are as agile and elastic as the cloud. Like host security agents, your key manager needs to operate in an environment where servers appear and disappear automatically, and networks are virtual.
- Minimize SaaS encryption. The only way to encrypt data going to a SaaS provider is with a proxy, and encryption breaks the processing of data at the cloud provider. This reduces the utility of the service, so minimize which fields you need to encrypt. Or, better yet, trust your provider.
- Use secure cryptography agents and libraries when embedding encryption in hosts or IaaS and PaaS applications. The defaults for most crypto libraries used by developers are not secure. Either understand how to make them secure or use libraries designed from the ground up for security.
Federate and Automate Identity Management
Managing users and access in the cloud introduces two major headaches:
- Controlling access to external services without having to manage a separate set of users for each.
- Managing access to potentially thousands or tens of thousands of ephemeral virtual machines, some of which may only exist for a few hours.
In the first case, and often the second, federated identity is the way to go:
- For external cloud services, especially SaaS, rely on SAML-based federated identity linked to your existing directory server. If you deal with many services this can become messy to manage and program yourself, so consider one of the identity management proxies or services designed specifically to tackle this problem.
- For access to your actual virtual servers, consider managing users with a dynamic privilege management agent designed for the cloud. Normally you embed SSH keys (or known Windows admin passwords) as part of instance initialization (the cloud controller handles this for you). This is highly problematic for privileged users at scale, and even straight directory server integration is often quite difficult. Specialized agents designed for cloud computing dynamically update users, privileges, and credentials at cloud speeds and scale.
Adapt Network Security
Networks are completely virtualized in cloud computing, although different platforms use different architectures and implementation mechanisms, complicating the situation. Despite that diversity there are consistent traits to focus on. The key issues come down to loss of visibility using normal techniques, and adapting to the dynamic nature of cloud computing.
All public cloud providers disable networking sniffing, and that is an option on all private cloud platforms. A bad guy can’t hack a box and sniff the entire network, but you also can’t implement IDS and other network security like in traditional infrastructure. Even when you can place a physical box on the network hosting the cloud, you will miss traffic between instances on the same physical server, and highly dynamic network changes and instances appear and disappear too quickly to be treated like regular servers. You can sometimes use a virtual appliance instead, but unless the tool is designed to cloud specifications, even one that works in a virtual environment will crack in a cloud due to performance and functional limitations.
While you can embed more host network security in the images your virtual machines are based on, the standard tools typically won’t work because they don’t know exactly where on the network they will pop up, nor what addresses they need to talk to. On a positive note, all cloud platforms include basic network security. Set your defaults properly, and every single server effectively comes with its own firewall.
- Design a good baseline of Security Groups (the basic firewalls that secure the networking of each instance), and use tags or other mechanisms to automatically apply them based on server characteristics. A Security Group is essentially a firewall around every instance, offering compartmentalization that is extremely difficult to get in a traditional network.
- Use a host firewall, or host firewall management tool, designed for your cloud platform or provider. These connect to the cloud itself to pull metadata and configure themselves more dynamically than standard host firewalls.
- Also consider pushing more network security, including IDS and logging, into your instances.
- Prefer virtual network security appliances that support cloud APIs and are designed for the cloud platform or provider. For example, instead of forcing you to route all your virtual traffic through it as if you were on a physical network, the tool could distribute its own workload – perhaps even integrating with hypervisors.
- Take advantage of cloud APIs. It is very easy to pull every Security Group rule and then locate every instance. Combined with some additional basic tools you could then automate finding errors and omissions. Many cloud deployments do this today as a matter of course.
- Whatever tools you use, they must be able to account for the high rate of servers appearing and disappearing in a cloud. Rules must follow hosts.
Leverage Cloud Characteristics
This section is a bit more advanced, but you can reap significant security advantages once you start to leverage the nature of the cloud. Instead of patching, just launch properly configured new servers and swap using on a cloud load balancer. Find every single system in your cloud deployment, including extensive metadata, with a simple API call. Deploy applications to a Platform as a Service (PaaS) and stop worrying about misconfigured servers. Here are some real world examples and recommendations to get you started, but they barely scratch the surface:
- Use immutable servers instead of patching. Few sysadmins patch servers without trepidation, and this is a frequent source of downtime. Eliminate the worry by running your applications behind cloud load balancers, and instead of patching servers, launch new ones with the updated software. Then slowly (or quickly) switch traffic to the new, ‘patched’ servers. If something doesn’t work your old servers are still there and you can shift traffic back. It is like having a spare datacenter lying around.
- Leverage stateless security strategies to mange your environment in real time. Normally we rely on knowledge from scanners and assessments to understand our assets and environments, which can be out of date and difficult to keep complete. The cloud controller knows where everything is, how it is configured (to a degree), and even who owns or created it, so we have a constant stream of comprehensive real-time data. A server cannot exist in the cloud without the controller knowing about it. Your entire network architecture is but an API call away – no scanners needed. Track and manage your security state in real time.
- Automate more security. Embed security configurations and agents into images, or inject them into instances when they launch. Every virtual machine, when launched, can automatically configure host-based security – especially if they can communicate with a management server designed for the cloud. For example, a host can register itself with a configuration management server, then secure running services by default depending on what the host is intended for. You could even automatically adjust Security Group firewall rules based on the software services running on the host, who owns it, and where it is in your application stack.
- Standardize security with Platform as a Service. Hate patching database servers or configuring them properly? Struggle with developers and admins who open up too many services on application servers? Use Platform as a Service instead, and improve your ability to standardize security.
- Build a security abstraction layer. Nothing prevents your security team from using the same cloud APIs and management tools as administrators and developers. Configured properly, this provides security oversight and control without interfering with development or operations. For example you could restrict management of cloud IAM to the security team, enabling them to assume management of a server in case of a security incident. The security team could control key network Security Groups and security in production, while still allowing developers to manage it themselves in more isolated development environments. Embed a host security agent into every image (or instance, using launch scripts) and security gains a hook into every running virtual machine.
- Move to Software Defined Security. This concept is an extension of basic automation. Nothing prevents Security from writing its own programs using cloud APIs, and the APIs of security and operations tools, so you can create powerful and agile security controls. For example you could write a small program to find every instance in your environment that isn’t linked into your configuration management tool, and recheck every few minutes. The tool would identify who launched the server, the operating system, where it was on the network, and the surrounding network security. You could, at a keystroke, take control of the server, notify the owner, and isolate it on the network until you know what it is for; then integrate it into configuration management and enforce security policies. All with perhaps 100 lines of code.
These should get you thinking – they start to show how the cloud can nearly eradicate certain security problems and enable you to shift resources.
Posted at Wednesday 20th November 2013 12:27 am
(0) Comments •
This is part three of a series. You can read part one or part two, or track the project on GitHub.
This part is split into two posts – here is the first half:
Adapting Security for Cloud Computing
If you didn’t already, you should now have a decent understanding of how cloud computing differs from traditional infrastructure. Now it’s time to switch gears to how to evolve security to address shifting risks.
These examples are far from comprehensive, but offer a good start and sample of how to think differently about cloud security.
As we keep emphasizing, taking advantage of the cloud poses new risks, as well as both increasing and decreasing existing risks. The goal is to leverage the security advantages, freeing up resources to cover the gaps. There are a few general principles for approaching the problem that help put you in the proper state of mind:
- You cannot rely on boxes and wires. Quite a bit of classical security relies on knowing the physical locations of systems, as well as the network cables connecting them. Network traffic in cloud computing is virtualized, which completely breaks this model. Network routing and security are instead defined by software rules. There are some advantages here, which are beyond the scope of this paper but which we will detail with future research.
- Security should be as agile and elastic as the cloud itself. Your security tools need to account for the highly dynamic nature of the cloud, where servers might pop up automatically and run for only an hour before disappearing forever.
- Rely more on policy-based automation. Wherever possible design your security to use the same automation as the cloud itself. For example there are techniques to automate (virtual) firewall rules based on tags associated with a server, rather than applying them manually.
- Understand and adjust for the characteristics of the cloud. Most virtual network adapters in cloud platforms disable network sniffing, so that risk drops off the list. Security groups are essentially virtual firewalls that on individual instance, meaning you get full internal firewalls and compartmentalization by default. Security tools can be embedded in images or installation scripts to ensure they are always installed, and cloud-aware ones can self configure. SAML can be used to provide absolute device and user authentication control to external SaaS applications. All these and more are enabled by the cloud, once you understand its characteristics.
- Integrate with DevOps. Not all organizations are using DevOps, but DevOps principles are pervasive in cloud computing. Security teams can integrate with this approach and leverage it themselves for security benefits, such as automating security configuration policy enforcement.
DevOps is an IT model that blurs the lines between development and IT operations. Developers play a stronger role in managing their own infrastructure through heavy use of programming and automation. Since cloud enables management of infrastructure using APIs, it is a major enabler of DevOps. While it is incredibly agile and powerful, lacking proper governance and policies it can also be disastrous since it condenses many of the usual application development and operations check points.
These principles will get you thinking in cloud terms, but let’s look at some specifics.
Control the Management Plane
The management plane is the administrative interfaces, web and API, used to manage your cloud. It exists in all types of cloud computing service models: IaaS, PaaS, and SaaS. Someone who compromises a cloud administrator’s credentials has the equivalent of unmonitored physical access to your entire data center, with enough spare hard drives, fork lifts, and trucks to copy the entire thing and drive away. Or blow the entire thing up.
- We cannot overstate the importance of hardening the management plane. It literally provides absolute control over your cloud deployment – often including all disaster recovery.*
We have five recommendations for securing the management plane:
- If you manage a private cloud, ensure you harden the web and API servers, keeping all components up to date and protecting them with the highest levels of web application security. This is no different than protecting any other critical web server.
- Leverage the Identity and Access Management features offered by the management plane. Some providers offer very fine-grained controls. Most also integrate with your existing IAM using federated identity. Give preference to your platform/provider’s controls and…
- Compartmentalize with IAM. No administrator should have full rights to all aspects of the cloud. Many providers and platforms support granular controls, including roles and groups, which you can leverage to restrict the damage potential of a compromised developer or workstation. For example, you can have a separate administrator for assigning IAM rights, only allow administrators to manage certain segments of your cloud, and further restrict them from terminating instances.
- Add auditing, logging, and alerting where possible. This is one of the more difficult problems in cloud security because few cloud providers audit administrator activity – such as who launched or stopped a server using the API. For now you will likely need a third-party tool or to work with particular providers for necessary auditing.
- Consider using security or cloud management proxies. These tools and services proxy the connection between a cloud administrator and the public or private cloud management plane. They can apply additional security rules and fill logging and auditing gaps.
Automate Host (Instance) Security
An instance is a virtual machine, which is based on a stored template called an image. When you ask the cloud for a server you specify the image to base it on, which includes an operating system and might bring a complete single-server application stack. The cloud then configures it using scripts which can embed administrator credentials, provide an IP address, attach and format storage, etc.
Instances may exist for years or minutes, are configured dynamically, and can be launched nearly anywhere in your infrastructure – public or private. You cannot rely on manually assessing and adjusting their security. This is very different than building a server in a test environment, performing a vulnerability scan, and then physically installing it behind a particular firewall and IPS. More security needs to be embedded in the host itself, and the images the instances are based on.
Fortunately, these techniques improve your ability to enforce secure configurations.
- Embed security in images and at launch. If you use your own images, you can embed security settings and agents in the base images, and set them to activate and self-configure (after connecting to their management server) when an instance launches. Alternatively, many clouds and images support passing initialization scripts to new instances, which process them during launch using the same framework the cloud uses to configure essential settings. You can embed security settings and install security software in these scripts.
- Integrate with configuration management. Most serious cloud administrators rely on a new breed of configuration management tools to dynamically manage their systems with automation. Security can leverage these to enforce base security configurations, even down to specific application settings, which apply to all managed systems. Set properly, these configuration management tools handle both initial configurations and maintaining state, overwriting local changes when policy pulls and pushes occur.
- Dynamically configure security agents. When security agents are embedded into images or installed automatically on launch, default settings rarely meet a system’s particular security requirements. Thankfully, cloud platforms provide rich metadata on instances and their environment – including system configuration, network configuration, applications installed, etc. Cloud-aware security agents connect instantly to the management server on launch, and then self-configure with policies leveraging their information. They can also update settings in near real time based on new policies or changes in the cloud.
- Security agents should be lightweight, designed for the cloud, and cloud agnostic. You shouldn’t need 8 different security agents for 12 different cloud providers, and agents shouldn’t materially increase system resource requirements or struggle to communicate in a dynamic and virtual network.
- Host security tools should support REST APIs. This enables you to integrate your security into the cloud fabric itself when needed. Agents don’t necessarily need to communicate with the management server over a REST API, but the management server should expose key functions via (secure) API. This enables you to, for example, write scripts to pull host security information, compare it against network security information, and make adjustments or generate reports. Or integrate security alerts and status from the host tool into your SIEM without additional connectors.
REST APIs are the dominant format for APIs in web-friendly applications. They run over HTTP and are much easier to integrate and manage compared to older SOAP APIs.
We will finish with adapting security controls in a second post tomorrow…
Posted at Monday 18th November 2013 8:19 pm
(1) Comments •