Friday, January 13, 2017

Secure Networking in the Cloud Age: Use Cases

By Mike Rothman

As we wrap up our series on secure networking in the cloud era, we have covered the requirements and migration considerations for this new network architecture – highlighting increased flexibility for configuration, scaling, and security services. In a technology environment which can change as quickly as a developer hitting ‘commit’ for a new feature, infrastructure needs to keep pace, and that is not something most enterprises can or should build themselves.

One of the cornerstones of this approach to building networks is considering the specific requirements of the site, users, and applications, when deciding whether to buy or build the underlying network. This post will work through a few use cases to highlight the power of this approach, including:

  • Compromised Remote Device: The underlying network supporting cloud computing needs to respond pretty much instantaneously when under attack. This use case will show how you can protect the network from users who appear to be compromised, without needing someone at a keyboard reconfiguring pipes.
  • Optimized Interconnectivity: You might have 85 stores which need to be interconnected, or possibly 2,000 employees in the field. Or maybe 10 times that. Either way, provisioning a secure network for your entire organization can be highly challenging – not least because mobile employees and smaller sites need robust access and strong security, but fixed routes can negatively impact network latency and performance.
  • Protecting SaaS: Cloud applications have become a visibility black hole for enterprises, so we’ll discuss how to protect users and sites which access critical corporate data, even if they never traverse a traditional corporate network. This is especially important because the lack of clear inspection points on the network breaks traditional security models, so you need to bring the secure network to the site and/or users.
  • Security by Constituency: One of our key requirements is the ability to flexibly support users, locations, and applications; so our final use case will show how a policy-driven software-defined secure network can provide the secure connectivity required by a variety of different users.

Of course there is considerable overlap between these use cases. For instance, a mobile employee may predominantly use SaaS applications, and thus benefit from both of those use cases. But these scenarios help illuminate the future of secure networking.

Compromised Remote Device

It happens on your network all the time. A device is compromised and starts acting strangely. One of your security monitors fires an alert, which shows suspicious activity from that device. In the old days you needed to figure out whether the issue was real; then go into the network console, isolate the device, and begin investigation. It all sounds simple enough, right?

But what happens when the device is remote, and not on your corporate network? You might not know the device has been compromised, and you may have no way to take it off the network. Hopefully it won’t slip by long enough to escalate privileges and find a way into your internal network.

To address this, you basically extend your network out to your users. So the device connects to the closest point of presence (regardless of where it is) and virtually joins your corporate network. Sure, that sounds a lot like just running each user on a VPN and bringing them back behind your perimeter, but this model offers real advantages.

First, traffic is not backhauled to your corporate network, which avoids overburdening the security controls and adding a huge amount of latency. The burden of enforcing security policies falls on the network, not on the devices running on your premises. Second, compromised devices are isolated from the rest of your network. It becomes much harder for attackers to move laterally through your network, because they need to bypass additional inspection points to reach the internal network.

Once a device has been determined to be compromised, a policy can automatically quarantine it to prevent access to key SaaS apps and the internal network.
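The quarantine behavior described above can be sketched as a small policy function. This is only an illustration, not any provider's actual policy language: the `Device` type, the `compromised` flag, and the destination labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical destination classes; a real secure network would define its own.
ALL_DESTINATIONS = {"internet", "saas", "internal"}
QUARANTINE_DESTINATIONS = {"remediation"}  # e.g. a cleanup portal (assumed)


@dataclass
class Device:
    device_id: str
    compromised: bool = False  # assumed to be set when a security monitor alerts


def allowed_destinations(device: Device) -> set:
    """Return the destination classes this device may currently reach."""
    if device.compromised:
        # Quarantine: no access to key SaaS apps or the internal network.
        return set(QUARANTINE_DESTINATIONS)
    return set(ALL_DESTINATIONS)


def is_allowed(device: Device, destination: str) -> bool:
    """Check a single connection attempt against the device's current policy."""
    return destination in allowed_destinations(device)
```

Because the decision is made at the point of presence each time the device connects, flipping the `compromised` flag takes effect without anyone reconfiguring the network by hand.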

Optimized Interconnectivity

One of the larger hassles in networking is supporting large numbers of remote sites. Setting up many security devices, especially remotely, is costly and requires onsite IT chops to troubleshoot. And of course traveling employees are all over the place, demanding full access to critical data (both on the internal network and in the cloud), as if they were in the office.

These are two separate issues, but there is one solution. It involves extending the secure network to the user and/or site. This enables you to use a last-mile service, typically a basic dumb Internet pipe, for access to the closest point of presence with access to your network. Once on your network, the user or site gets all the same intelligent routing and security services as on your corporate network, without having to backhaul traffic to your corporate network.

Of course you need to figure out whether to build out PoPs and network infrastructure to extend the network where your organization needs it. In reality, you are likely to engage a network service provider to build a virtual network between your sites, and provide connectivity to your users. This gets you out of the Wide Area Networking business. In many scenarios it provides enhanced network performance, increased security, and greater flexibility. We won’t weigh in on cost because many factors affect the cost of provisioning this kind of network, but the additional capabilities make this a pretty easy decision if the costs are in the ballpark.

Protecting SaaS

As we mentioned previously, the advent of SaaS has removed much of your visibility into what employees are doing in critical applications. Let’s consider a sales automation service and a disgruntled salesperson who wants to grab their client list before quitting. If they are sitting in a coffee shop somewhere, you probably have no idea what they are doing within the application, because they don’t traverse your network to access the SaaS service.

But if you have the rep on a performance plan, you know they are a flight risk. So you can proactively set a policy to watch for any downloads of the customer list from the group of employees on performance plans. You should have already locked down your SaaS environment, so this data can only be accessed via the secure network, and your virtual network can monitor the flow and quantity of data to and from that user. You can also inspect the content as it traverses the network, and see that they are taking the client list or other sensitive data.

At that point you can block the transmission via security policy, and shut down the employee’s access until Human Resources can take care of the situation. All without having to watch for specific alerts or wait for an administrator to make changes in real time.
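As a rough sketch of the flight-risk policy just described, the decision could be written as a function over an observed transfer. Everything named here is an assumption made for illustration: the group label, the byte threshold, and the `content_tag` field, which stands in for whatever a real content inspection engine would report.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
WATCHED_GROUPS = {"performance-plan"}
DOWNLOAD_LIMIT_BYTES = 5_000_000        # assumed per-session threshold
SENSITIVE_TAGS = {"customer_list"}      # stand-in for real content inspection


@dataclass
class Transfer:
    user: str
    groups: frozenset   # policy groups the user belongs to
    bytes_out: int      # data flowing from the SaaS app to the user
    content_tag: str    # label assigned by an (assumed) content inspector


def decide(transfer: Transfer) -> str:
    """Return 'allow', or 'block_and_suspend' when a watched user takes sensitive data."""
    if not (transfer.groups & WATCHED_GROUPS):
        return "allow"
    too_much = transfer.bytes_out > DOWNLOAD_LIMIT_BYTES
    sensitive = transfer.content_tag in SENSITIVE_TAGS
    if too_much or sensitive:
        # Block the transmission and suspend access until HR follows up.
        return "block_and_suspend"
    return "allow"
```

The same transfer from a user outside the watched group is allowed, which is the point of scoping the policy to a group rather than to the application as a whole.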

Security by Constituency

One interesting extension of the policies above would be to define slightly different policies for certain groups of employees, and automatically enforce the appropriate policies for every employee. Let’s consider a concern about the CFO’s device. You can put her and all the folks with access to sensitive financial data in a special secure network policy group.

The security for this group is turned up to 11 (in Spinal Tap parlance), since you want to ensure these folks don’t accept or make connections to sites known to participate in botnets. Your secure network has access to threat intelligence feeds including lists of known active bot networks, so you can automatically block traffic for those users to and from all suspicious sites. You can also stop file transfers to any unauthorized site, and alert on anomalous behavior for anyone in that group, to further lock them down.

You can relax security policies for other groups of employees that may not have access to such sensitive data. So you restrict their access to only the general Internet from inside the corporate network, and you don’t allow them to connect to anything internal from outside the network. By policy they can only access corporate data from work. Of course you could implement another policy for those employees, so they can request a temporary exception if they know they’ll need to work from home for a day, and simultaneously increase their monitoring level. All with minimal human intervention.

For all these use cases policies can be pre-defined, so enforcement occurs automatically once they connect to the secure network. This happens regardless of where the employee happens to be, or what application they are accessing. You can change policies as needed, and make them as simple or complicated as your business requires. Your secure network can adapt to your business, as it should.
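To make the idea of per-group policies concrete, here is a minimal sketch of how two constituencies and a temporary exception might be modeled. The group names, the `GroupPolicy` fields, the example host name, and the `permit` function are all hypothetical; a real secure network service would have its own policy model and threat intelligence feeds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Stand-in for a threat intelligence feed of known botnet hosts (assumed).
KNOWN_BOTNET_HOSTS = {"c2.badnet.example"}


@dataclass
class GroupPolicy:
    block_botnet_hosts: bool       # enforce threat-feed blocking for this group
    remote_internal_access: bool   # may reach internal resources from off site


POLICIES = {
    "finance": GroupPolicy(block_botnet_hosts=True, remote_internal_access=True),
    "general": GroupPolicy(block_botnet_hosts=False, remote_internal_access=False),
}


@dataclass
class User:
    name: str
    group: str
    exception_until: Optional[datetime] = None  # temporary work-from-home exception


def permit(user: User, host: str, internal: bool, on_site: bool, now: datetime) -> bool:
    """Decide whether a connection is permitted under the user's group policy."""
    policy = POLICIES[user.group]
    if policy.block_botnet_hosts and host in KNOWN_BOTNET_HOSTS:
        return False
    if internal and not on_site:
        has_exception = user.exception_until is not None and now < user.exception_until
        return policy.remote_internal_access or has_exception
    return True
```

Granting an exception is then just a matter of setting `exception_until` (for example `now + timedelta(days=1)`); the policy expires on its own, so nobody has to remember to revoke it.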

—Mike Rothman

Tuesday, January 10, 2017

Network Security in the Cloud Age: Requirements and Migration

By Mike Rothman

As we noted in our introductory post for this Network Security in the Cloud Age series, everything changes, and technology is undergoing the most radical change and disruption since… well, ever. We’re not kidding – check out our Tidal Forces post for the rundown. This disruption will have significant ramifications for how we build and manage networks. Let’s work through the requirements for this network of the future, and then provide some perspective on how you can and should migrate to the new network architecture.

At the highest level, the main distinction in building networks in the Cloud Age is moving from a one-network model to a many-networks model. Networks have traditionally been built and managed as a single enterprise network, which required these environments to be built for peak usage, but at the same time to support the lowest common denominator from a functionality standpoint. Yes, those are contradictory requirements – that’s how it worked out. Your network had to serve all masters (regardless of the disparity in functional requirements per application) and be sized to stay up under any conceivable load.

In the Cloud Age we need to think differently. Now it’s about what kind of network this specific application or use case requires, not what you already have. So you build what is needed, where it’s needed. We’ll get into specific use cases later in this series, but a network to support a distributed workforce doesn’t need, and probably shouldn’t have, the same characteristics as the network that interconnects your primary sites. And an externally-facing web application needs a different network than one for access to sensitive data still locked within your enterprise data center.

And everything in the Cloud Age is software defined. You basically program your network, adapting it to specific conditions laid out in a set of governing policies. No more crawling around the wiring closet to find the faulty cable that knocked out your G/L system. Though we’re sure you’ll miss those days.

Cloud Age Secure Network Requirements

When we translate the hand-waving above into specific requirements for a secure network of the future, we come up with the following:

  • Availability: This is consistent with the networks you have been building for decades. It’s a bad day for the network/security team when the network goes down, whether in your data center or the cloud. So a cloud network needs to be built to ensure availability with diverse routes, alternative access points, and alternative access to corporate data – wherever it may reside.
  • Elasticity: You no longer need to build a network for peak usage; in fact you don’t really need to do anything here. Ensuring sufficient bandwidth is the cloud provider’s problem, not yours. Obviously if you use more you pay for more (metered billing), but you don’t need to put in a big order for new mega switches which might be fully utilized once over the next year. You just need to make sure the provider can scale to what you may need, and that you can expand and extend your network as needed.
  • Software Defined: The cloud demands flexibility. A cloud network needs access flexibility because employees and other constituents move around. It needs architectural flexibility – you will need to adapt to changing requirements in areas such as scaling, usage, and security. Things move very quickly in cloud land, and you don’t have time to wait for network administrators to reconfigure the network, so you need an automated system to do it. This is driven by software, so orchestration and automation via other products and services is essential.
  • Policy-driven: Speaking of Software Defined Networks (SDN), you need a cloud network governed by policies which specify rules for when it changes. Many attributes can drive these policies, and the role of a network security architect is evolving to encompass these policies, because once released into production policies are applied automatically and immediately, so things can go south quickly if they aren’t solid.
  • Flexibly Secure: Finally, you want to make sure all your constituencies can be supported and protected by the cloud network. So if you support remote users, proper authentication, access control, and inspection for threat/malware detection must be provided on the ingress side. You’d also like those users’ egress traffic (including encrypted traffic) to be protected against security issues, such as data leakage and connections to malicious servers. Additionally, you should be able to protect traffic to cloud applications. And cloud networks need to satisfy all these use cases.
  • Monitoring & Reporting: Compliance oversight and governance don’t go away when you move to the cloud. So you need visibility into the network traffic to detect performance and security issues, as well as the ability to generate reports to substantiate network activity and security controls.

Migration

A good thing about moving to secure networks in the cloud age is that you don’t need to get there in one fell swoop. It’s not like an overnight cutover to a new switching environment because the old and new vendors cannot play nicely together. This is where moving from one monolithic network to many application-specific networks pays huge dividends.

You can keep running your existing enterprise network to support the functions still served out of your own data centers. If your web app and manufacturing systems run on your own hardware, moving that data to the cloud probably doesn’t make sense. But as you move or rebuild those applications within a public cloud environment or embrace Software as a Service (SaaS) to replace legacy applications, you can move that traffic to a cloud network.

You may be able to take better advantage of your WAN by leveraging a service provider. Supporting access for a global user base, and maintaining connections between sites, may not be the best use of your constrained networking and security resources.

The other area to focus on is the back-end interconnection points between your existing enterprise networks and cloud services. This is where you are most exposed, because any issues in your data center could affect the cloud and vice-versa. Of course if you have architected your cloud networks correctly any damage should be isolated. But make no assumptions. You will want extensive monitoring, and to really lock down traffic between your data center and the cloud.

Service Providers

Once you understand your requirements for a cloud network, the next question is what you can and should do yourself, and what you should look to a service provider to do for you. As mentioned above, it’s not like the old days of outsourcing, when you moved all your assets (and people) to the outsourcer. You can look to specific service providers for specific use cases, or try to find one to meet many. All while continuing to run your existing network to support existing applications.

In terms of the buy vs. build decision, the fundamental choice is whether to build a network which offers the scale and services you (might) need, across all the geographies where you need connectivity. We’ll dig into specific use cases in our next post. There are variations for each company’s environment, but the choice between buy vs. build is generally reasonably clear.

As you consider looking at service providers, here are some selection criteria to keep in mind:

  • Coverage and Scale: Obviously your provider should have presence in the places you need access. Another area to consider is their network’s scalability architecture. Yes, scale is their problem, but don’t just take their word that they can scale. So diligence is warranted to ensure the provider can handle your scale, especially for any use cases which could be hit by Denial of Service attacks.
  • Security: Well, duh. But we have seen a number of service providers who weren’t as secure as they should have been, so understand the control sets they use and how they secure their own environment. Most vendors understand that a high-profile attack on their infrastructure could be an existential threat, so they take security seriously. But you still need to do your homework, and understand how the provider protects themselves and traffic on their networks.
  • Innovation: Another thing about cloud architecture is how fast it’s changing. What is new and shiny today seems to be obsolete in a quarter or two, so ensure that the provider can support different kinds of networking services, and seems to roll out new capabilities in step with the rest of the market. If you aren’t on the cutting edge of experimenting with new cloud services, your provider doesn’t need to be either. But if you are, you will get very frustrated if you try to route traffic or deploy capabilities your provider doesn’t support.
  • Modular Services: You can move to a cloud network at your own pace, so you want your provider to offer the services you need, when you need them. That means not paying for stuff you don’t use, with a very flexible provisioning process to add new services quickly. For example a portion of your network might just need connectivity. Another use case might require deep packet inspection of all traffic. Another scenario might specify an application front-ended with DDoS protection and a WAF. You don’t need the same provider for everything, but make sure your providers can meet your requirements as they evolve.
  • Monitoring Access: There is necessarily less visibility on cloud-based networks than on a network you control in your own facilities. You may not be able to access raw packets, but you need sufficient information to substantiate controls, and satisfy compliance and governance requirements.
  • API Access: Finally, in a policy-driven environment, you need the ability to automate changes to your network environment, through an API. Given the newness of cloud networking, we don’t yet have standard APIs for network access into the cloud, so you’ll be doing the integration yourself (at least until someone delivers a cloud orchestration tool). So an API needs to suffice for now.
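Since there is no standard API to code against, one way to structure a do-it-yourself integration is to keep the desired policy as data, compare it with what the provider currently has, and send only the differences through whatever API the provider exposes. The sketch below shows only that comparison step. The rule format is an assumption made for illustration, and no provider API is called.

```python
from typing import Dict, List, Set, Tuple

# Each rule is a hypothetical (source_group, destination, action) tuple.
Rule = Tuple[str, str, str]


def plan_changes(current: Set[Rule], desired: Set[Rule]) -> Dict[str, List[Rule]]:
    """Compute which rules to add and remove so that `current` becomes `desired`."""
    return {
        "add": sorted(desired - current),
        "remove": sorted(current - desired),
    }


if __name__ == "__main__":
    current = {("finance", "internet", "allow"), ("general", "internal", "allow")}
    desired = {("finance", "internet", "allow"), ("general", "internal", "deny")}
    # Only the one changed rule shows up in the plan; unchanged rules are untouched.
    print(plan_changes(current, desired))
```

Keeping the plan separate from the calls that apply it also makes it easy to review or log a change before it goes live, which matters when policies take effect automatically and immediately.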

As we continue this series, we will dig into a few use cases to show the advantages of secure networking in the cloud, and how to support the most common scenarios you will face as you migrate.

—Mike Rothman

Wednesday, January 04, 2017

Assembling A Container Security Program [New Paper]

By Adrian Lane

We are pleased to launch our latest research paper, on Docker security: Assembling a Container Security Program. Containers are now such integral elements of software delivery that enterprises are demanding security in and around containers. And it’s no coincidence that Docker has recently added a variety of security capabilities to its offerings, but they are only a small subset of what customers need. During our research we learned many things, including that:

  • Containers are no longer a hypothetical topic for discussion among security practitioners. Today Development and Operations teams need a handle on what is being done, and how to verify that security controls are in place.
  • Security attention in this area is still focused on OS hardening. This is complex and can be difficult to manage, but it is a fairly well-understood set of problems. But there are many more important moving pieces in play, which are still largely being ignored.
  • Very little attention is being paid to the build environment – making sure the container contains what it should, and nothing else. The companies we talked to do not, as a rule, verify that internal code and third-party libraries are secure.
  • Human error is more likely to cause issues than security bugs. Running services in the container with root credentials, poor handling of keys and certificates, opening up ports inappropriately, and indiscriminate communications are all common issues… which can be tested for.
  • The handoff from Development to Operations, and how Operations teams vet containers prior to putting them into production, are somewhat free-form. As more containers are delivered faster, especially with continuous integration and DevOps engineering, container management in general – and specifically knowing what containers should be running at any given time – is becoming harder.

Overall, there are many issues beyond OS hardening and patching your Docker runtime. Crucial runtime aspects of container security include monitoring, container segregation, and blocking unwanted communications; these are not getting sufficient attention. The ways containers are built, managed, and deployed are all important aspects of application security, and so should be core to any container security program. So we took an unusually broad view of container security, covering each of these aspects in this paper.

Finally, we would like to thank Aqua Security for licensing this content. Community support like this enables us to bring independent analysis and research to you free of charge. We don’t even require registration. You can grab a copy of the research paper directly, or visit the paper’s landing page in our research library, and please visit Aqua Security if you would like to understand how they help provide container security.

—Adrian Lane

Network Security in the Cloud Age: Everything Changes

By Mike Rothman

We have spent a lot of time discussing the disruptive impact of the cloud and mobility on… pretty much everything. If you need a reminder, check out our Inflection paper, which lays out how we (correctly, in hindsight) saw the coming tectonic shifts in the computing landscape. Rich is updating that research now, so you can check out his first post, where he discusses the trends which promise to upend everything we know about security: Tidal Forces.

To summarize, cloud computing and mobility disrupt the status quo by abstracting and automating huge portions of technology infrastructure – basically replacing corporate data centers in many cases. You no longer stroll down to the wiring closet to troubleshoot network problems, because your employees are distributed across the world, using all sorts of devices to access critical data. Your data center may no longer exist, but it is certainly much less important and valuable today, because it has been replaced to some degree by a monstrous Infrastructure as a Service (IaaS) provider who offers far better economies and much faster turnaround than your IT group ever could. The physical layer is totally abstracted, and you interact with your network (and the rest of your technology stack) through a web console – or more likely, an API.

Development and Operations organizations are now collaborating, which means as soon as a developer makes a change it can be immediately deployed (after some automated testing) to the production environment. Continuous deployment may require network changes, and can introduce security issues. But there isn’t really any ability to have a human scrutinize all the changes, or ensure all the governance and security policies are in place and effective.

To further complicate things, you no longer run many applications on infrastructure you control. In case you haven’t heard, Software as a Service (SaaS) is now a thing (we call it “the new back office”), and you don’t get to tell a SaaS provider what their network should look like. You connect to their service over the Internet, and that’s that. You no longer know where your data is, nor do you have the ability to monitor traffic flows for misuse.

To be a bit clearer about the impact on networking in the cloud age, let’s highlight the impacts:

  1. Your data is everywhere (and nowhere): Whether it’s an application you built (now running in an IaaS environment) or an application you bought (provided by a SaaS vendor), either way you no longer have any idea where your data is, and limited means to protect it on the network.
  2. Lack of visibility: You cannot tap an IaaS or SaaS environment, so you don’t have visibility into what’s happening on your network. Some cloud providers are offering increasing access to network telemetry, but raw packet access is a poor fit for the cloud’s agility and elasticity.
  3. Bottlenecks don’t make sense: One way to get around the lack of visibility is to route all traffic through an inspection point, and enforce security policies there. Unfortunately most cloud-native architectures don’t support that approach, due to the inherent isolation between computing tiers, and the increasing popularity of serverless systems. The last thing you want to do is make the cloud look just like your existing environment, so traditional bottlenecks won’t survive this disruption.
  4. App-specific infrastructure: Finally, you don’t just have one network to worry about. You can have hundreds if you implement every IaaS stack as its own network(s). Every SaaS service you buy runs on its own network. There is no longer any consistency between cloud application networks. Overall this is an improvement, because each application can have its own network – designed, tuned, and sized to its particular requirements. Applications are no longer forced into a one-size-fits-all suboptimal network, but they also aren’t forced onto your network, with all your integrated security requirements and capabilities.
  5. Velocity of change is unprecedented: With continuous deployment, changes to the network need to happen in lock-step with application and operational changes. This means your network and security ops folks’ work queues are going the way of the Dodo bird. There just isn’t time for traditional network management and security, and your existing staff cannot keep pace in this kind of environment.

The tidal forces of the cloud are rapidly upending almost everything you know about security. Those who fail to get their arms around this, clinging doggedly to old models, will fail.

Focusing on the Right Things

Before you reach for the hemlock, let’s take a step back to remember what we really need to provide as network security professionals:

  1. Connectivity: The network needs to provide access to resources (applications and data) wherever in the world they reside, whenever users need access, on whatever device they happen to be using. Within policy constraints of course, but IT can no longer simply dictate access terms.
  2. Availability: The network needs to be reliable and survivable to satisfy application uptime requirements. It is a bad day when business stops because of a network problem, and worse when a security issue takes the network down.
  3. Performance: There are many potential choke-points which can slow down an application. But the network should not be one of them – even during peak usage. In the old days you needed to design and build for peak usage. But you got no credit for the other 99% of the time, when some (perhaps most) of that infrastructure was idle.
  4. Security: Last but not least, you had better not have any security issues originating from the network. Instead the expectation is that you will detect attacks using the network. So you need to make sure the network is secure, rather than a vector for attack.

The cloud can help us satisfy each of these critical imperatives. But: not if you think you can get away with the same old, same old, running all your traffic through a small set of ingress and egress points to inspect traffic using your old security equipment.

Don’t get distracted with outdated expectations. If you can provide connectivity, availability, performance, and security within the tolerances required by your applications, does it matter what the network architecture looks like? We know some purists say yes, but we also remember purists hanging on to the SNA protocol for years. They couldn’t make the necessary changes, and eventually went the way of the dinosaurs. Success moving forward requires making sure you can provide the services your organization needs, while keeping pace with change in the cloud age.

Everything changes. Including network security. Let’s map out what that means to you and your network security controls, with specificity.

Network Security in the Cloud Age

We need to rethink network security in light of the critical imperatives above. The old stuff doesn’t work any more, and we cannot afford to compromise or reject the power of the cloud to keep a traditional security model. So let’s set some design goals for this new-fangled cloud network:

  • Secure Network Everywhere: Traditionally to enforce security controls on remote sites and users, you would backhaul traffic behind your corporate perimeter, inspecting it using the equipment and policies used for internal traffic. But that doesn’t scale. So let’s flip our perspective: instead of moving traffic to our security controls we will extend our network controls out to the location, user, or device.
  • Infinite Perimeters: You cannot dictate devices’ location or access, so you need to figure out how to build an effective perimeter around each device. Conceptually each device includes its own network, so you need to securely interconnect all your networks. In traditional networks you would set up tunnels between each pair of locations or networks, but that’s an N² problem (a full mesh of N networks needs N(N-1)/2 tunnels), and N is skyrocketing in the cloud age. Network architecture needs to evolve to protect every device wherever it is, doing whatever it is doing, providing a “secure mesh” between all devices.
  • Elasticity: Demand for bandwidth is insatiable, so you need a secure network design which can scale with your requirements. But sizing everything for peak usage is wasteful and expensive, so you will want to contract when you don’t need as much. Ideally you will only ever pay for as much network as you currently need. Especially because now you pay based on usage.
  • Policy Driven: We have different security requirements for ingress-focused and egress-focused networks. Ingress networks provide access to computing resources, and so must focus on protecting data. Egress networks protect users by regulating access to resources outside the organization. These different policies must be supported whatever the network looks like, and policies must be able to keep up with continuous deployment. Which means that…
  • Automation Wins: Things change instantly in the cloud. There is no time to wait for a human to make changes, so your environment must be automated. But you need to trust your automation, with the ability to roll back changes when necessary. That means you need to program your networks, just like you program everything else. Yes, this means software-defined networks. We will dig in later in this series.
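To put numbers on the tunnel-per-pair approach mentioned under Infinite Perimeters: directly connecting every pair of N networks takes N(N-1)/2 tunnels, while attaching each network once to a shared secure network takes N connections. The short sketch below compares the two; the function names and sample sizes are ours, chosen only for illustration.

```python
def full_mesh_tunnels(n: int) -> int:
    """Tunnels needed to connect every pair of n networks directly."""
    return n * (n - 1) // 2


def shared_network_connections(n: int) -> int:
    """Connections needed if each network attaches once to a shared secure network."""
    return n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_mesh_tunnels(n), shared_network_connections(n))
    # prints:
    # 10 45 10
    # 100 4950 100
    # 1000 499500 1000
```

The gap widens quickly, which is why managing pairwise tunnels by hand stops being practical once every device is treated as its own network.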

Design goals are great, but what will they mean in practice? Particularly in terms of what you need and how you build it? We will explore requirements and solution architectures in our next post.

—Mike Rothman

Tuesday, January 03, 2017

Tidal Forces: The Trends Tearing Apart Security As We Know It

By Rich

Imagine a black hole suddenly appearing in the solar system – gravity instantly warping space and time in our celestial neighborhood, inexorably drawing in all matter. Closer objects are affected more strongly, with the closest whipping past the event horizon and disappearing from the observable universe. Farther objects are pulled in more slowly, but still inescapably. As they come closer to the disturbance, the gravitational field warping space exponentially, closer points are pulled away from trailing edges, potentially ripping entire planets apart.

These are tidal forces. The same force that creates tides and waves in our ocean, as the moon pulls more strongly on closer water, and less on seas on the far side of the planet.

Black holes are a useful metaphor for disruptive innovations. Once one appears it affects everything around it, and nothing looks the same at the end. And like a black hole’s gravity, business/technical tidal forces rip apart our conceptions, markets, and practices – slowly at first, accelerating as we approach an event horizon, beyond which the future is unclear.

I have talked a lot about disruptive innovation over the past nine years, since starting Securosis. In blog posts, on stage at RSA (with Chris Hoff), and in countless other venues. All my research continues to convince me we are deep into a series of shifts, which are shredding existing security practices and markets, at a much deeper and more fundamental level than we have seen before. This is largely because now is the first time we have had a profession and markets large enough for these forces to act on in a meaningful way. If a market falls down in the woods, and there aren’t any billion-dollar companies to smash on the head, nobody pays attention. Now our magnitude and inertia magnify these disruptions.

Sticking with my metaphor, I like to think of these disruptive forces as three black holes influencing all information technology. Security is only one of the many areas impacted, but it is the only one I am really qualified to discuss. There are also a series of other emergent waves and interactions which complicate the model and could fill a book, but I’ll do my best to focus on the most impactful trends. As I lay these out, please keep in mind that I am not saying these eliminate security issues – but they definitely transform them.

  • Endpoints are different, often more secure, and frequently less open: The modern definition of an ‘endpoint’ is almost unrecognizably different than ten years ago. Laptop and desktop sales are stagnant, as phones put more power into your pocket than a high-end desktop had when this shift started. Mobile devices are incredibly secure compared to previous computing platforms (largely due to their closed systems), while modern general purpose computer operating systems are also far more hardened (and compromised less often) than in the past. Not perfect – but much better, with a higher exploitation cost, and continuously improving. Ask any enterprise security manager how Windows 7-10 infection rates look compared to XP, entirely aside from the almost complete lack of widespread malware on Apple’s iOS and macOS. But these devices are not only largely inaccessible to many security vendors (notably monitoring and anti-malware), but their tools don’t offer much value for preventing exploitation. Combined across consumer and enterprise markets, these trends have produced a major consumer shift to phones and tablets. In turn, this has slenderized the cash cow of consumer (and often enterprise) antivirus, with clear signs that even on traditional computers, the mandatory security footprint will shrink in time. The ancillary effects on network security are also profound – we will address them in a moment. Even the biggest fly in the ointment, the massive security issues of IoT, is a poor fit for ‘traditional’ tools and practices.
  • Software as a Service (SaaS) is the new back office: Email, file servers, CRM, ERP, and many other back-office applications are rapidly migrating from traditional on-premise infrastructure into cloud services. Entire fleets of servers, which we have dedicated massive budgets to securing, are being shut down and repurposed or decommissioned. Migrating these to a mature cloud service often reduces security risk and cost. On the other hand moving to less secure SaaS providers (most of the market) requires a compensatory shift in security operations, skills, and spending. This transition also supports the rise of zero trust networks, where enterprises no longer trust their local networks, instead requiring all connections to all services to be encrypted with TLS (increasingly immune to existing monitoring techniques) or VPN. Between this transition to the cloud and the growth in encrypted connections, we see dramatic impacts to perimeter security, monitoring, patching, incident response, and probably a dozen other security practices. Migrating to highly secure cloud services wipes out the need for large portions of existing security, and the corresponding increases are much smaller, producing an often substantial net gain. Worst case, you might still deploy your own software stack, but it will be in an IaaS cloud instead of a data center across the corporate campus.
  • Infrastructure as a Service (IaaS) is the new data center: Major cloud providers (a very short list of very large companies) offer infrastructure which, thanks to economic forces, is far more secure than most enterprise data centers. Amazon Web Services itself was about a $12B business in 2016, so clearly the migration to cloud computing is now more of a stampede. A shift merely from physical to virtual machines would still be important, with wide-ranging impact, but we are watching a deeper architectural transformation, driven by cloud providers’ software defined networks; combined with serverless, containers, and other emerging options. You cannot stick your existing IPS in front of a Lambda function, nor can you patch or configure an Elastic Load Balancer. Many foundational security practices, which we rely on to protect our custom applications, either aren’t needed or cannot be implemented using traditional tools or techniques.

All of this is available when you build an organization from scratch today. Very secure endpoints, which are much less reliant on historic security tools, connecting predominantly to cloud services over encrypted links. Offices with networks which exist merely to provide Internet access – with nearly all applications, services, and servers hosted in the cloud. New applications leveraging architectures and capabilities which barely resemble those of yesterday, and certainly aren’t hosted in a data center you manage.

But facing these dramatic changes, we see a security market heavily reliant on existing revenue models, and a professional workforce which has spent decades building a particular set of skills, practices, and operational models which don’t always match emerging requirements. This is not just theory – I have talked with friends and contacts at major security vendors who cannot shift existing products and operations to best leverage the cloud, even when they want to. Shareholders refuse to support the required revenue model changes, while companies see massive internal friction – at precisely the same time they need to modify product development, operations, and sales compensation. When your entire revenue and sales compensation models are built on pushing boxes, transitioning to elastic software and services products and pricing isn’t easy.

On the security professional side I have trained hundreds of practitioners on cloud security, while working with dozens of organizations to secure cloud deployments. It can take years to fully update skills, and even longer to re-engineer enterprise operations, even without battling internal friction from large chunks of the workforce – who don’t believe these changes are happening, lack some of the required foundational skills (mostly coding), or simply lack time to learn new things while keeping the old things running.

I don’t claim to know exactly how all this will play out. I don’t claim to have all the answers. But I do know, without a doubt, that these tidal forces are inexorably drawing us forward at wildly uneven yet accelerating rates – which will rip apart existing security markets, practices, and operations. And the bigger you are, the further apart your leading and trailing edges, the more painful the stretching.

Over the next few weeks this series will focus on each of the forces, discussing the transformations and their impact in depth. I’m cheating a bit, using this blog as a way to pull my thoughts together for my upcoming RSA session on this topic. Even if we don’t know exactly what’s on the other side of the event horizon, we can still prepare by recognizing that change is happening, and looking for key opportunities to prepare for multiple potential outcomes.


Thursday, December 29, 2016

Dynamic Security Assessment: Process and Functions

By Mike Rothman

As we wind down the year it’s time to return to forward-looking research, specifically a concept we know will be more important in 2017. As described in the first post of our Dynamic Security Assessment series, there are clear limitations to current security testing mechanisms. But before we start talking about solutions we should lay out the requirements for our vision of dynamic security assessment.

  1. Ongoing: Infrastructure is dynamic, so point-in-time testing cannot be sufficient. That’s one of the key issues with traditional vulnerability testing: a point-in-time assessment can be obsolete before the report hits your inbox.
  2. Current: Every organization faces fast-moving and innovative adversaries, leveraging ever-changing attack tactics and techniques. So to provide relevant and actionable findings, a testing environment must be up-to-date and factor in new tactics.
  3. Non-disruptive: The old security testing adage of do no harm still holds. Assessment functions must not take down systems or hamper operations in any way.
  4. Automated: No security organization (that we know of, at least) has enough people, so expecting them to constantly assess the environment isn’t realistic. To make sustained assessment feasible, it needs to be mostly automated.
  5. Evaluate Alternatives: When a potential attack is identified you need to validate and then remediate it. Don’t waste time shooting into the dark, so it’s important that you be able to see the impact of potential changes and workarounds to first figure out whether they would stop the attack, and then select the best option if you have several.

Dynamic Security Assessment Process

As usual we start our research by focusing on process rather than shiny widgets. The process is straightforward.

  1. Deployment: Your first step is to deploy assessment devices. You might refer to them as agents or sensors. But you will need a presence both inside and outside the network, to launch attacks and track results.
  2. Define Mission: After deployment you need to figure out what a typical attacker would want to access in your environment. This could be a formal threat modeling process, or you could start with asking the simple question, “What could be compromised that would cost the CEO/CFO/CIO/CISO his/her job?” Everything is important to the person responsible for it, but to find an adversary’s most likely target consider what would most drastically harm your business.
  3. Baseline/Triage: Next you need an initial sense of the vulnerability and exploitability of your environment, using a library of attacks to investigate its vulnerability. If you try, you can usually identify critical issues which immediately require all hands on deck. Once you get through the initial triage and remediation of potential attacks, you will have an initial activity baseline.
  4. Ongoing Assessment: Then you can start assessing your environment on an ongoing basis. An automated feed of new attack tactics and targets is useful for ensuring you look for the latest attacks seen in the wild. When an assessment engine finds something, administrators are alerted to successful attack paths and/or patterns, so they can validate them and then determine the criticality of the potential attack. This process needs to run continuously because things change in your environment from minute to minute.
  5. Fix: This step tends to be performed by Operations, and is somewhat opaque to the assessment process. But this is where critical issues are fixed and/or remediated.
  6. Verify Fixes: The final step is to validate that issues were actually fixed. The job is not complete until you verify that the fix is both operational and effective.
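The assess, fix, and verify steps above can be sketched in a few lines. This is only a toy model under our own assumptions: the environment is a dictionary of hosts and known weaknesses, an "attack" is a check against that model (so nothing is actually disrupted), and every name here (`Finding`, `assess`, `verify`, the hosts and weaknesses) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model of the environment: host -> set of known weaknesses.
environment = {
    "web-01": {"default-creds"},
    "db-01": {"unpatched-smb", "default-creds"},
    "hr-laptop": set(),
}


@dataclass
class Finding:
    attack: str
    target: str
    status: str = "open"  # becomes "verified" once a re-test fails


def make_attack(weakness: str) -> Callable[[str], bool]:
    """A simulated attack succeeds if the modeled host has the weakness."""
    return lambda host: weakness in environment[host]


attack_library: Dict[str, Callable[[str], bool]] = {
    w: make_attack(w) for w in ("default-creds", "unpatched-smb")
}


def assess(attacks, targets: List[str]) -> List[Finding]:
    """Baseline / ongoing assessment: record every attack that succeeds."""
    return [Finding(name, t) for name, run in attacks.items()
            for t in targets if run(t)]


def verify(finding: Finding, attacks) -> Finding:
    """Verify Fixes: re-run the original attack; only a failed re-run closes it."""
    still_works = attacks[finding.attack](finding.target)
    finding.status = "open" if still_works else "verified"
    return finding


findings = assess(attack_library, list(environment))
environment["db-01"].discard("unpatched-smb")  # Operations fixes one issue
for f in findings:
    verify(f, attack_library)
    print(f.target, f.attack, f.status)
```

The point of the sketch is the last loop: a finding is closed by re-running the attack that produced it, not by someone marking a ticket as done.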

Yes, that all looks a lot like every other security assessment methodology you have seen. What needs to happen hasn’t really changed – you still need to figure out exposure, understand criticality, fix, and then make sure the fixes worked. What has changed is the technology used for assessment. This is where the industry has made significant strides to improve both accuracy and usefulness.

Assessment Engine

The centerpiece of DSA is what we call an assessment engine. It’s how you understand what is possible in an environment, to define the universe of possible attacks, and then figure out which would be most damaging. This effectively reduces the detection window, because without it you don’t know if an attack has been used on you; it also helps you prioritize remediation efforts, by focusing on what would work against your defenses.

You feed your assessment engine the topology of your network, because attackers need to first gain a foothold in your network, and then move laterally to achieve their mission. Once your engine has a map of your network, existing security controls are factored in so the engine can determine which devices are vulnerable to which attacks. For instance you’ll want to define access control points (firewalls) and threat detection (intrusion prevention) points in the network, and what kinds of controls run on which endpoints. Attacks almost always involve both networks and endpoints, so your assessment engine must be able to simulate both.

Then the assessment engine can start figuring out what can be attacked and how. The best practices of attackers are distilled into algorithms to simulate how an attack could hit across multiple networks and devices. To illuminate the concept a bit, consider the attack lifecycle/kill chain. The engine simulates reconnaissance from both inside and outside your network to determine what is visible and where to move next in search of its target.

It is important to establish presence, and to gather data from both inside and outside your network, because attackers will be working to do the same. Sometimes they get lucky and are invited in by unsuspecting employees, but other times they look for weaknesses in perimeter defenses and applications. Everything is fair game and thus should be subject to DSA.

Then the simulation should deliver the attack to see what would compromise that device. With an idea of which controls are active on the device, you can determine which attacks might work. Using data from reconnaissance, an attack path from entry point to target can be generated. These paths represent lateral movement within the environment, and the magic of the dynamic assessment is in figuring out how an attacker would move – without causing repercussions yourself.
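As a rough illustration of how an attack path from entry point to target might be generated, here is a minimal sketch using a breadth-first search over a reachability map. The topology, host names, and the idea of modeling controls as a simple set of "protected" hosts are all our own assumptions; a real assessment engine would model controls and attacks in far more detail.

```python
from collections import deque

# Hypothetical topology: an entry "A": ["B"] means A can reach B on the network.
reachable = {
    "internet": ["vpn-gw", "web-01"],
    "web-01": ["app-01"],
    "vpn-gw": ["hr-laptop"],
    "hr-laptop": ["file-srv"],
    "app-01": ["db-01"],
    "file-srv": ["db-01"],
    "db-01": [],
}

# Hosts whose controls would, in this toy model, stop the simulated attack.
protected = {"vpn-gw"}


def attack_path(graph, blocked, entry, target):
    """Breadth-first search for the shortest path a simulated attacker could take.

    Returns the list of hops from entry to target, or None if every route
    is stopped by a blocked host.
    """
    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt in seen or nxt in blocked:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


print(attack_path(reachable, protected, "internet", "db-01"))
```

Because this runs against a model of the network rather than the network itself, lateral movement can be explored without touching production systems.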

Finally you will want to assess the ability of an attacker to exfiltrate data, so the assessment system will try to get the payload past egress filters.

It is not possible to fully mimic a human attacker presented with specific and changing defenses. That’s what red teams and penetration testers are for. But you cannot run constant penetration tests on everything, so dynamic security assessment helps you identify areas of concern; then you can have a human check and determine the most appropriate workaround.

But this isn’t an either/or proposition. The correct answer is both. DSA algorithms provide a probabilistic view of your attack surface, and help you understand likely paths for attackers to access your targets and exfiltrate data. In software testing terms, DSA increases code coverage of application testing. Humans cannot consider every attack, try every path, and attack every device – but a DSA system can provide better coverage.

Threat Intelligence

If we refer back to our requirements, the simulation/analytics engine takes care of most of what you need done. It provides ongoing, non-disruptive, automated assessment of your entire environment. The only thing missing is keeping the tool current, which is where threat intelligence (TI) comes into play.

Integration of new attacks into the assessment engine allows it to consider new tactics and targets. If you face a sophisticated adversary, you have some idea of what they will throw at you, based on what other organizations report. So you can feed your assessment engine new methods to analyze. If a new attack would succeed, you’ll know about it – ideally before it succeeds in your environment.

Automation is critical to a sustainable and useful assessment function. You don’t have time to manually keep the tool updated and run new tests. You have more leeway with assessment, where a faulty update won’t disrupt the environment. You might get some annoying false positives, but you won’t lose half your network, as you could if an active endpoint or network security control update goes awry.


Finally, once you have an attack that could succeed, you’ll want to dig into specifics. The modern way of doing that is through visualization. You should be able to see an attacker’s path, and which devices could be compromised. Drilling down into specific devices, and possible attacks highlighted by the assessment engine, can help you identify faulty controls and weak configurations.

Visualization is key to weighing alternative fixes, and figuring out which would be most efficient. Assessing how different controls would affect a simulated attack can help you quickly identify your best remediation option.
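Weighing alternative fixes can be approximated the same way: apply each candidate change to a model of the network and check whether the simulated attack still gets through. The sketch below is hedged accordingly. The edges, fix names, and the simplification that a "fix" just removes network edges are assumptions for illustration.

```python
# Hypothetical network model: (source, destination) pairs an attacker could traverse.
edges = {
    ("internet", "web-01"), ("web-01", "app-01"), ("app-01", "db-01"),
    ("internet", "vpn-gw"), ("vpn-gw", "file-srv"), ("file-srv", "db-01"),
}

# Candidate fixes, each modeled as the set of edges it would remove.
candidate_fixes = {
    "segment app tier from db": {("app-01", "db-01")},
    "segment db from everything": {("app-01", "db-01"), ("file-srv", "db-01")},
    "harden web-01": {("web-01", "app-01")},
}


def can_reach(edge_set, entry, target):
    """Depth-first reachability over a set of (src, dst) edges."""
    stack, seen = [entry], {entry}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for src, dst in edge_set:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return False


def evaluate(edge_set, fixes, entry, target):
    """Map each fix name to True if it stops the simulated attack."""
    return {name: not can_reach(edge_set - removed, entry, target)
            for name, removed in fixes.items()}


for name, stops_attack in evaluate(edges, candidate_fixes, "internet", "db-01").items():
    print(f"{name}: {'stops the attack' if stops_attack else 'attack still succeeds'}")
```

In this toy network only one of the three options actually cuts off the target, because the other two leave a second route open. That is exactly the kind of result that is hard to see when you look at devices one at a time.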

If dynamic security assessment sounds like what vulnerability management should have evolved into, you are right. Rather than looking at devices individually and providing summary data with dashboards showing how quickly you are fixing vulnerabilities, a DSA engine puts vulnerabilities into context. It’s not just about what can be attacked, but how the attack would fit into a larger campaign to access a target and steal information.

We will wrap up this series by applying these techniques in a realistic attack scenario. Defining requirements and discussing tech is fun, but the concepts resonate much better in a specific situation you might see – or, more likely, have seen already.

—Mike Rothman

Wednesday, December 21, 2016

Incite 12/21/2016: To Incite

By Mike Rothman

In the process of wrapping up the year I realize the last Incite I wrote was in August. Damn. That’s a long respite. It’s in my todo list every Tuesday. And evidently I have dutifully rescheduled it for about 3 months now. I am one to analyze (and probably overanalyze) everything, so I need to figure out why I have resisted writing the Incite.

I guess it makes sense to go back to 2007, when I started writing the Incite. My motivation was to build my first independent research business (Security Incite), and back then a newsletter was the way to do it. I was pretty diligent about writing almost every day – providing inflammatory commentary on security news, poking the bear whenever I could, and making a name for myself.

I think that was modestly successful, and it really reflected who I was back then. Angry, blunt, cynical, and edgy. So the Incite persona fit and I communicated that through my blog, speaking gigs, and strategy work for years. During that initial period I also started adding some personal stories and funny anecdotes to lighten it up a bit. Mostly because I was getting bored – it’s not like security news is the most exciting thing to work on every day. But the feedback on my personal stories was great, so I kept doing it.

So basically the Incite turned into my playground, where I could share pretty much anything going on with me. And I did. The good, bad, and ugliness of life. As I went through a period of turbulence and personal evolution (midlife transformation), I used the Incite as my journal. Only I know a lot of the underlying machinations that drove many of those posts, but the Incite allowed me to document my journey. For me.

I got through the proverbial tunnel back in July of 2015. Obviously I’m still learning and growing (mostly by screwing things up), but I didn’t feel compelled to continue documenting my journey. I did learn a lot through the process, so I wanted to share my experiences and associated philosophies, since that was how I coped with my personal turmoil. I also hoped that my writing would help other folks in similar situations. But I don’t seem to have a lot of ground left to cover, and since I’ve moved forward in my personal life, I don’t want to keep digging into the past.

Where does that leave me now? The reality is that the Incite persona no longer fits. I’ve been alluding to that for a while, and on reflection, it’s left me a little untethered and resistant to writing. My resistance comes from having to maintain a persona I no longer want. Grumpy Mike is an act. And I no longer want to play that role. When people you just meet tell you, “you’re not so mean,” it’s time to rehabilitate your image. But the Incite perpetuates that perception.

When looking at a situation without an easy answer, my teacher Casey always counseled me to flip the perspective. Look at it from a different viewpoint and see if a solution appears. Since I seem to be triggered by the word ‘Incite’, let’s dig into that.

To Incite

It’s clear the idea of encouraging “violent or unlawful behavior” is the problem. But if I look at the synonyms, I see words that do reflect what I’m trying to do. Encourage, stimulate, excite, awaken, inspire, and trigger. I always wrote the Incite for me, but based on many many discussions and notes of support I’ve received, it has done many of these things for readers. And that makes me happy.

Everything changes. I’m living, breathing proof of that. And it’s time to move forward. So I’m going to retire the Incite newsletter. That writing is an important release for me and I still like to share anecdotes, so I’ll continue doing that in some way, shape, or form. And I’m going to get better about doing 3-5 quick security news analyses each week as well, since we are kind of a security research firm.

But it won’t stop there. I will be launching some new services early next year to develop the next generation of security leaders, so I’ll be integrating weekly video interviews and other personal development content into the mix as well.

I know 2016 was hard for many people. From my perspective there were certainly surprises. Overall it was a good year for me and my family. I have a lot to celebrate and be thankful for. So I’ll spend my holiday season catching up on projects that dragged out (meaning I’ll be active on the blog) and pinching myself, just to make sure this is all real.


—Mike Rothman

Monday, December 19, 2016

The NINTH Annual Disaster Recovery Breakfast: the More Things Change…

By Mike Rothman

DRB 2017 -- See you there

Big 9. Lucky 9. Or maybe not so lucky 9, because by the time you reach our annual respite from the wackiness of the RSA Conference, you may not be feeling very lucky. But if you flip your perspective, you’ll be in the home stretch, with only one more day of the conference before you can get the hell out of SF.

We are happy to announce this year’s RSA Conference Disaster Recovery Breakfast. It’s hard to believe this is our ninth annual event. Everything seems to be in a state of flux and disruption. It’s a bit unsettling. But we’re happy to help you anchor at least for a few hours to grab some grub, drinks, and bacon.

We remain grateful that so many of our friends, clients, and colleagues enjoy a couple hours away from the monstrosity that is now the RSAC. By Thursday we’re all disasters, so it’s very nice to have a place to kick back, have some conversations at a normal decibel level, and grab a nice breakfast. Or don’t talk to anyone at all and embrace your introvert – we get that too.

With the continued support of Kulesa Faul, CHEN PR, and LaunchTech, you’ll have a great opportunity to say hello and thank them for helping support your habits. We are also very happy to welcome the CyberEdge Group as a partner. They are old friends, and we are ecstatic to have them participate.

As always the breakfast will be Thursday morning (February 16) from 8-11 at Jillian’s in the Metreon. It’s an open door – come and leave as you want. We will have food, beverages, and assorted non-prescription recovery items to ease your day. Yes, the bar will be open – Mike gets the DTs if he doesn’t have his rise and shine Guinness.

Please remember what the DR Breakfast is all about. No marketing, no spin, no t-shirts, and no flashing sunglasses – it’s just a quiet place to relax and have muddled conversations with folks you know, or maybe even go out on a limb and meet someone new. We are confident you will enjoy the DRB as much as we do.

See you there.

To help us estimate numbers, please RSVP to rsvp (at) securosis (dot) com.

—Mike Rothman

Thursday, December 08, 2016

Amazon re:Invent Takeaways? Hang on to Your A**es…

By Rich

I realized I promised to start writing more again to finish off the year and then promptly disappeared for over a week. Not to worry, it was for a good cause, since I spent all of last week at Amazon’s re:Invent conference. And, umm, might have been distracted this week by the release of the Rogue One expansion pack for Star Wars Battlefront. But enough about me…

Here are my initial thoughts about re:Invent and Amazon’s direction. It may seem like I am biased towards Amazon Web Services, for two reasons. First, they still have a market lead in terms of both adoption and available services. That isn’t to say other providers aren’t competitive, especially in particular areas, but Amazon has maintained a strong lead across the board. This is especially true of security features and critical security capabilities. Second, most of my client work is still on AWS, so I need to pay more attention to it – selection bias. Although Azure and Google are slowly creeping in.

With that out of the way, here’s my analysis of the event’s announcements:

  • The biggest security news wasn’t security products. With security we tend to get a bit myopic, and focus on security products and features, but the real impact on our practices nearly always comes from broader changes to IT adoption patterns and technologies. Last week Amazon laid out the future of computing and there is plenty of evidence that Microsoft and Google are well along the same path, if not ahead:
    • The future is serverless: When you use a cloud load balancer, you don’t run an instance or a virtual machine – you just request a load balancer. Sure, somewhere it’s running on hardware and an operating system, but all that is hidden from you, and the cloud provider takes responsibility for managing nearly all the security. That’s great for things like load balancers, message queues, and even the occasional database, but what about your custom code? That’s where AWS Lambda comes in, and Amazon has tripled down. Lambda lets you load code into the cloud, which AWS runs on demand (in a Linux container). You just write your code and don’t worry about the rest. AWS announced enhancements to Lambda, but the big product piece is Step Functions that allow you to tie together application components with a state machine (I’m simplifying). The net result? More, bigger, serverless applications, and a gap which kept Lambda out of complex projects has been closed. Security take? Serverless blows apart nearly all our existing security models. I’m not kidding – it’s insanely disruptive. This post is already going to be too long, so I’ll start a series on this soon.
    • The future is serverless AI: Amazon released a quad of artificial intelligence tools. Image recognition, conversational interfaces (like Alexa, Google Now, and Siri), text to speech, and accessible machine learning (a set of features that doesn’t require you to program machine learning from scratch). Go read the descriptions and watch the demos – these are really interesting and powerful capabilities. Security take? Prepare for more data to flow into the cloud… and stay there. You simply can’t compete with these capabilities on-premise. On the upside, we can also harness these to improve security analysis and operations.
    • The future is distributed and ever-present: Those Lambda functions? Amazon announced they are now accessible on edge routers (sorry Akamai), in big-storage Snowball appliances (a smart NAS you can drop anywhere that will process locally and communicate with the cloud, or you just ship it all to Amazon for data storage), and in IoT devices on the friggin’ silicon. All feeding back into the cloud. Amazon is extending its processing engine to basically everywhere (IoT FTW). Security take? This is enterprise-targeted IoT, combined with distributed mesh computing. Hang on to your hats.
  • Security is still core to AWS, but their focus is on reducing friction. None of what I described above can work without a bombproof security baseline. This was the first re:Invent I’ve been to where there were no security announcements in the Day 1 keynote. They announced DDoS on Day 2 and a bunch of enhancements during the State of Security track lead-off presentation. It seemed almost understated until you went to the various sessions and saw the bigger picture. When AWS builds security products like KMS or Inspector it’s mostly to reduce the friction of security and compliance when customers want to move to AWS. They step in when they see existing products failing or slowing down AWS adoption, for core features they need themselves, and when they think an improvement will bring more clients. Don’t assume a low level of announcements means a low level of commitment or capabilities – it’s just that security is becoming more of the fabric. For example Lambda gives you basically a super-hardened server to run arbitrary code – that’s much more important than…
  • Multiple account management. Finally. It’s easy for me to recommend using 2-5 accounts per project, but managing accounts at enterprise scale on AWS is a major pain in the ass. Organizations is the first step into enabling master and sub accounts. It’s in preview, and although I applied I’m not in yet so I don’t have a lot of details. But this helps resolve the single biggest pain point for most of my cloud-native customers.
  • Anti-DDoS. Finally. You can’t use BGP-based anti-DDoS with AWS, which has limited everyone to cloud-based web services. I’m a huge fan, but they don’t work well with all AWS services – especially when you use the CDN. Now everyone gets basic anti-DDoS for free and advanced anti-DDoS (humans watching and troubleshooting) is pretty darn cost effective. Sorry Akamai (and Cloudflare and Incapsula). Actually, Amazon’s WAF capabilities are still limited enough that DDoS + cloud WAF vendors should be okay… for a while.
  • Systems Manager adds automated image creation, patch, and configuration management. EC2 Systems Manager is a collection of tools to knock down those problems. But it’s definitely rough around the edges, and looks like it will work best if you manage it programmatically. It has the potential to really disrupt patch and configuration management tools, and to combine with Inspector to also hit security vulnerability assessment products below the belt.
  • Improved compliance reporting. Remember when Hoff started up the CloudAudit project for automated reporting of cloud provider compliance? It isn’t standards-based but AWS Artifact revives the concept, and will make life easier for everyone who needs to work with auditors and Amazon deployments.
  • IPv6 Support. Fortunately it’s optional and on-demand.

This really only scratches the surface. I skipped over VMware end-of-lifing their on-premise virtualization (seriously, hard to see this any other way), a ton of database announcements (including serverless SQL), and most of what’s on this list.

One big point is that in the cloud, everything is software defined. Many of the services I just described work best if you manage them programmatically via APIs. The web console will only get you so far, and doesn’t work well once you start dealing with multiple accounts. Software Defined Security and DevSecOps are really the only ways to keep up with the cloud – especially Amazon.

Overall I think I captured the big security points:

  • The future is serverless, and this breaks a lot of how we approach things.
  • Cloud security is Software Defined Security.
  • AWS focuses on reducing friction to cloud adoption, and security is often the friction. Vendors in the way will get gutted without a second thought.


Wednesday, November 16, 2016

Cloud Security Automation: Code vs. CloudFormation or Terraform Templates

By Rich

Right now I’m working on updating many of my little command line tools into releasable versions. It’s a mixed bag of things I’ve written for demos, training classes, clients, or Trinity (our mothballed product). A few of these are security automation tools I’m working on for clients to give them a skeleton framework to build out their own automation programs. Basically, what we created Trinity for, that isn’t releasable.

One question that comes up a lot when I’m handing this off is why write custom Ruby/Python/whatever code instead of using CloudFormation or Terraform scripts. If you are responsible for cloud automation at all this is a super important question to ask yourself.

The correct answer is there isn’t one single answer. It depends as much on your experience and preferences as anything else. Each option can handle much of the job, at least for configuration settings and implementing a known-good state. Here are my personal thoughts from the security pro perspective.

CloudFormation and Terraform are extremely good for creating known good states and immutable infrastructure and, in some cases, updating and restoring to those states. I use CloudFormation a lot and am starting to also leverage Terraform more (because it is cross-cloud capable). They both do a great job of handling a lot of the heavy lifting and configuring pieces in the proper order (managing dependencies) which can be tough if you script programmatically. Both have a few limits:

  • They don’t always support all the cloud provider features you need, which forces you to bounce outside of them.
  • They can be difficult to write and manage at scale, which is why many organizations that make heavy use of them use other languages to actually create the scripts. This makes it easier to update specific pieces without editing the entire file and introducing typos or other errors.
  • They can push updates to stacks, but if you made any manual changes I’ve found these frequently break. Thus they are better for locked-down production environments that are totally immutable and not for dev/test or manually altered setups.
  • They aren’t meant for other kinds of automation, like assessing or modifying in-use resources. For example, you can’t use them for incident response or to check specific security controls.

I’m not trying to be negative here – they are awesome tools, which are totally essential to cloud and DevOps. But there are times you want to attack the problem in a different way.

Let me give you a specific use case. I’m currently writing a “new account provisioning” tool for a client. Basically, when a team at the client starts up a new Amazon account, this shovels in all the required security controls. IAM, monitoring, etc. Nearly all of it could be done with CloudFormation or Terraform but I’m instead writing it as a Ruby app. Here’s why:

  • I’m using Ruby to abstract complexity from the security team and make security easy. For example, to create new Identity and Access Management policies, users, and roles, the team can point the tool towards a library of files and the tool iterates through and builds them in the right order. The security team only needs to focus on that library of policies and not the other code to build things out. This, for them, will be easier than adding it to a large provisioning template. I could take that same library and actually build a CloudFormation template dynamically the same way, but…
  • … I can also use the same code base to fix existing accounts or (eventually) assess and modify an account that’s been changed in the future. For example, I can (and will) be able to assess an account, and if the policies don’t match, enable the user to repair it with flexibility and precision. Again, this can be done without the security pro needing to understand a lot of the underlying complexity.

Those are the two key reasons I sometimes drop from templates to code. I can make things simpler and also use the same ‘base’ for more complex scenarios that the infrastructure as code tools aren’t meant to address, such as ‘fixing’ existing setups and allowing more granular decisions on what to configure or overwrite. Plus, I’m not limited to waiting for the templates to support new cloud provider features; I can add capabilities any time there is an API, and with modern cloud providers, if there’s a feature, it has an API.
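To make the "build in the right order, then assess and repair" idea concrete, here is a minimal sketch in Python (not the Ruby tool described above, and not its actual design). The policy library, the item names, and the in-memory `account` dictionary are all hypothetical stand-ins; a real tool would read the library from files and call the cloud provider's API instead of comparing dictionaries. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical library: name -> (desired definition, names it depends on).
LIBRARY = {
    "audit-role": ({"kind": "role", "policy": "read-only"}, ["read-only"]),
    "read-only": ({"kind": "policy", "actions": ["s3:GetObject"]}, []),
    "auditor": ({"kind": "user", "role": "audit-role"}, ["audit-role"]),
}


def build_order(library):
    """Return item names ordered so that dependencies come first."""
    graph = {name: set(deps) for name, (_, deps) in library.items()}
    return list(TopologicalSorter(graph).static_order())


def assess(library, account):
    """Compare an account's current state to the library; return a repair plan."""
    plan = []
    for name in build_order(library):
        desired, _ = library[name]
        if name not in account:
            plan.append(("create", name))
        elif account[name] != desired:
            plan.append(("update", name))
    return plan


def repair(library, account, plan, approve=lambda action, name: True):
    """Apply only the approved steps of a plan to a copy of the account state."""
    fixed = dict(account)
    for action, name in plan:
        if approve(action, name):
            fixed[name] = library[name][0]
    return fixed


if __name__ == "__main__":
    # An existing account whose policy has drifted from the library.
    existing = {"read-only": {"kind": "policy", "actions": ["s3:*"]}}
    for step in assess(LIBRARY, existing):
        print(step)
```

The `approve` callback is where the "granular decisions on what to configure or overwrite" would plug in: the same plan can be applied fully to a new account or selectively to one that is already in use.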

In practice you can mix and match these approaches. I have my biases, and maybe some of it is just that I like to learn the APIs and features directly. I do find that having all these code pieces gives me a lot more options for various use cases, including using them to actually generate the templates when I need them and they might be the better choice. For example, one of the features of my framework is installing a library of approved CloudFormation templates into a new account to create pre-approved architecture stacks for common needs.
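As a rough illustration of generating a template from code, the sketch below wraps a library of IAM policy documents in a CloudFormation template. The library contents and logical names are made up for the example; the only CloudFormation specifics relied on are the template format version string and the `AWS::IAM::ManagedPolicy` resource type with its `PolicyDocument` property.

```python
import json

# Hypothetical policy library: logical name -> IAM policy document.
POLICY_LIBRARY = {
    "ReadOnlyLogs": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:Get*", "logs:Describe*"],
                "Resource": "*",
            }
        ],
    },
}


def build_template(policies):
    """Wrap each policy document in an AWS::IAM::ManagedPolicy resource."""
    resources = {
        name: {
            "Type": "AWS::IAM::ManagedPolicy",
            "Properties": {"PolicyDocument": document},
        }
        for name, document in sorted(policies.items())
    }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    # Print the generated template; deploying it is left to your usual tooling.
    print(json.dumps(build_template(POLICY_LIBRARY), indent=2))
```

Because the template is just data, the security team keeps editing the policy library and never touches the template itself.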

It all plays together. Pick what makes sense for you, and hopefully this will give you a bit of insight into how I make the decision.


Monday, November 14, 2016

Cloud Database Security: 2011 vs. Today

By Adrian Lane

Adrian here.

I had a brief conversation today about security for cloud database deployments, and their two basic questions encapsulated many conversations I have had over the last few months. It is relevant to a wider audience, so I will discuss them here.

The first question I was asked was, “Do you think that database security is fundamentally different in the cloud than on-premise?”

Yes, I do. It’s not the same. Not that we no longer need IAM, assessment, monitoring, or logging tools, but the way we employ them changes. And there will be more focus on things we have not worried about before – like the management plane – and far less on things like archival and physical security. But it’s very hard to compare apples to apples here, because of fundamental changes in the way cloud works. You need to shift your approach when securing databases run on cloud services.

The second question was, “Then how are things different today from 2011 when you wrote about cloud database security?”

Database security has changed in three basic ways:

1) Architecture: We no longer leverage the same application and database architectures. It is partially about applications adopting microservices, which both promotes micro-segmentation at the network and application layer, and also breaks the traditional approach of closely tying the application to a database. Architecture has also developed in response to evolving database services. We see need for more types of data, with far more dynamic lookup and analysis than transaction support. Together these architectural changes lead to more segmented deployment, with more granular control over access to data and database services.

2) Big Data: In 2011 I expected people to push their Oracle, MS SQL Server, and PostgreSQL installations into the cloud, to reduce costs and scale better. That did not happen. Instead firms prefer to start new projects in the cloud rather than moving existing projects. Additionally we see strong adoption of big data platforms such as Hadoop and Dynamo. These are different platforms with slightly different security issues and security tools than the relational platforms which dominated the previous two decades. And in an ecosystem like Hadoop, applications running on the same data lake may be exposed to entirely different service layers.

3) Database as a Service: At Securosis we were a bit surprised by how quickly the cloud vendors embraced big data. Now they offer big data (along with other relational database platforms) as a service. “Roll your own” has become much less necessary. Basic security around internal table structures, patching, administrative access, and many other facets is now handled by vendors to reduce your headaches. We can avoid installation issues. Licensing is far, far easier. It has become so easy to stand up a new relational database or big data cluster this way that running databases on Infrastructure as a Service now seems antiquated.

I have not gone back through everything I wrote in 2011, but there are probably many more subtle differences. But the question itself overlooks another important difference: Security is now embedded in cloud services. None of us here at Securosis anticipated how fast cloud platform vendors would introduce new and improved security features. They have advanced their security offerings much faster than any other platform or service offering I’ve ever seen, and done a much better job with quality and ease of use than anyone expected. There are good reasons for this. In most cases the vendors were starting from a clean slate, unencumbered by legacy demands. Additionally, they knew security concerns were an impediment to enterprise adoption. To remove their primary customer objections, they needed to show that security was at least as good as on-premise.

In conclusion, if you are moving new or existing databases to the cloud, understand that you will be changing tools and process, and adjusting your biggest priorities.

—Adrian Lane

Thursday, November 10, 2016

Dynamic Security Assessment: The Limitations of Security Testing [New Series]

By Mike Rothman

We have been fans of testing the security of infrastructure and applications as long as we can remember doing research. We have always known attackers are testing your environment all the time, so if you aren’t also self-assessing, inevitably you will be surprised by a successful attack. And like most security folks, we are no fans of surprises.

Security testing and assessment has gone through a number of iterations. It started with simple vulnerability scanning. You could scan a device to understand its security posture, which patches were installed, and what remained vulnerable on the device. Vulnerability scanning remains a function at most organizations, driven mostly by a compliance requirement.

As useful as it was to understand which devices and applications were vulnerable, a simple scan provides limited information. A vulnerability scanner cannot recognize that a vulnerable device is not exploitable due to other controls. So penetration testing emerged as a discipline to go beyond simple context-less vulnerability scanning, with humans trying to steal data.

Pen tests are useful because they provide a sense of what is really at risk. But a penetration test is resource-intensive and expensive, especially if you use an external testing firm. To address that, we got automated pen testing tools, which use actual exploits in a semi-automatic fashion to simulate an attacker.

Regardless of whether you use carbon-based (human) or silicon-based (computer) penetration testing, the results describe your environment at a single point in time. As soon as you blink, your environment will have changed, and your findings may no longer be valid.

With the easy availability of penetration testing tools (notably the open source Metasploit), defending against a pen testing tool has emerged as the low bar of security. Our friend Josh Corman coined HDMoore’s Law, after the leader of the Metasploit project. Basically, if you cannot stop a primitive attacker using Metasploit (or another pen testing tool), you aren’t very good at security.

The low bar isn’t high enough

As we lead enterprises through developing security programs, we typically start with adversary analysis. It is important to understand what kinds of attackers will be targeting your organization and what they will be looking for. If you think your main threat is a 400-pound hacker in their parents’ basement, defending against an open source pen testing tool is probably sufficient.

But do any of you honestly believe an unsophisticated attacker wielding a free penetration testing tool is all you have to worry about? Of course not. The key thing to understand about adversaries is simple: They don’t play by your rules. They will attack when you don’t expect it. They will take advantage of new attacks and exploits to evade detection. They will use tactics that look like a different adversary to raise a false flag.

The adversary will do whatever it takes to achieve their mission. They can usually be patient, and will wait for you to screw something up. So the low bar of security represented by a pen testing tool is not good enough.

Dynamic IT

The increasing sophistication of adversaries is not your only challenge assessing your environment and understanding risk. Technology infrastructure seems to be undergoing the most significant set of changes we have ever seen, and this is dramatically complicating your ability to assess your environment.

First, you have no idea where your data actually resides. Between SaaS applications, cloud storage services, and integrated business partner networks, the boundaries of traditional technology infrastructure have been extended unrecognizably, and you cannot assume your information is on a network you control. And if you don’t control the network it becomes much harder to test.

The next major change underway is mobility. Between an increasingly disconnected workforce and an explosion of smart devices accessing critical information, you can no longer assume your employees will access applications and data from your networks. Realizing that authorized users needing legitimate access to data can be anywhere in the world, at any time, complicates assessment strategies as well.

Finally, the push to public cloud-based infrastructure makes it unclear where your compute and storage are, as well. Many of the enterprises we work with are building cloud-native technology stacks using dozens of services across cloud providers. You don’t necessarily know where you will be attacked, either.

To recap, you no longer know where your data is, where it will be accessed from, or where your computation will happen. And you are chartered to protect information in this dynamic IT environment, which means you need to assess the security of your environment as often as practical. Do you start to see the challenge of security assessment today, and how much more complicated it will be tomorrow?

We Need Dynamic Security Assessment

As discussed above, a penetration test represents a point in time snapshot of your environment, and is obsolete when complete, because the environment continues to change. The only way to keep pace with our dynamic IT environment is dynamic security assessment. The rest of this series will lay out what we mean by this, and how to implement it within your environment.

As a little prelude to what you’ll learn, a dynamic security assessment tool includes:

  • A highly sophisticated simulation engine, which can imitate typical attack patterns from sophisticated adversaries without putting production infrastructure in danger.
  • An understanding of the network topology, to model possible lateral movement and isolate targeted information and assets.
  • A security research team to leverage both proprietary and public threat intelligence, and to model the latest and greatest attacks to avoid unpleasant surprises.
  • An effective security analytics function to figure out not just what is exploitable, but also how different workarounds and fixes will impact infrastructure security.
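To give a feel for the topology and analytics pieces in the list above, here is a small sketch that models lateral movement as a breadth-first search over a network graph, then re-runs the search with one link blocked to see what a fix would change. The hosts and links are invented for illustration, and this is a toy model under those assumptions, not how any particular product works.

```python
from collections import deque

# Hypothetical topology: host -> hosts it can open connections to.
TOPOLOGY = {
    "laptop": ["file-server", "print-server"],
    "file-server": ["app-server"],
    "print-server": [],
    "app-server": ["database"],
    "database": [],
    "hr-system": ["database"],
}


def attack_paths(topology, start, targets):
    """Breadth-first search: shortest path from start to each reachable target."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for neighbor in topology.get(host, []):
            if neighbor not in parents:
                parents[neighbor] = host
                queue.append(neighbor)
    paths = {}
    for target in targets:
        if target in parents:
            path, node = [], target
            while node is not None:
                path.append(node)
                node = parents[node]
            paths[target] = path[::-1]
    return paths


def without_link(topology, source, destination):
    """Model a fix: return a copy of the topology with one link blocked."""
    return {
        host: [n for n in neighbors if (host, n) != (source, destination)]
        for host, neighbors in topology.items()
    }


if __name__ == "__main__":
    print(attack_paths(TOPOLOGY, "laptop", ["database"]))
    fixed = without_link(TOPOLOGY, "file-server", "app-server")
    print(attack_paths(fixed, "laptop", ["database"]))
```

Running the search continuously against a topology that is kept up to date, rather than once a year, is the difference between this kind of model and a point-in-time test.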

We would like to thank SafeBreach as the initial potential licensee of this content. As you may remember, we research using our Totally Transparent Research methodology, which requires foresight on the part of our licensees. It enables us to post our papers in our Research Library without paywalls, registration, or any other blockage to you reading (and hopefully enjoying) our research.

We will start describing Dynamic Security Assessment in our next post.

—Mike Rothman

Wednesday, November 09, 2016

Assembling a Container Security Program: Monitoring and Auditing

By Adrian Lane

Our last post in this series covers two key areas: Monitoring and Auditing. We have more to say, in the first case because most development and security teams are not aware of these options, and in the latter because most teams hold many misconceptions and considerable fear on the topic. So we will dig into these two areas essential to container security programs.


Every security control we have discussed so far had to do with preventative security. Essentially these are security efforts that remove vulnerabilities or make it hard for anyone to exploit them. We address known attack vectors with well-understood responses such as patching, secure configuration, and encryption. But vulnerability scans can only take you so far. What about issues you are not expecting? What if a new attack variant gets by your security controls, or a trusted employee makes a mistake? This is where monitoring comes in: it’s how you discover the unexpected stuff. Monitoring is critical to a security program – it’s how you learn what is effective, track what’s really happening in your environment, and detect what’s broken.

For container security it is no less important, but today it’s not something you get from Docker or any other container provider.

Monitoring tools work by first collecting events, and then examining them in relation to security policies. The events may be requests for hardware resources, IP-based communication, API requests to other services, or sharing information with other containers. Policy types are varied. We have deterministic policies, such as which users and groups can terminate resources, which containers are disallowed from making external HTTP requests, or what services a container is allowed to run. Or we may have dynamic – also called ‘behavioral’ – policies, which prevent issues such as containers calling undocumented ports, using 50% more memory resources than typical, or uncharacteristically exceeding runtime parameter thresholds. Combining deterministic white and black list policies with dynamic behavior detection provides the best of both worlds, enabling you to detect both simple policy violations and unexpected variations from the ordinary.
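The difference between the two policy types is easier to see in code. The sketch below checks events against one deterministic rule (a container that may not make external HTTP requests) and one behavioral rule (memory use more than 50% above typical). The event shape, container names, and baseline numbers are all hypothetical; real monitoring products define their own event schemas and learn baselines from observation.

```python
# Deterministic policy: containers that must never make external HTTP requests.
NO_EXTERNAL_HTTP = {"billing-db"}

# Behavioral baseline: typical memory use per container, in MB (made-up numbers).
TYPICAL_MEMORY_MB = {"billing-db": 400.0, "web": 200.0}
MEMORY_TOLERANCE = 1.5  # flag anything more than 50% above typical


def evaluate(event):
    """Return a list of policy violations for a single event dictionary."""
    violations = []
    container = event["container"]
    if event["kind"] == "http_request" and container in NO_EXTERNAL_HTTP:
        violations.append("deterministic: external HTTP request not allowed")
    if event["kind"] == "memory_sample":
        typical = TYPICAL_MEMORY_MB.get(container)
        if typical is not None and event["value"] > typical * MEMORY_TOLERANCE:
            violations.append("behavioral: memory use more than 50% above typical")
    return violations


if __name__ == "__main__":
    events = [
        {"container": "billing-db", "kind": "http_request"},
        {"container": "web", "kind": "memory_sample", "value": 350.0},
        {"container": "web", "kind": "http_request"},
    ]
    for event in events:
        print(event["container"], evaluate(event))
```

The deterministic rule needs no history at all, while the behavioral rule is only as good as the baseline behind it, which is why the two are usually combined.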

We strongly recommend that your security program include monitoring container activity. Today, a couple of container security vendors offer monitoring products. Popular evaluation criteria for differentiating products and determining suitability include:

  • Deployment Model: How does the product collect events? What events and API calls can it collect for inspection? Typically these products use either of two models for deployment: an agent embedded in the host OS, or a fully privileged container-based monitor running in the Docker environment. How difficult is it to deploy collectors? Do the host-based agents require a host reboot to deploy or update? You will need to assess what type of events can be captured.
  • Policy Management: You will need to evaluate how easy it is to build new policies – or modify existing ones – within the tool. You will want to see a standard set of security policies from the vendor to help speed up deployment, but over the lifetime of the product you will stand up and manage your own policies, so ease of management is key to your long-term happiness.
  • Behavioral Analysis: What, if any, behavioral analysis capabilities are available? How flexible are they, meaning what types of data can be used in policy decisions? Behavioral analysis requires starting with system monitoring to determine ‘normal’ behavior. The criteria for detecting aberrations are often limited to a few sets of indicators, such as user ID or IP address. The more you have available – such as system calls, network ports, resource usage, image ID, and inbound and outbound connectivity – the more flexible your controls can be.
  • Activity Blocking: Does the vendor provide the capability to block requests or activity? It is useful to block policy violations in order to ensure containers behave as intended. Care is required, as these policies can disrupt new functionality, causing friction between Development and Security, but blocking is invaluable for maintaining Security’s control over what containers can do.
  • Platform Support: You will need to verify your monitoring tool supports the OS platforms you use (CentOS, CoreOS, SUSE, Red Hat, etc.) and the orchestration tool (such as Swarm, Kubernetes, Mesos, or ECS) of your choice.

Audit and Compliance

What happened with the last build? Did we remove sshd from that container? Did we add the new security tests to Jenkins? Is the latest build in the repository?

Many of you reading this may not know the answer off the top of your head, but you should know where to get it: log files. Git, Jenkins, JFrog, Docker, and just about every development tool you use creates log files, which we use to figure out what happened – and often what went wrong. There are people outside Development – namely Security and Compliance – who have similar security-related questions about what is going on with the container environment, and whether security controls are functioning. Logs are how you get these external teams the answers they need.

Most of the earlier topics in this research, such as build environment and runtime security, have associated compliance requirements. These may be externally mandated like PCI-DSS or GLBA, or internal security requirements from internal audit or security teams. Either way the auditors will want to see that security controls are in place and working. And no, they won’t just take your word for it – they will want audit reports for specific event types relevant to their audit. Similarly, if your company has a Security Operations Center, in order to investigate alerts or determine whether a breach has occurred, they will want to see all system and activity logs over a period of time in order to reconstruct events. You really don’t want to get too deep into this stuff – just get them the data and let them worry about the details.

The good news is that most of what you need is already in place. During our investigation for this series we did not speak with any firms which did not have Splunk, log storage, or SIEM on-premise, and in many cases all three were available. Additionally the vast majority of code repositories, build controllers, and container management systems – specifically the Docker runtime and Docker Trusted Registry – produce event logs, in formats which can be consumed by various log management and Security Information and Event Management (SIEM) systems. As do most third-party security tools for image validation and monitoring. You will need to determine how easy this is to leverage. Some tools simply dump syslog-format information into a directory, and it’s up to you to drop this into Splunk, an S3 bucket, Loggly, or your SIEM tool. In other cases – most, actually – you can specify CEF, JSON, or some other format, and the tools can automatically link to the SIEM of your choice, sending events as they occur.
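For a sense of what those formats look like, here is a small sketch that renders the same event as a JSON line and as a CEF record. The vendor and product names and the event fields are placeholders; the CEF layout (seven pipe-delimited header fields followed by key=value extensions, with backslash escaping) follows the published format, but check your SIEM's documentation for the extension keys it expects.

```python
import json


def to_json_line(event):
    """Serialize an event as a single JSON object on one line."""
    return json.dumps(event, sort_keys=True)


def _escape_header(value):
    # In CEF header fields, backslashes and pipes must be escaped.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # In CEF extension values, backslashes and equals signs must be escaped.
    return str(value).replace("\\", "\\\\").replace("=", "\\=")


def to_cef(event, vendor="ExampleVendor", product="ContainerMonitor", version="1.0"):
    """Render an event as a CEF:0 record (vendor and product are placeholders)."""
    header = [vendor, product, version, event["id"], event["name"], event["severity"]]
    extension = " ".join(
        f"{key}={_escape_extension(value)}"
        for key, value in sorted(event.get("fields", {}).items())
    )
    return "CEF:0|" + "|".join(_escape_header(h) for h in header) + "|" + extension


if __name__ == "__main__":
    event = {
        "id": "100",
        "name": "sshd present in image",
        "severity": 7,
        "fields": {"msg": "image web:1.4 failed policy check"},
    }
    print(to_json_line(event))
    print(to_cef(event))
```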

This concludes our research on Building a Container Security Program. We covered a ton of different aspects – both production and non-production. We tried to offer sufficient depth to be helpful, without overwhelming you with details. If we missed something you feel is important, or you have unanswered questions, please drop us a note. We will address it in the comments below, or in the final paper, as appropriate. Your feedback helps make these series and papers better, so please help us and other readers out.

—Adrian Lane

Tuesday, November 08, 2016

Assembling a Container Security Program: Runtime Security

By Adrian Lane

This post will focus on the ‘runtime’ aspects of container security. Unlike the tools and processes discussed in previous sections, here we will focus on containers in production systems. This includes which images are moved into production repositories, security around selecting and running containers, and the security of the underlying host systems.

Runtime Security

  • The Control Plane: Our first order of business is ensuring the security of the control plane – the platforms for managing host operating systems, the scheduler, the container engine(s), the repository, and any additional deployment tools. Again, as we advised for build environment security, we recommend limiting access to specific administrative accounts: one with responsibility for operating and orchestrating containers, and another for system administration (including patching and configuration management). We recommend network segregation, plus physical (for on-premise) or logical (for cloud and virtual) segregation of systems.
  • Running the Right Container: We recommend establishing a trusted image repository and ensuring that your production environment can only pull containers from that trusted source. Ad hoc container management is a good way to facilitate bypassing of security controls, so we recommend scripting the process to avoid manual intervention and ensure that the latest certified container is always selected. Second, you will want to check application signatures prior to putting containers into the repository. Trusted repository and registry services can help, by rejecting containers which are not properly signed. Fortunately many options are available, so find one you like. Keep in mind that if you build many containers each day, a manual process will quickly break down. You’ll need to automate the work and enforce security policies in your scripts. Remember, it is okay to have more than one image repository – if you are running across multiple cloud environments, there are advantages to leveraging the native registry in each. Beware the discrepancies between platforms, which can create security gaps.
  • Container Validation and BOM: What’s in the container? What code is running in your production environment? How long ago did we build this container image? These are common questions asked when something goes awry. In case of container compromise, a very practical question is: how many containers are currently running this software bundle? One recommendation – especially for teams which don’t perform much code validation during the build process – is to leverage scanning tools to check pre-built containers for common vulnerabilities, malware, root account usage, bad libraries, and so on. If you keep containers around for weeks or months, it is entirely possible a new vulnerability has since been discovered, and the container is now suspect. Second, we recommend using the Bill of Materials capabilities available in some scanning tools to catalog container contents. This helps you identify other potentially vulnerable containers, and scope remediation efforts.
  • Input Validation: At startup containers accept parameters, configuration files, credentials, JSON, and scripts. In some more aggressive scenarios, ‘agile’ teams shove new code segments into a container as input variables, making existing containers behave in fun new ways. Either through manual review, or leveraging a third-party security tool, you should review container inputs to ensure they meet policy. This can help you prevent someone from forcing a container to misbehave, or simply prevent developers from making dumb mistakes.
  • Container Group Segmentation: Docker does not provide container-level restriction on which containers can communicate with other containers, systems, hosts, IPs, etc. Basic network security is insufficient to prevent one container from attacking another, calling out to a Command and Control botnet, or other malicious behavior. If you are using a cloud services provider you can leverage their security zones and virtual network capabilities to segregate containers and specify what they are allowed to communicate with, over which ports. If you are working on-premise, we recommend you investigate products which enable you to define equivalent security restrictions. In this way each application has an analogue to a security group, which enables you to specify which inbound and outbound ports are accessible to and from which IPs, and can protect containers from unwanted access.
  • Blast Radius: A good option when running containers in cloud services, particularly IaaS clouds, is to run different containers under different cloud user accounts. This limits the resources available to any given container. If a given account or container set is compromised, the same cloud service restrictions which prevent tenants from interfering with each other limit possible damage between accounts and projects. For more information see our post on limiting blast radius with user accounts.
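Two of the checks above lend themselves to a short sketch: refusing images that do not come from the trusted repository, and using a bill of materials to answer "how many containers are currently running this software bundle?" The registry hostname, image names, and package versions below are invented for illustration, and the data would come from your registry and scanning tool rather than hard-coded dictionaries.

```python
TRUSTED_REGISTRY = "registry.example.internal"  # hypothetical hostname


def from_trusted_registry(image_ref):
    """True only if the image reference names the trusted registry explicitly."""
    registry, separator, _ = image_ref.partition("/")
    return separator == "/" and registry == TRUSTED_REGISTRY


# Hypothetical bill of materials: image -> {package: version}.
BOM = {
    "registry.example.internal/web:1.4": {"openssl": "1.0.2h", "nginx": "1.11.5"},
    "registry.example.internal/api:2.0": {"openssl": "1.0.2j"},
}

# Hypothetical inventory of running containers: container name -> image.
RUNNING = {
    "web-1": "registry.example.internal/web:1.4",
    "web-2": "registry.example.internal/web:1.4",
    "api-1": "registry.example.internal/api:2.0",
}


def affected_containers(bom, running, package, bad_versions):
    """List running containers whose image ships a known-bad package version."""
    return sorted(
        name
        for name, image in running.items()
        if bom.get(image, {}).get(package) in bad_versions
    )


if __name__ == "__main__":
    print(from_trusted_registry("registry.example.internal/web:1.4"))
    print(from_trusted_registry("web:1.4"))
    print(affected_containers(BOM, RUNNING, "openssl", {"1.0.2h"}))
```

A check like `from_trusted_registry` belongs in the deployment script itself, so nobody has to remember to run it by hand.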

Platform Security

In Docker’s early years, when people talked about ‘container’ security, they were really talking about how to secure the Linux operating system underneath Docker. Security was more about the platform and traditional OS security measures. If an attacker gained control of the host OS, they could pretty much take control of anything they wanted in containers. The problem was that security of containers, their contents, and even the Docker engine were largely overlooked. This is one reason we focused our research on the things that make containers – and the tools that build them – secure.

That said, no discussion of container security can be complete without some mention of OS security. We would be remiss if we did not talk about host/OS/engine security, at least a bit. Here we will cover some of the basics. But we will not go into depth on securing the underlying OS. We could not do that justice within this research, there is already a huge amount of quality documentation available on the operating system of your choice, and there are much more knowledgeable sources to address your concerns and questions on OS security.

  • Kernel Hardening: Docker security depends fundamentally on the underlying operating system to limit access between ‘users’ (containers) on the system. This resource isolation model is built atop a virtual map called Namespaces, which maps specific users or groups of users to a subset of resources (e.g.: networks, files, IPC, etc.) within their Namespace. Containers should run under a specified user ID. Hardening starts with a secure kernel, strips out any unwanted services and features, and then configures Namespaces to limit (segregate) resource access. It is essential to select an OS platform which supports Namespaces, to constrain which kernel resources the container can access and control user/group resource utilization. Don’t mix Docker and non-Docker services – the trust models don’t align correctly. You will want to script setup and configuration of your kernel deployments to ensure consistency. Periodically review your settings as operating system security capabilities evolve.
  • Docker Engine: Docker security has come a long way, and the Docker engine can now perform a lot of the “heavy lifting” for containers. Docker now has full support for Linux kernel features including Namespaces and Control Groups (cgroups) to isolate containers and container types. We recommend advanced isolation via Linux kernel features such as SELinux or AppArmor, on top of GRSEC-compatible kernels. Docker exposes these Linux kernel capabilities at either the Docker daemon level or the container level, so you have some flexibility in resource allocation. But there is still work to do to properly configure your Docker deployment.
  • Container Isolation: We have discussed resource isolation at the kernel level, but you should also isolate Docker engine/OS groups – and their containers – at the network layer. For container isolation we recommend mapping groups of mutually trusted containers to separate machines and/or network security groups. For containers running critical services or management tools, consider running one container per VM/physical server for on-premises applications, or grouping them into a dedicated cloud VPC to limit attack surface and minimize an attacker’s ability to pivot, should a service or container be compromised.
  • Cloud Container Services: Several cloud services providers offer to tackle platform security issues on your behalf, typically abstracting away some lower-level implementation layers by offering Containers as a Service. By delegating underlying platform-level security challenges to your cloud provider, you can focus on application-layer issues and realize the benefits of containers, without worrying about platform security or scalability.
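
To make the hardening points above a little more concrete, here is a minimal sketch in Python that assembles a `docker run` command line with a conservative baseline: a specified non-root user ID, all Linux capabilities dropped, no privilege escalation, a read-only root filesystem, and cgroup limits on memory and process count. The helper name `hardened_run_args`, the image name, and the specific limits are illustrative assumptions, not recommendations from this research; tune them for your own workloads.

```python
import shlex


def hardened_run_args(image, uid, gid, memory="256m", pids_limit=100):
    """Build a `docker run` argument list with conservative defaults.

    Hypothetical helper for illustration only. The image name and the
    limit values passed in are assumptions, not tested recommendations.
    """
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",                  # run under a specified, non-root user ID
        "--cap-drop", "ALL",                       # drop every Linux capability
        "--security-opt", "no-new-privileges",     # block setuid-style privilege escalation
        "--read-only",                             # mount the root filesystem read-only
        "--memory", memory,                        # cgroup memory limit
        "--pids-limit", str(pids_limit),           # cgroup limit on number of processes
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be reviewed first.
    print(shlex.join(hardened_run_args("example/app:1.0", 1000, 1000)))
```

Generating the command from a script, rather than typing flags by hand, is one way to get the consistency across deployments that scripted setup is meant to provide.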

Platform security for containers is a huge field, and we have only scratched the surface. If you want to learn more, the OS platform providers, Docker, and many third-party security providers offer best practice guidance, research papers, and blogs which discuss this in greater detail.

Note that the majority of security controls in this post are preventative – efforts to prevent what we expect an attacker to attempt. We set a secure baseline to make it difficult for attackers to compromise containers – and if they do, to limit the damage they can cause. In our next and final post in this series we will discuss monitoring, logging, and auditing events in a container system. We will focus on examining what is really going on, and discovering what we don’t know in terms of security.

—Adrian Lane

Monday, November 07, 2016

More on Bastion Accounts and Blast Radius

By Rich

I have received some great feedback on my post last week on bastion accounts and networks. Mostly that I left some gaps in my explanation which legitimately confused people. Plus, I forgot to include any pretty pictures. Let’s work through things a bit more.

First, I tended to mix up bastion accounts and networks, often saying “account/networks”. This was a feeble attempt to discuss something I mostly implement in Amazon Web Services that can also apply to other providers. In Amazon an account is basically an AWS subscription. You sign up for an account, and you get access to everything in AWS. If you sign up for a second account, it is fully segregated from your first, just as it is from every other customer in Amazon. Right now (and I think this will change in a matter of weeks) Amazon has no concept of master and sub accounts: each account is totally isolated unless you use some special cross-account features to connect parts of accounts together. For customers with multiple accounts AWS has a mechanism called consolidated billing that rolls up all your charges into a single account, but that account has no rights to affect other accounts. It pays the bills, but can’t set any rules or even see what’s going on.

It’s like having kids in college. You’re just a checkbook and an invisible texter.

If you, like Securosis, use multiple accounts, then they are totally segregated and isolated. It’s the same mechanism that prevents any random AWS customer from seeing anything in your account. This is very good segregation. There is no way for a security issue in one account to affect another, unless you deliberately open up connections between them. I love this as a security control: an account is like an isolated data center. If an attacker gets in, he or she can’t get at your other data centers. There is no cost to create a new account, and you only pay for the resources you use. So it makes a lot of sense to have different accounts for different applications and projects. Free (virtual) data centers for everyone!!!

This is especially important because of cloud metastructure: all the management stuff like web consoles and APIs that enable you to do things like create and destroy entire class B networks with a couple of API calls. If you lump everything into a single account, more administrators (and other power users) need more access, and they all have more power to disrupt more projects. This is compartmentalization and segregation of duties 101, but we have never before had viable options for breaking everything into isolated data centers. And from an operational standpoint, the more you move into DevOps and PaaS, the harder it is to have everyone running in one account (or a few) without stepping on each other.

These are the fundamentals of my blast radius post.

One problem comes up when customers need a direct connection from their traditional data center to the cloud provider. I may be all rah rah cloud awesome, but practically speaking there are many reasons you might need to connect back home. Managing this for multiple accounts is hard, but more importantly you can run into hard limits due to routing and networking issues.

That’s where a bastion account and network comes in. You designate an account for your Direct Connect. Then you peer into that account (in AWS using cross-account VPC peering support) any other accounts that need data center access. I have been saying “bastion account/network” because in AWS this is a dedicated account with its own dedicated VPC (virtual network) for the connection. Azure and Google use different structures, so it might be a dedicated virtual network within a larger account, but still isolated to a subscription, or sub-account, or whatever mechanism they support to segregate projects. This means:

  • Not all your accounts need this access, so you can focus on the ones which do.
  • You can tightly lock down the network configuration and limit the number of administrators who can change it.
  • Those peering connections rely on routing tables, and you can better isolate what each peered account or network can access.
  • One big Direct Connect essentially “flattens” the connection into your cloud network. This means anyone in the data center can route into and attack your applications in the cloud. The bastion structure provides multiple opportunities to better restrict network access to destination accounts. It is a way to protect your cloud(s) from your data center.
  • A compromise in one peered account cannot affect another account. AWS VPC peering is not transitive: two accounts peered to the same account cannot talk to each other through it. So each project is better isolated and protected, even without firewall rules.

For example, the administrator of a project can have full control over their account and usage of AWS services, without compromising the integrity of the connection back to the data center, which they cannot affect – they only have access to the network paths they were provided. Their project is safe, even if another project in the same organization is totally compromised.
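
The isolation property in the bullets above can be sketched as a toy model. This is plain Python, not AWS code: the account names, the `can_reach` and `blast_radius` helpers, and the route sets are all hypothetical, and the model only captures two assumptions taken from the text: traffic flows only across a direct peering connection, and only where the source's routing table has a route to the destination.

```python
def can_reach(peerings, routes, src, dst):
    """True only if src and dst are directly peered AND src routes to dst.

    Peering is modeled as non-transitive: sharing a peer (the bastion)
    never creates a path between two project accounts.
    """
    directly_peered = frozenset((src, dst)) in peerings
    return directly_peered and dst in routes.get(src, set())


def blast_radius(peerings, routes, compromised):
    """Every other account the compromised account can send traffic to."""
    accounts = {name for pair in peerings for name in pair}
    return {
        other for other in accounts
        if other != compromised and can_reach(peerings, routes, compromised, other)
    }


# Hypothetical layout: two project accounts, each peered only to the bastion.
peerings = {
    frozenset(("project-a", "bastion")),
    frozenset(("project-b", "bastion")),
}
routes = {
    "project-a": {"bastion"},
    "project-b": {"bastion"},
    "bastion": {"project-a", "project-b"},
}

if __name__ == "__main__":
    print("project-a reaches bastion:",
          can_reach(peerings, routes, "project-a", "bastion"))
    print("project-a reaches project-b:",
          can_reach(peerings, routes, "project-a", "project-b"))
    print("blast radius of project-a:",
          blast_radius(peerings, routes, "project-a"))
```

In this model, even if a project administrator adds a route toward another project, nothing changes, because there is no peering connection between the two. The real controls are the provider's peering and routing behavior; the sketch is only a way to reason about which paths exist.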

Hopefully this helps clear things up. Multiple accounts and peering is a powerful concept and security control. Bastion networks extend that capability to hybrid clouds. If my embed works, below you can see what it looks like (a VPC is a virtual network, and you can have multiple VPCs in a single account):