Last week I wrote up my near epic fail on Amazon Web Services where I ‘let’ someone launch a bunch of Litecoin mining instances in my account.
Since then I received some questions on my forensics process, so I figure this is a good time to write up the process in more detail. Specifically, how to take a snapshot and use it for forensic analysis.
I won’t cover all the steps at the AWS account layer – this post focuses on what you should do for a specific instance, not your entire management plane.
The first step, which I skipped, is to collect all the metadata associated with the instance. There is an easy way, a hard way (walk through the web UI and take notes manually), and the way I’m building into my nifty tool for all this that I will release at RSA (or sooner, if you know where to look).
The best way is to use the AWS command line tools for your operating system. Then run the command
aws ec2 describe-instances --instance-ids i-5203422c (inserting your instance ID). Note that you need to follow the instructions linked above to properly configure the tool and your credentials.
I suggest piping the output to a file (e.g.,
aws ec2 describe-instances --instance-ids i-5203422c > forensic-metadata.log) for later examination.
You should also get the console output, which is stored by AWS for a short period on boot/reboot/termination.
aws ec2 get-console-output --instance-id i-5203422c. This might include a bit more information if the attacker mucked with logs inside the instance, but won’t be useful for a hacked instance because it is only a boot log. This is a good reason to use a tool that collects instance logs outside AWS.
That is the basics of the metadata for an instance. Those two pieces collect the most important bits. The best option would be CloudTrail logs, but that is fodder for a future post.
Now on to the instance itself. While you might log into it and poke around, I focused on classical storage forensics. There are four steps:
- Take a snapshot of all storage volumes.
- Launch an instance to examine the volumes.
- Attach the volumes.
- Perform your investigation.
If you want to test any of this, feel free to use the snapshot of the hacked instance that was running in my account (well, one of 10 instances). The snapshot ID you will need is
Snapshot the storage volumes
I will show all this using the web interface, but you can also manage all of it using the command line or API (which is how I now do it, but that code wasn’t ready when I had my incident).
There is a slightly shorter way to do this in the web UI by going straight to volumes, but that way is easier to botch, so I will show the long way and you can figure out the shorter alternative yourself.
- Click Instances in your EC2 management console, then check the instance to examine.
- Look at the details on the bottom, click the Block Devices, then each entry. Pull the Volume ID for every attached volume.
- Switch to Volumes and then snapshot each volume you identified in the steps above.
- Label each snapshot so you remember it. I suggest date and time, “Forensics”, and perhaps the instance ID.
You can also add a name to your instance, then skip direct to Volumes and search for volumes attached to it.
Remember, once you take a snapshot, it is read-only – you can create as many copies as you like to work on without destroying the original. When you create a volume from an instance it doesn’t overwrite the snapshot, it gets another copy injected into storage.
Snapshots don’t capture volatile memory, so if you need RAM you need to either play with the instance itself or create a new image from that instance and launch it – perhaps the memory will provide more clues. That is a different process for another day.
Launch a forensics instance
Launch the operating system of your choice, in the same region as your snapshot. Load it with whatever tools you want. I did just a basic analysis by poking around.
Attach the storage volumes
- Go to Snapshots in the management console.
- Click the one you want, right-click, and then “Create Volume from Snapshot”. Make sure you choose the same Availability Zone as your forensics instance.
- Seriously, make sure you choose the same Availability Zone as your instance. People always mess this up. (By ‘people’, I of course mean ‘I’).
- Go back to Volumes.
- Select the new volume when it is ready, and right click/attach.
- Select your forensics instance. (Mine is stopped in the screenshot – ignore that).
- Set a mount point you will remember.
Perform your investigation
- Create a mount point for the new storage volumes, which are effectively external hard drives. For example,
sudo mkdir /forensics.
- Mount the new drive, e.g.,
sudo mount /dev/xvdf1 /forensics. Amazon may change the device mapping when you attach the drive (technically your operating system does that, not AWS, and you get a warning when you attach).
- Remember, use
sudo bash (or the appropriate equivalent for your OS) if you want to peek into a user account in the attached volume.
And that’s it. Remember you can mess with the volume all you want, then attach a new one from the snapshot again for another pristine copy. If you need a legal trail my process probably isn’t rigorous enough, but there should be enough here that you can easily adapt.
Again, try it with my snapshot if you want some practice on something with interesting data inside. And after RSA check back for a tool which automates nearly all of this.
Posted at Monday 13th January 2014 2:35 pm
(0) Comments •
Over the past few years I have spent a lot of time traveling the world, talking and teaching about cloud security. To back that up I have probably spent more time researching the technologies than any other topic since I moved from being a developer and consultant into the analyst role. Something seemed different at such a fundamental level that I was driven to put my hands on a keyboard and see what it looked and felt like. To be honest, even after spending a couple years at this, I still feel I am barely scratching the surface.
But along the way I have learned a heck of a lot. I realized that many of my initial assumptions were wrong, and the cloud required a different lens to tease out the security implications.
This paper is the culmination of that work. It attempts to break down the security implications of cloud computing, both positive and negative, and change how we approach the problem. In my travels I have found that security professionals are incredibly receptive to the differences between cloud and traditional infrastructure, but they simply don’t have the time to spend 3-4 years researching and playing with cloud platforms and services.
I hope this work helps people think about cloud computing differently, providing practical examples of how to leverage and secure it today.
I would like to thank CloudPassage for licensing the paper. This is something I have wanted to write for a long time, but it was hard to find a security company ready to take the plunge. Their financial support enables us to release this work for free. As always, the content was developed completely independently using our Totally Transparent Research process – this time it was actually developed on GitHub to facilitate public response.
I would also like to thank the Cloud Security Alliance for reviewing and co-branding and co-hosting the paper.
There are two versions. The Executive Summary is 2 pages of highlights from the main report. The Full Version includes an executive summary (formatted differently), as well as the full report.
As always, if you have any feedback please leave it here or on the report’s permanent home page.
Posted at Friday 10th January 2014 9:09 am
(1) Comments •
Update: Amazon reached out to me and reversed the charges, without me asking or complaining (or in any way contacting them). I accept full responsibility and didn’t post this to get a refund, but I’m sure not going to complain – neither is Mike.
This is a bit embarrassing to write.
I take security pretty seriously. Okay, that seems silly to say, but we all know a lot of people who speak publicly on security don’t practice what they preach. I know I’m not perfect – far from it – but I really try to ensure that when I’m hacked, whoever gets me will have earned it. That said, I’m also human, and sometimes make sacrifices for convenience. But when I do so, I try to make darn sure they are deliberate, if misguided, decisions. And there is the list of things I know I need to fix but haven’t had time to get to.
Last night, I managed to screw both those up.
It’s important to fess up, and I learned (the hard way) some interesting conclusions about a new attack trend that probably needs its own post. And, as is often the case, I made three moderately small errors that combined to an epic FAIL.
I was on the couch, finishing up an episode of Marvel’s Agents of S.H.I.E.L.D. (no, it isn’t very good, but I can’t help myself; if they kill off 90% of the cast and replace them with Buffy vets it could totally rock, though). Anyway… after the show I checked my email before heading to bed. This is what I saw:
Dear AWS Customer,
Your security is important to us. We recently became aware that your AWS Access Key (ending with 3KFA) along with your Secret Key are publicly available on github.com . This poses a security risk to you, could lead to excessive charges from unauthorized activity or abuse, and violates the AWS Customer Agreement.
We also believe that this credential exposure led to unauthorized EC2 instances launched in your account. Please log into your account and check that all EC2 instances are legitimate (please check all regions - to switch between regions use the drop-down in the top-right corner of the management console screen). Delete all unauthorized resources and then delete or rotate the access keys. We strongly suggest that you take steps to prevent any new credentials from being published in this manner again.
Please ensure the exposed credentials are deleted or rotated and the unauthorized instances are stopped in all regions before 11-Jan-2014.
NOTE: If the exposed credentials have not been deleted or rotated by the date specified, in accordance with the AWS Customer Agreement, we will suspend your AWS account.
Detailed instructions are included below for your convenience.
CHECK FOR UNAUTHORIZED USAGE
To check the usage, please log into your AWS Management Console and go to each service page to see what resources are being used. Please pay special attention to the running EC2 instances and IAM users, roles, and groups. You can also check “This Month’s Activity” section on the “Account Activity” page. You can use the dropdown in the top-right corner of the console screen to switch between regions (unauthorized resources can be running in any region).
DELETE THE KEY
If are not using the access key, you can simply delete it. To delete the exposed key, visit the “Security Credentials” page. Your keys will be listed in the “Access Credentials” section. To delete a key, you must first make it inactive, and then delete it.
ROTATE THE KEY
If your application uses the access key, you need to replace the exposed key with a new one. To do this, first create a second key (at that point both keys will be active) and modify your application to use the new key. Then disable (but not delete) the first key. If there are any problems with your application, you can make the first key active again. When your application is fully functional with the first key inactive, you can delete the first key. This last step is necessary - leaving the exposed key disabled is not acceptable.
I bolted off the couch, mumbling to my wife, “my Amazon’s been hacked”, and disappeared into my office. I immediately logged into AWS and GitHub to see what happened.
Lately I have been expanding the technical work I did for my Black Hat presentation, I am building a proof of concept tool to show some DevOps-style Software Defined Security techniques. Yes, I’m an industry analyst, and we aren’t supposed to touch anything other than PowerPoint, but I realized a while ago that no one was actually demonstrating how to leverage the cloud and DevOps for defensive security. Talking about it wasn’t enough – I needed to show people.
The code is still super basic but evolving nicely, and will be done in plenty of time for RSA. I put it up on GitHub to keep track of it, and because I plan to release it after the talk. It’s actually public now because I don’t really care if anyone sees it early.
The Ruby program currently connects to AWS and a Chef server I have running, and thus needs credentials. Stop smirking – I’m not that stupid, and the creds are in a separate configuration file that I keep locally. My first thought was that I screwed up the
.gitignore and somehow accidentally published that file.
Nope, all good. But it took all of 15 seconds to realize that a second
test.rb file I used to test smaller code blocks still had my Access Key and Secret Key in a line I commented out. When I validated my code before checking it in, I saw the section for pulling from the configuration file, but missed the commented code containing my keys.
Back to AWS.
I first jumped into my Security Credentials section and revoked the key. Fortunately I didn’t see any other keys or access changes, and that key isn’t embedded in anything other than my dev code so deleting it wouldn’t break anything. If this was in a production system it would have been very problematic.
Then I checked my running instances. Nothing in
us-east-1 where I do most of my work, so I started from the top of the list and worked my way down.
There is was. 5 extra large instances in
us-west-1. 5 more in Ireland. All had been running for 72 hours, which, working from my initial checkin of the code on GitHub, means the bad guys found the credentials within about 36 hours of creating the project and loading the files.
Time for incident response mode. I terminated all the instances and ran through every region in AWS to make sure I didn’t miss anything. The list was:
- Lock down or revoke the Access Key.
- Check all regions for running instances, and terminate anything unexpected (just the 10 I found).
- Snapshot one of the instances for forensics. I should have also collected metadata but missed that (mainly which AMI they used).
- Review IAM settings. No new users/groups/roles, and no changes to my running policies.
- Check host keys. No changes to my existing ones, and only a generic one in each region where they launched the new stuff.
- Check security groups. No changes to any of my running ones.
- Check volumes and snapshots. Clean, except for the boot volumes of those instances.
- Check CloudTrail. The jerks only launched instances in regions not supported, so even though I have CloudTrail running I didn’t collect any activity.
- Check all other services. Fortunately, being my dev/test/teaching account, I know exactly what I’m running. So I jumped into the Billing section of my account and confirmed only the services I am actively using were running charges. This would be a serious pain if you were actively using a bunch of different services.
- Notice the $500 in accumulated EC2 charges. Facepalm. Say naughty words.
I was lucky. The attackers didn’t mess with anything active I was running. That got me curious, because 10 extra large instances racking up $500 in 3 days initially made me think they were out to hurt me. Maybe even embarrass the stupid analyst. Then I went into forensics mode.
Without CloudTrail I had no way to look for the origin of the API calls using that Access Key. Aside from the basic metadata (some of which I lost because I forgot to record it when I terminated the instances – a note to my future self). What I did have was a snapshot of their server. Forensics isn’t my expertise but I know a few basics.
I launched a new (micro) instance, and then created and attached a new volume based on the snapshot. I mounted the volume and started poking around. Since I still had the original snapshot, I didn’t need to worry about altering the data on the volume. I can always create and attach a new one.
The first thing I did was check the logs. I saw a bunch of references to CUDA 5.5. Uh oh, that’s GPU stuff, and explains why they launched a bunch of extra larges. The bad guys didn’t care about me, and weren’t out to get me specifically, as you will see in a sec. Then I checked the user accounts. There was only one, ec2-user, which is standard on Amazon Linux instances. The home directory had everything I needed:
cpuminer CudaMiner tor-0.2.4.20.tar.gz
I didn’t need to check DuckDuckGo to figure that one out (but I did to be sure). Looks like some Litecoin/Bitcoin mining. That explains the large instances. Poking around the logs I also found the IP address ‘18.104.22.168’, which shows as Latvia. Not that I can be certain of anything because they also use Tor.
I could dig in more but that told me all I needed to know. (If you want a copy of the snapshot, let me know). Here’s my conclusion:
Attackers are scraping GitHub for AWS credentials embedded in code (and probably other cloud services). I highly doubt I was specifically targeted. They then use these to launch extra large instances for Bitcoin mining, and mask the traffic with Tor.
Someone mentioned this in our internal chat room the other day, so I know I’m not the first to write it up.
Here is where I screwed up:
- I did not have billing alerts enabled. This is the one error I knew I should have dealt with earlier but didn’t get around to. I paid the price for complacency.
- I did not completely scrub my code before posting to GitHub. This was a real mistake – I tried and made an error. I blame my extreme sleep deprivation, and a lot of this work was done over the holidays, with a lot of distractions. I was sloppy.
- I used an Access key in my dev code without enough (any) restrictions. This was also on my to-do list, but after I finished up more of the code because a lot of what I’m building needs extensive permissions and I can’t predict them all ahead of time. Managing AWS IAM is a real pain so I was saving it for the end.
Here is how I am fixing things:
- Billing alerts are now enabled, with a threshold just above my monthly average.
- I am creating an IAM policy and Access Key that restricts the application to my main development region, and removes the ability to make IAM changes or access/adjust CloudTrail.
- All dev work using that Access Key will be in a region that supports CloudTrail.
- When I am done with development I will create a more tailored IAM policy that only grants required operations. If I actually designed this thing ahead of time instead of hacking on it a little every day I could have done this ahead of time.
In the end I was lucky. I am only out $500 (well, Securosis is), 45 minutes of investigation and containment effort, and my ego (which has plenty of deflation room). They could have much mucked with my account more deeply. They couldn’t lock me out with an Access Key, but still could have cost me many more hours, days, and cash.
Lesson learned. Not that I didn’t know it already, but I suppose a hammer to the head every now and then helps keep you frosty.
I’d like to thank “Alex R.”, who I assume is on the AWS security team. I am also impressed that AWS monitors GitHub for this, and then has a process to validate whether the keys were potentially used, to help point compromised customers in the right direction.
Posted at Tuesday 7th January 2014 11:00 am
(30) Comments •
This is part five of a series. You can read part one, part two, part three, or part four; or track the project on GitHub.
Real World Examples
Cloud computing covers such a wide range of different technologies that there are no shortage of examples to draw from. Here are a few generic examples from real-world deployments. These get slightly technical because we want to highlight practical, tactical techniques to prove we aren’t just making all this up:
Embedding and Validating a Security Agent Automatically
In a traditional environment we embed security agents by building them into standard images or requiring server administrators to install and register them. Both options are very prone to error and omission, and hard to validate because you often need to rely on manual scanning. Both issues become much easier to manage in cloud computing.
To embed the agent:
- The first option is to build the agent into images. Instead of using generic operating system images you build your own, then require users to only launch approved images. In a private cloud you can enforce this with absolute control of what they run. In public clouds it is a bit tougher to enforce, but you can quickly catch exceptions using our validation process.
- The second option, and our favorite, is to inject the agent when instances launch. Some operating systems support initialization scripts which are passed to the launching instance by the cloud controller. Depending again on your cloud platform, you can inject these scrips automatically when autoscaling, via a management portal, or manually at other times. The scripts install and configure software in the instance before it is accessible on the network.
- Either way you need an agent that understands how to work within cloud infrastructure and is capable of self-registering to the management server. The agent pulls system information and cloud metadata, then connects with its management server, which pushes configuration policies back to the agent so it can self-configure. This process is entirely automated the first time the agent runs.
- Configuration may be based on detected services running on the instance, metadata tags applied to the instance (in the cloud management plane), or other characteristics such as where it is on the network.
- We provide a detailed technical example of agent injection and self-configuration in our Software Defined Security paper.
The process is simple. Build the agent into images or inject it into launching instances, then have it connect to a management server to configure itself. The capabilities of these agents vary widely. Some replicate standard endpoint protection but others handle system configuration, administrative user management, log collection, network security, host hardening, and more.
Validating that all your instances are protected can be quite easy, especially if your tool supports API:
- Obtain a list of all running instances from the cloud controller. This is a simple API call.
- Obtain a list of all instances with the security agent. This should be an API call to your security management platform, but might require pulling a report if that isn’t supported.
- Compare the lists. You cannot hide in the cloud, so you know every single instance. Compare active instances against managed instances, and find the exceptions.
We also show how to do this in the paper linked above.
Controlling SaaS with SAML
Pretty much everyone uses some form of Software as a Service, but controlling access and managing users can be a headache. Unless you link up using federated identity, you need to manage user accounts on the SaaS platform manually. Adding, configuring, and removing users on yet another system, and one that is always Internet accessible, is daunting. Federated identity solves this problem:
- Enable federated identity extensions on your directory server. This is an option for Active Directory and most LDAP servers.
- Contact your cloud provider to obtain their SAML configuration and management requirements. SAML (Security Assertion Markup Language) is a semi-standard way for a relying party to allow access and activities based on approval from an identity provider.
- Configure SAML yourself or use a third-party tool compatible with your cloud provider(s) which does this for you. If you use several SaaS providers a tool will save a lot of effort.
- With SAML users don’t have a username and password with the cloud provider. The only way to log in is to first authenticate to your directory server, which then provides (invisible to the user) a token to allow access to the cloud provider. Users need to be in the office or on a VPN.
- If you want to enable remote users without VPN you can set up a cloud proxy and issue them a special URL to use instead of the SaaS provider’s standard address. This address redirects to your proxy, which then handles connecting back to your directory server for authentication and authorization. This is something you typically buy rather than build.
Why do this? Instead of creating users on the SaaS platform it enables you to use existing user accounts in your directory server and authorize access using standard roles and groups, just like you do for internal servers. You also now get to track logins, disable accounts from a single source (your directory server), and otherwise maintain control. It also means people can’t steal a user’s password and then access Salesforce from anywhere on the Internet
Compartmentalizing Cloud Management with IAM
One of the largest new risks in cloud computing is Internet-accessible management of your entire infrastructure. Most cloud administrators use cloud APIs and command line interfaces to manage the infrastructure (or PaaS, and even sometimes SaaS). This means access credentials are accessed through environment variables or even the registry. If they use a web interface that opens up browser-based attacks. Either way, without capability compartmentalization an attacker could take complete control over their infrastructure by merely hacking a laptop. With a few API calls or a script they could copy or destroy everything in minutes.
All cloud platforms support internal identity and access management to varying degrees – this is something you should look for during your selection process. You can use this to limit security risks – not just to break out development and operations teams. The following isn’t supported on all platforms yet but it gives you an idea of the options:
- Create a Security Group and assign it IAM rights, and restrict these rights from all other groups. “IAM rights” means the security team manages new users, changes user and group rights, and prevents privilege escalation. They can even revoke administrative access to running instances by modifying the associated rights.
- Use separate cloud development and production groups and accounts. Even if you use DevOps require users to switch accounts for different tasks.
- The development group can have complete control over a development environment, which is segregated from the operations environment. Restrict them to building and launching in cloud segments that are isolated from the Internet and only route back to your organization. Developers can have free access to create, destroy, and otherwise manage development instances.
- In your production environment break out administrative tasks. Restrict all snapshotting and termination of instances to separate roles. This prevents attackers from copying data or destroying servers unless they manage to get into one of those accounts.
- Security Group changes should be restricted to the security team (or another designated group). Cloud administrators can move instances into and out of different Security Groups if needed (although ideally you would also restrict this), but only a small team should set the rules for production.
- You will still need super-admin accounts, but these can be highly restricted and used as infrequently as possible.
- In general use different groups, with different credentials, for different parts of your infrastructure. For example in production you could break out management by application stack.
- If you need auditing on API calls, and your cloud platform doesn’t support it, require administrators to connect through a proxy server that logs activity.
Under these guidelines an attacker needs to break into multiple accounts to cause the worst damage. Notice that what we just described isn’t necessarily easy to manage at scale – this is an area where you would allocate the resources freed by reducing other risks such as patching.
Hypersegregation with Security Groups
Our last example is also one of the simplest and most powerful.
As mentioned earlier, a Security Group is essentially a basic stateless firewall implemented by the cloud platform. It’s like having a small cheap firewall in front of every server. When first using Security Groups, many users think of them like subnet firewalls, but that isn’t quite how they work. In a subnet the firewall in front of the group protects access to the systems in the group. A Security Group is more like a firewall policy applied on a per-system level. Instances in a Security Group can’t communicate with other instances in the same group unless you create an explicit rule to allow that.
Every single instance is, by default, firewalled off from every other one. This enables an incredible level of compartmentalization we like to call hypersegregation. (Because we are analysts and we tend to make up our own words).
For example, within an application stack you will likely have multiple instances of your web servers, application servers, and database servers. Each of those should be in a Security Group that allows it to only talk to the layers immediately above and below in the stack. The instances in the Security Group shouldn’t be allowed to talk to each other so cracking a server only allows very limited communications, over approved ports and protocols, to the servers directly above and below.
Security Groups also should not allow any public Internet access (except from the web server group). Administrative access is restricted to known addresses from either a jump server or your internal IP range.
Better yet, instead of always leaving every server open to administrative access, keep that closed unless needed. Then adjust the individual server’s Security Group to make the change. You do this through the cloud management plane, so an attacker would need to crack the management plane, obtain server credentials, and finally obtain access to the server itself.
This setup is nearly impossible to create with traditional infrastructure. We cannot afford all those physical firewalls, and creating that many switch-based rules is a non-starter at scale. We could do it using host firewall rules, but managing those across multiple platforms in a dynamic environment is insanely complex.
In this case the cloud offers substantially better security by default.
Where to Go from Here
This paper can only offer a high-level overview to highlight how cloud computing is different for security, and to give you ideas on how to adjust your security controls to leverage its advantages while accounting for the different risks. The real devil is in the details, and we always worry that we are over-simplifying with these overviews.
But every single thing we described is being used, today, in the real world. These aren’t cases of “maybe it will work”, but examples of what leading cloud users are implementing on a daily basis. Our examples are generally far more basic than what we have seen in practice.
The problem is that most security professionals don’t have the time or resources to become cloud security experts. Their days are filled with the ongoing minutiae of stopping attacks, meeting compliance requirements, and fighting fires. It becomes easy to dismiss cloud computing as yet another fad or trend we can manage as we always have, especially in light of vendors’ deluge of announcements that their products work just the same in the cloud.
But cloud computing is far not business as usual. It is an entirely new technology and operations model that fundamentally disrupts existing practices. One with a staggering rate of change, as well as entirely new platforms and capabilities emerging constantly. Two years ago big data was accessible only to those with top-line resources and massive datacenters. Now anyone can rent petabyte-scale data warehouses for a few hours of analysis. Using a web browser.
Adoption of the cloud will only accelerate, and it is vital that security professionals come up to speed on the technologies and adjust to meet new demands. The opportunities to improve security over existing practices are powerful and practical.
We will continue to cover this in depth in future research, digging into the specifics of how to handle cloud security and what it means to existing practices. Hopefully you will find it useful.
Posted at Wednesday 20th November 2013 2:17 pm
(1) Comments •
This is part four of a series. You can read part one, part two, or part three; or track the project on GitHub.
As a reminder, this is the second half of our section on examples for adapting security to cloud computing. As before this isn’t an exhaustive list – just ideas to get you started.
There are three reasons to encrypt data in the cloud, in order of their importance:
- To protect data in backups, snapshots, and other portable copies or extracts.
- To protect data from cloud administrators.
How you encrypt varies greatly, depending on where the data resides and which particular risks most concern you. For example many cloud providers encrypt object file storage or SaaS by default, but they manage the keys. This is often acceptable for compliance but doesn’t protect against a management plane breach.
We wrote a paper on infrastructure encryption for cloud, from which we extracted some requirements which apply across encryption scenarios:
- If you are encrypting for security (as opposed to a compliance checkbox) you need to manage your own keys. If the vendor manages your keys your data may still be exposed in the event of a management plane compromise.
- Separate key management from cloud administration. Sure, we are all into DevOps and flattening management, but this is one situation where security should manage outside the cloud management plane.
- Use key managers that are as agile and elastic as the cloud. Like host security agents, your key manager needs to operate in an environment where servers appear and disappear automatically, and networks are virtual.
- Minimize SaaS encryption. The only way to encrypt data going to a SaaS provider is with a proxy, and encryption breaks the processing of data at the cloud provider. This reduces the utility of the service, so minimize which fields you need to encrypt. Or, better yet, trust your provider.
- Use secure cryptography agents and libraries when embedding encryption in hosts or IaaS and PaaS applications. The defaults for most crypto libraries used by developers are not secure. Either understand how to make them secure or use libraries designed from the ground up for security.
Federate and Automate Identity Management
Managing users and access in the cloud introduces two major headaches:
- Controlling access to external services without having to manage a separate set of users for each.
- Managing access to potentially thousands or tens of thousands of ephemeral virtual machines, some of which may only exist for a few hours.
In the first case, and often the second, federated identity is the way to go:
- For external cloud services, especially SaaS, rely on SAML-based federated identity linked to your existing directory server. If you deal with many services this can become messy to manage and program yourself, so consider one of the identity management proxies or services designed specifically to tackle this problem.
- For access to your actual virtual servers, consider managing users with a dynamic privilege management agent designed for the cloud. Normally you embed SSH keys (or known Windows admin passwords) as part of instance initialization (the cloud controller handles this for you). This is highly problematic for privileged users at scale, and even straight directory server integration is often quite difficult. Specialized agents designed for cloud computing dynamically update users, privileges, and credentials at cloud speeds and scale.
Adapt Network Security
Networks are completely virtualized in cloud computing, although different platforms use different architectures and implementation mechanisms, complicating the situation. Despite that diversity there are consistent traits to focus on. The key issues come down to loss of visibility using normal techniques, and adapting to the dynamic nature of cloud computing.
All public cloud providers disable networking sniffing, and that is an option on all private cloud platforms. A bad guy can’t hack a box and sniff the entire network, but you also can’t implement IDS and other network security like in traditional infrastructure. Even when you can place a physical box on the network hosting the cloud, you will miss traffic between instances on the same physical server, and highly dynamic network changes and instances appear and disappear too quickly to be treated like regular servers. You can sometimes use a virtual appliance instead, but unless the tool is designed to cloud specifications, even one that works in a virtual environment will crack in a cloud due to performance and functional limitations.
While you can embed more host network security in the images your virtual machines are based on, the standard tools typically won’t work because they don’t know exactly where on the network they will pop up, nor what addresses they need to talk to. On a positive note, all cloud platforms include basic network security. Set your defaults properly, and every single server effectively comes with its own firewall.
- Design a good baseline of Security Groups (the basic firewalls that secure the networking of each instance), and use tags or other mechanisms to automatically apply them based on server characteristics. A Security Group is essentially a firewall around every instance, offering compartmentalization that is extremely difficult to get in a traditional network.
- Use a host firewall, or host firewall management tool, designed for your cloud platform or provider. These connect to the cloud itself to pull metadata and configure themselves more dynamically than standard host firewalls.
- Also consider pushing more network security, including IDS and logging, into your instances.
- Prefer virtual network security appliances that support cloud APIs and are designed for the cloud platform or provider. For example, instead of forcing you to route all your virtual traffic through it as if you were on a physical network, the tool could distribute its own workload – perhaps even integrating with hypervisors.
- Take advantage of cloud APIs. It is very easy to pull every Security Group rule and then locate every instance. Combined with some additional basic tools you could then automate finding errors and omissions. Many cloud deployments do this today as a matter of course.
- Whatever tools you use, they must be able to account for the high rate of servers appearing and disappearing in a cloud. Rules must follow hosts.
Leverage Cloud Characteristics
This section is a bit more advanced, but you can reap significant security advantages once you start to leverage the nature of the cloud. Instead of patching, just launch properly configured new servers and swap using on a cloud load balancer. Find every single system in your cloud deployment, including extensive metadata, with a simple API call. Deploy applications to a Platform as a Service (PaaS) and stop worrying about misconfigured servers. Here are some real world examples and recommendations to get you started, but they barely scratch the surface:
- Use immutable servers instead of patching. Few sysadmins patch servers without trepidation, and this is a frequent source of downtime. Eliminate the worry by running your applications behind cloud load balancers, and instead of patching servers, launch new ones with the updated software. Then slowly (or quickly) switch traffic to the new, ‘patched’ servers. If something doesn’t work your old servers are still there and you can shift traffic back. It is like having a spare datacenter lying around.
- Leverage stateless security strategies to mange your environment in real time. Normally we rely on knowledge from scanners and assessments to understand our assets and environments, which can be out of date and difficult to keep complete. The cloud controller knows where everything is, how it is configured (to a degree), and even who owns or created it, so we have a constant stream of comprehensive real-time data. A server cannot exist in the cloud without the controller knowing about it. Your entire network architecture is but an API call away – no scanners needed. Track and manage your security state in real time.
- Automate more security. Embed security configurations and agents into images, or inject them into instances when they launch. Every virtual machine, when launched, can automatically configure host-based security – especially if they can communicate with a management server designed for the cloud. For example, a host can register itself with a configuration management server, then secure running services by default depending on what the host is intended for. You could even automatically adjust Security Group firewall rules based on the software services running on the host, who owns it, and where it is in your application stack.
- Standardize security with Platform as a Service. Hate patching database servers or configuring them properly? Struggle with developers and admins who open up too many services on application servers? Use Platform as a Service instead, and improve your ability to standardize security.
- Build a security abstraction layer. Nothing prevents your security team from using the same cloud APIs and management tools as administrators and developers. Configured properly, this provides security oversight and control without interfering with development or operations. For example you could restrict management of cloud IAM to the security team, enabling them to assume management of a server in case of a security incident. The security team could control key network Security Groups and security in production, while still allowing developers to manage it themselves in more isolated development environments. Embed a host security agent into every image (or instance, using launch scripts) and security gains a hook into every running virtual machine.
- Move to Software Defined Security. This concept is an extension of basic automation. Nothing prevents Security from writing its own programs using cloud APIs, and the APIs of security and operations tools, so you can create powerful and agile security controls. For example you could write a small program to find every instance in your environment that isn’t linked into your configuration management tool, and recheck every few minutes. The tool would identify who launched the server, the operating system, where it was on the network, and the surrounding network security. You could, at a keystroke, take control of the server, notify the owner, and isolate it on the network until you know what it is for; then integrate it into configuration management and enforce security policies. All with perhaps 100 lines of code.
These should get you thinking – they start to show how the cloud can nearly eradicate certain security problems and enable you to shift resources.
Posted at Tuesday 19th November 2013 5:27 pm
(0) Comments •
This is part three of a series. You can read part one or part two, or track the project on GitHub.
This part is split into two posts – here is the first half:
Adapting Security for Cloud Computing
If you didn’t already, you should now have a decent understanding of how cloud computing differs from traditional infrastructure. Now it’s time to switch gears to how to evolve security to address shifting risks.
These examples are far from comprehensive, but offer a good start and sample of how to think differently about cloud security.
As we keep emphasizing, taking advantage of the cloud poses new risks, as well as both increasing and decreasing existing risks. The goal is to leverage the security advantages, freeing up resources to cover the gaps. There are a few general principles for approaching the problem that help put you in the proper state of mind:
- You cannot rely on boxes and wires. Quite a bit of classical security relies on knowing the physical locations of systems, as well as the network cables connecting them. Network traffic in cloud computing is virtualized, which completely breaks this model. Network routing and security are instead defined by software rules. There are some advantages here, which are beyond the scope of this paper but which we will detail with future research.
- Security should be as agile and elastic as the cloud itself. Your security tools need to account for the highly dynamic nature of the cloud, where servers might pop up automatically and run for only an hour before disappearing forever.
- Rely more on policy-based automation. Wherever possible design your security to use the same automation as the cloud itself. For example there are techniques to automate (virtual) firewall rules based on tags associated with a server, rather than applying them manually.
- Understand and adjust for the characteristics of the cloud. Most virtual network adapters in cloud platforms disable network sniffing, so that risk drops off the list. Security groups are essentially virtual firewalls that on individual instance, meaning you get full internal firewalls and compartmentalization by default. Security tools can be embedded in images or installation scripts to ensure they are always installed, and cloud-aware ones can self configure. SAML can be used to provide absolute device and user authentication control to external SaaS applications. All these and more are enabled by the cloud, once you understand its characteristics.
- Integrate with DevOps. Not all organizations are using DevOps, but DevOps principles are pervasive in cloud computing. Security teams can integrate with this approach and leverage it themselves for security benefits, such as automating security configuration policy enforcement.
DevOps is an IT model that blurs the lines between development and IT operations. Developers play a stronger role in managing their own infrastructure through heavy use of programming and automation. Since cloud enables management of infrastructure using APIs, it is a major enabler of DevOps. While it is incredibly agile and powerful, lacking proper governance and policies it can also be disastrous since it condenses many of the usual application development and operations check points.
These principles will get you thinking in cloud terms, but let’s look at some specifics.
Control the Management Plane
The management plane is the administrative interfaces, web and API, used to manage your cloud. It exists in all types of cloud computing service models: IaaS, PaaS, and SaaS. Someone who compromises a cloud administrator’s credentials has the equivalent of unmonitored physical access to your entire data center, with enough spare hard drives, fork lifts, and trucks to copy the entire thing and drive away. Or blow the entire thing up.
- We cannot overstate the importance of hardening the management plane. It literally provides absolute control over your cloud deployment – often including all disaster recovery.*
We have five recommendations for securing the management plane:
- If you manage a private cloud, ensure you harden the web and API servers, keeping all components up to date and protecting them with the highest levels of web application security. This is no different than protecting any other critical web server.
- Leverage the Identity and Access Management features offered by the management plane. Some providers offer very fine-grained controls. Most also integrate with your existing IAM using federated identity. Give preference to your platform/provider’s controls and…
- Compartmentalize with IAM. No administrator should have full rights to all aspects of the cloud. Many providers and platforms support granular controls, including roles and groups, which you can leverage to restrict the damage potential of a compromised developer or workstation. For example, you can have a separate administrator for assigning IAM rights, only allow administrators to manage certain segments of your cloud, and further restrict them from terminating instances.
- Add auditing, logging, and alerting where possible. This is one of the more difficult problems in cloud security because few cloud providers audit administrator activity – such as who launched or stopped a server using the API. For now you will likely need a third-party tool or to work with particular providers for necessary auditing.
- Consider using security or cloud management proxies. These tools and services proxy the connection between a cloud administrator and the public or private cloud management plane. They can apply additional security rules and fill logging and auditing gaps.
Automate Host (Instance) Security
An instance is a virtual machine, which is based on a stored template called an image. When you ask the cloud for a server you specify the image to base it on, which includes an operating system and might bring a complete single-server application stack. The cloud then configures it using scripts which can embed administrator credentials, provide an IP address, attach and format storage, etc.
Instances may exist for years or minutes, are configured dynamically, and can be launched nearly anywhere in your infrastructure – public or private. You cannot rely on manually assessing and adjusting their security. This is very different than building a server in a test environment, performing a vulnerability scan, and then physically installing it behind a particular firewall and IPS. More security needs to be embedded in the host itself, and the images the instances are based on.
Fortunately, these techniques improve your ability to enforce secure configurations.
- Embed security in images and at launch. If you use your own images, you can embed security settings and agents in the base images, and set them to activate and self-configure (after connecting to their management server) when an instance launches. Alternatively, many clouds and images support passing initialization scripts to new instances, which process them during launch using the same framework the cloud uses to configure essential settings. You can embed security settings and install security software in these scripts.
- Integrate with configuration management. Most serious cloud administrators rely on a new breed of configuration management tools to dynamically manage their systems with automation. Security can leverage these to enforce base security configurations, even down to specific application settings, which apply to all managed systems. Set properly, these configuration management tools handle both initial configurations and maintaining state, overwriting local changes when policy pulls and pushes occur.
- Dynamically configure security agents. When security agents are embedded into images or installed automatically on launch, default settings rarely meet a system’s particular security requirements. Thankfully, cloud platforms provide rich metadata on instances and their environment – including system configuration, network configuration, applications installed, etc. Cloud-aware security agents connect instantly to the management server on launch, and then self-configure with policies leveraging their information. They can also update settings in near real time based on new policies or changes in the cloud.
- Security agents should be lightweight, designed for the cloud, and cloud agnostic. You shouldn’t need 8 different security agents for 12 different cloud providers, and agents shouldn’t materially increase system resource requirements or struggle to communicate in a dynamic and virtual network.
- Host security tools should support REST APIs. This enables you to integrate your security into the cloud fabric itself when needed. Agents don’t necessarily need to communicate with the management server over a REST API, but the management server should expose key functions via (secure) API. This enables you to, for example, write scripts to pull host security information, compare it against network security information, and make adjustments or generate reports. Or integrate security alerts and status from the host tool into your SIEM without additional connectors.
REST APIs are the dominant format for APIs in web-friendly applications. They run over HTTP and are much easier to integrate and manage compared to older SOAP APIs.
We will finish with adapting security controls in a second post tomorrow…
Posted at Monday 18th November 2013 1:19 pm
(1) Comments •
This is part two of a series. You can read part one here or track the project on GitHub.
How the Cloud Is Different for Security
In the early days of cloud computing, even some very well-respected security professionals claimed it was little more than a different kind of outsourcing, or equivalent to the multitenancy of a mainframe. But the differences run far deeper, and we will show how they require different cloud security controls. We know how to manage the risks of outsourcing or multi-user environments; cloud computing security builds on this foundation and adds new twists.
These differences boil down to abstraction and automation, which separate cloud computing from basic virtualization and other well-understood technologies.
Abstraction is the extensive use of multiple virtualization technologies to separate compute, network, storage, information, and application resources from the underlying physical infrastructure. In cloud computing we use this to convert physical infrastructure into a resource pool that is sliced, diced, provisioned, deprovisioned, and configured on demand, using the automation we will talk about next.
It really is a bit like the matrix. Individual servers run little more than a hypervisor with connectivity software to link them into the cloud, and the rest is managed by the cloud controller. Virtual networks overlay the physical network, with dynamic configuration of routing at all levels. Storage hardware is similarly pooled, virtualized, and then managed by the cloud control layers. The entire physical infrastructure, less some dedicated management components, becomes a collection of resource pools. Servers, applications, and everything else runs on top of the virtualized environment.
Abstraction impacts security significantly in four ways:
- Resource pools are managed using standard, web-based (REST) Application Programming Interfaces (APIs). The infrastructure is managed with network-enabled software at a fundamental level.
- Security can lose visibility into the infrastructure. On the network we can’t rely on physical routing for traffic inspection or management. We don’t necessarily know which hard drives hold which data.
- Everything is virtualized and portable. Entire servers can migrate to new physical systems with a few API calls or a click on a web page.
- We gain greater pervasive visibility into the infrastructure configuration itself. If the cloud controller doesn’t know about a server it cannot function. We can map the complete environment with those API calls.
We have focused on Infrastructure as a Service, but the same issues apply to Platform and Software as a Service, except they often offer even less visibility.
Virtualization has existed for a long time. The real power cloud computing adds is automation. In basic virtualization and virtual data centers we still rely on administrators to manually provision and manage our virtual machines, networks, and storage. Cloud computing turns these tasks over to the cloud controller to coordinate all these pieces (and more) using orchestration.
Users ask for resources via web page or API call, such as a new server with 1tb of storage on a particular subnet, and the cloud determines how best to provision it from the resource pool; then it handles installation, configuration, and coordinating all the networking, storage, and compute resources to pull everything together into a functional and accessible server. No human administrator required.
Or the cloud can monitor demand on a cluster and add and remove fully load-balanced and configured systems based on rules, such as average system utilization over a specified threshold. Need more resources? Add virtual servers. Systems underutilized? Drop them back into the resource pool. In public cloud computing this keeps costs down as you expand and contract based on what you need. In private clouds it frees resources for other projects and requirements, but you still need a shared resource pool to handle overall demand. But you are no longer stuck with under-utilized physical boxes in one corner of your data center and inadequate capacity in another.
The same applies to platforms (including databases or application servers) and software; you can expand and contract database storage, software application server capacity, and storage as needed – without additional capital investment.
In the real world it isn’t always so clean. Heavy use of public cloud may exceed the costs of owning your own infrastructure. Managing your own private cloud is no small task, and is ripe with pitfalls. And abstraction does reduce performance at certain levels, at least for now. But with the right planning, and as the technology continues to evolve, the business advantages are undeniable.
The NIST model of cloud computing is the best framework for understanding the cloud. It consists of five Essential Characteristics, three Service Models (IaaS, PaaS, and SaaS) and four Delivery Models (public, private, hybrid and community). Our characteristic of abstraction generally maps to resource pooling and broad network access, while automation maps to on-demand self service, measured service, and rapid elasticity. We aren’t proposing a different model, just overlaying the NIST model to better describe things in terms of security.
Thanks to this automation and orchestration of resource pools, clouds are incredibly elastic, dynamic, agile, and resilient.
But even more transformative is the capability for applications to manage their own infrastructure because everything is now programmable. The lines between development and operations blur, offering incredible levels of agility and resilience, which is one of the concepts underpinning the DevOps movement. But of course done improperly it can be disastrous.
Cloud, DevOps, and Security in Practice: Examples
Here are a few examples that highlight the impact of abstraction and automation on security. We will address the security issues later in this paper.
- Autoscaling: As mentioned above, many IaaS providers support autoscaling. A monitoring tool watches server load and other variables. When the average load of virtual machines exceeds a configurable threshold, new instances are launched from the same base image with advanced initialization scripts. These scripts can automatically configure all aspects of the server, pulling metadata from the cloud or a configuration management server. Advanced tools can configure entire application stacks. But these servers may only exist for a short period, perhaps never during a vulnerability assessment window. Or images may launch in the wrong zone, with the wrong network security rules. The images and initialization scripts might not be up to date for the latest security vulnerabilities, creating cracks in your defenses.
- Immutable Servers: Autoscaling can spontaneously and automatically orchestrate the addition and subtraction of servers and other resources. The same concepts can eliminate the need to patch. Instead of patching a running server you might use the same scripting and configuration management techniques, behind a virtual load balancer, to launch new, up-to-date versions of a server and then destroy the unpatched virtual machines.
- Snapshots: Cloud data typically relies on virtual storage, and even running servers use what are essentially virtual hard drives with RAID. A snapshot is a near-instant backup of all the data on a storage volume, since it merely needs to copy one version of the data, without taking the system down on affecting performance. These snapshots are incredibly portable and, in public clouds, can be made public with a single API call. Or you could write a program to snapshot all your servers at once (if your cloud has the capacity). This is great for forensics, but also enables an attacker to copy your entire data center and make it public with about 10 lines of code.
- Management Credentials: The entire infrastructure deployed on the cloud is managed, even down to the network and server level, using API calls and perhaps web interfaces. Administrator tools typically keep these credentials in memory as environment variables or the registry, making them accessible even without administrative control over the cloud admin’s workstation. Also, most clouds don’t provide an audit log of these commands. Many organizations fail to compartmentalize the rights of cloud administrators, leaving their entire infrastructure open to a single compromised system.
- Software Defined Security: With only 20 lines of code you can connect to your cloud over one API, your configuration management tool with another, and your security tool with a third. You can instantly assess the configuration and security of every server in your environment, without any scanning, in real time. This is nearly impossible with traditional security tools.
Snapshots highlights some of the risks of abstraction. Autoscaling, some risks of automation, and management credentials, the risks of both. But Software Defined Security and immutable servers offer advantages. We will dig into specifics next, now that we have highlighted the core differences.
And all that without mentioning multitenancy or outsourcing.
Posted at Wednesday 13th November 2013 10:44 pm
(0) Comments •
This is the first post in a new series detailing the key differences between cloud computing and traditional security. I feel pretty strongly that, although many people are talking about the cloud, nobody has yet done a good job of explaining why and how security needs to adapt at a fundamental level. It is more than outsourcing, more than multitenancy, and definitely more than simple virtualization. This is my best stab at it, and I hope you like it.
The entire paper, as I write it, is also posted and updated at GitHub for those of you who want to track changes, submit feedback, or even submit edits.
Special thanks to CloudPassage for agreeing to license the paper (as always, we are following our Totally Transparent Research Process and they do not have any more influence than you do, and can back out of licensing the paper if, in the end, they don’t like it).
And here we go…
What CISOs Need to Know about Cloud Computing
One of a CISO’s most difficult challenges is sorting the valuable wheat from the overhyped chaff, and then figuring out what it means in terms of risk to your organization. There is no shortage of technology and threat trends, and CISOs need not to only determine which matter, but how they impact security.
The rise of cloud computing is one of the truly transformative evolutions that fundamentally change core security practices. Far more than an outsourcing model, cloud computing alters the very fabric of our infrastructure, technology consumption, and delivery models. In the long run, the cloud and mobile computing are likely to mark a larger shift than the Internet.
This series details the critical differences between cloud computing and traditional infrastructure for security professionals, as well as where to focus security efforts. We will show that the cloud doesn’t necessarily increase risks – it shifts them, and provides new opportunities for significant security improvement.
Different, But Not the Way You Think
Cloud computing is a radically different technology model – not just the latest flavor of outsourcing. It uses a combination of abstraction and automation to achieve previously impossible levels of efficiency and elasticity. But in the end cloud computing still relies on traditional infrastructure as its foundation. It doesn’t eliminate physical servers, networks, or storage, but allows organizations to use them in different ways, with substantial benefits.
Sometimes this means building your own cloud in your own datacenter; other times it means renting infrastructure, platforms, and applications from public providers over the Internet. Most organizations will use a combination of both. Public cloud services eliminate most capital expenses and shift them to on-demand operational costs. Private clouds allow more efficient use of capital, tend to reduce operational costs, and increase the responsiveness of technology to internal needs.
Between the business benefits and current adoption rates, we expect cloud computing to become the dominant technology model over the next ten to fifteen years. As we make this transition it is the technology that create clouds, rather than the increased use of shared infrastructure, that really matters for security. Multitenancy is more an emergent property of cloud computing than a defining characteristic.
Security Is Evolving for the Cloud
As you will see, cloud computing isn’t more or less secure than traditional infrastructure – it is different. Some risks are greater, some are new, some are reduced, and some are eliminated. The primary goal of this series is to provide an overview of where these changes occur, what you need to do about them, and when.
Cloud security focuses on managing the different risks associate with abstraction and automation. Mutitenancy tends to be more a compliance issue than a security problem, and we will cover both aspects. Infrastructure and applications are opened up to network-based management via Internet APIs. Everything from core network routing to creating and destroying entire application stacks is now possible using command lines and web interfaces. The early security focus has been on managing risks introduced by highly dynamic virtualized environments such as autoscaled servers, and broad network access, including a major focus on compartmentalizing cloud management.
Over time the focus is gradually shifting to hardening the cloud infrastructure, platforms, and applications, and then adapting security to use the cloud to improve security. For example, the need for data encryption increases over time as you migrate more sensitive data into the cloud. But the complexities of internal network compartmentalization and server patching are dramatically reduced as you leverage cloud infrastructure.
We expect to eventually see more security teams hook into the cloud fabric itself – bridging existing gaps between security tools and infrastructure and applications with Software Defined Security. The same APIs and programming techniques that power cloud computing can provide highly-integrated dynamic and responsive security controls – this is already happening.
This series will lay out the key differences, with suggestions for where security professionals should focus. Hopefully, by the end, you will look at the cloud and cloud security in a new light, and agree that the cloud isn’t just the latest type of outsourcing.
Posted at Tuesday 12th November 2013 8:00 am
(0) Comments •
A few months back I did a series of posts demonstrating a proof of concept for implementing some basic software defined security (using AWS, Chef, and Ruby). This ended up being the basis for my KickaaS Security with APIs and Cloud talk at Black Hat.
I decided to release it as a white paper. Because I think there are too many trees in the world, and this will encourage you to print it out. Or buy a tablet – doesn’t matter to me.
Landing Page: A Practical Example of Software Defined Security
Direct Download (PDF)
In all seriousness, I hope you like it; and that this white paper, with complete content from the posts, is useful to you.
Posted at Thursday 3rd October 2013 4:02 pm
(0) Comments •
@gepeto42 had a good post:
Windows Azure SQL Database, formely known as SQL Azure, is Microsoft’s managed database platform in Azure. While it is based on Microsoft SQL Server, it has various limitations that can impact how you secure and manage it. It also has some features that can help improve security.
Helpful details. I am really hunting for more real-world cloud security examples, so please keep them coming…
Posted at Thursday 11th July 2013 4:17 pm
(0) Comments •
Many people focus (often wrongly) on the new risks of cloud computing, but I am far more interested in leveraging cloud computing to improve security.
I am deep into creating the advanced material for our Cloud Security Fundamentals class at Black Hat and want to toss out one of the tidbits we will cover. This is a bit more than a sneak peek, so if you plan to attend the class, don’t read this or you might get bored.
A couple parts of this process are useful for more than security, so I will break it up into a few shorter pieces, each as self-contained as possible. The process is very easy once you piece it together but I had a very hard time finding necessary instructions, and there are a few tricks that really racked my analyst brain. But this isn’t about SEO – I want to make it easier for future IT pros to find what they are looking for.
In this example we will automate hooking into cloud servers (‘instances’) and securely deploying a configuration management tool, including automated distribution of security credentials. Specifically, we will use Amazon EC2, S3, IAM, and OpsCode Chef; and configure them to handle completely unattended installation and configuration. This is designed to cover both manually launching instances and autoscaling.
With very minor modification you can use this process for Amazon VPC. With more work you could also use it for different public and private cloud providers, but in those cases the weakest link will typically be IAM. There are a few ways you can bridge that gap if necessary – I don’t know them all, but I do know they exist.
First, let’s define what I mean by cloud security policy compliance. That is a broad term, and in this case I am specifically referring to automating the process of hooking servers into a configuration management infrastructure and enforcing policies. By using a programmatic configuration management system like Chef we can enforce baseline security policies across the infrastructure and validate that they are in use.
For example, you can enforce that all servers are properly hardened at the operating system level, with the latest patches, and that all applications are configured according to corporate standards.
The overall process is:
- Bootstrap all new instances into the configuration management infrastructure.
- Push policies to the servers, including initial and update policies.
- Validate that policies deployed.
- Continuously scan the environment for rogue systems.
- Isolate, integrate, or remove the rogue systems.
The example we will cover in the next few posts only covers the first couple steps in detail. I have the rest mapped out but may not get it all ready in time for Black Hat – first I need to dust off some programming skills, and I learned a long time ago never to promise a release date.
All of this is insanely cool, and only the very basics of Software Defined Security.
Here is specifically what we will cover:
cloud-init to bootstrap new Amazon EC2 instances.
- Use Amazon IAM roles to provide Temporary rotating security credentials to the instance to access the initial configuration file and digital certificate for Chef.
- Automatic installation of Chef, using the provided credentials.
- Instances will use the configuration file and digital certificate to connect to a Chef server running in EC2.
- The Chef server is locked down to only accept connections from specified Security Groups.
- S3 is configured to only allow read access of the credentials from instances with the assigned IAM role.
- The tools in use and how to configure them manually.
I will start with how IAM roles work and how to configure them, next how to lock down access using IAM and Security Groups, then how to build the
cloud-init script with details on the command-line tools it installs and configures, and finally how it connects securely back up S3 for credentials.
Okay, let’s roll up our sleeves and get started…
Posted at Wednesday 10th July 2013 1:52 pm
(0) Comments •
Our last post described how to use Amazon EC2, S3, and IAM as a framework to securely and automatically download security policies and credentials. That’s the infrastructure side of the problem, and this post will show what you need to do to the instance to connect to this infrastructure, grab the credentials, install and configure Chef, and connect to the Chef server.
The advantage of this structure is that you don’t need to embed credentials into your machine image, and you can use stock (generic) operating systems are on public clouds. In private clouds it is also useful because it reduces the number of machine images to maintain.
These instructions can be modified to work in other cloud platforms, but your mileage will vary. They also require an operating system that supports
cloud-init (Windows uses
ec2config, which I know very little about, but also appears to support user data scripts).
I will walk through the details of how this works, but you won’t use any of these steps manually. They are just explanation, to give you what you need to adapt this for other circumstances.
cloud-init is software for certain Linux variants that allows your cloud controller to pass scripts to new instances as they are launched from an image (bootstrapped). It was created by Canonical (the Ubuntu guys) and is very frequently packaged into Linux machine images (AMIs).
ec2config offers similar functionality for Windows.
Users pass the script to their instances, specifying the User Data field (for web interface) or argument (for command line). It is a bit of a pain because you don’t get any feedback – you need to debug from the system log – but it works well and allows tight control. Commands run as root before anyone can even log into the instance, so
cloud-init is excellent for setting up secure configurations, loading
ssh keys, and installing software.
cloud-init is a bootstrapping tool for configuring an instance the first time it runs – it is not a management tool because after launch you cannot access it any more.
For an example see our full script at the bottom of this post.
You can download and manipulate files easily with
cloud-init, but unless you want to embed static credentials in your script there is an authentication issue. That’s where AWS IAM roles and S3 help, thanks to a very recent update to
s3cmd to use IAM roles
s3cmd is a command-line tool to access Amazon S3. Amazon S3 isn’t like a normal file share – it is only accessible through Amazon’s API.
s3cmd provides access to S3 like a local directory, as well as administration of S3. It is available in most Linux repositories for packaged installation, but the bundled versions do not yet support IAM roles. Version 1.5 alpha 2 and later add role support, so that’s what we need to use.
You can download the alpha 3 release, but if you are reading this post in the future I suggest checking for a more recent version on the main page, linked above.
s3cmd just untar the file. If you aren’t using roles you now need to configure it with your credentials. But if you have assigned a role,
s3cmd should work out of the box without a configuration file.
Unfortunately I discovered a lot of weirdness once I tried to out it in a
cloud-init script. The issue is that running it under
cloud-init runs it as
root, which changes
s3cmd’s behavior a bit. I needed to create a stub configuration file without any credentials, then use a command-line argument to specify that file.
Here is what the stub file looks like:
Seriously, that’s it.
Then you can use a command line such as:
s3cmd --config /s3cmd-1.5.0-alpha3/s3cfg ls s3://cloudsec/
Where s3cfg is your custom configuration file (you can see the path there too).
That’s all you need.
s3cmd detects that it is running in role mode and pulls your IAM credentials if you don’t specify them in the configuration file.
Scripted installation of the Chef client
The Chef client is very easy to install automatically. The only tricky bit is the command-line arguments to skip the interactive part of the install; then you copy the configuration files where they are needed.
The main instructions for package installation are in the Chef wiki. You can also use the
omnibus installer, but packaged installation is better for automated scripting. The Chef instructions show you how to add the OpsCode repository to Ubuntu so you can “
The trick is to point the installer to your Chef server, using the following code instead of a straight “
apt-get install chef-client”:
echo "chef chef/chef_server_url string http://your-server-IP:4000" \
| sudo debconf-set-selections && sudo apt-get install chef -y --force-yes
s3cmd to download
validation.pem and place them into the proper locations. In our case this looks like:
s3cmd --config /s3cmd-1.5.0-alpha3/s3cfg --force get s3://cloudsec/client.rb /etc/chef/client.rb
s3cmd --config /s3cmd-1.5.0-alpha3/s3cfg --force get s3://cloudsec/validation.pem /etc/chef/validation.pem
Tying it all together
The process is really easy once you set this up, and I went into a ton of extra detail. Here’s the overview:
- Set up your S3, Chef server, and IAM role as described in the previous post.
- Upload client.rb and validation.pem from your Chef server into your bucket. (Execute “
knife client ./” to create them).
- Launch a new instance. Select the IAM Role you set up for Chef and your S3 bucket.
- Specify your customized
cloud-init script, customized from the sample below, into the User Data field or command-line argument. You can also host the script as a file and load it from a central repository using the include file option.
If it all worked you will see your new instance registered in Chef once the install scripts run. If you don’t see it check the System Log (via AWS – no need to log into the server) to see where you script failed.
This is the script we will use for our training, which should be easy to adapt.
- &fix_routing_silliness |
public_ipv4=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
ifconfig eth0:0 $public_ipv4 up
-- &configchef |
echo "deb http://apt.opscode.com/ precise-0.10 main" | sudo tee /etc/apt/sources.list.d/opscode.list
curl http://firstname.lastname@example.org | sudo apt-key add -
echo "chef chef/chef_server_url string http://ec2-54-218-102-48.us-west-2.compute.amazonaws.com:4000" | sudo debconf-set-selections && sudo apt-get install chef -y --force-yes
tar xvfz s3cmd-1.5.0-alpha3.tar.gz
cat >s3cfg <<EOM
./s3cmd --config /s3cmd-1.5.0-alpha3/s3cfg ls s3://cloudsec/
./s3cmd --config /s3cmd-1.5.0-alpha3/s3cfg --force get s3://cloudsec/client.rb /etc/chef/client.rb
./s3cmd --config /s3cmd-1.5.0-alpha3/s3cfg --force get s3://cloudsec/validation.pem /etc/chef/validation.pem
-- [ sh, -c, *fix_routing_silliness ]
-- [ sh, -c, *configchef]
-- touch /tmp/done
Thanks, and hopefully I didn’t drag this out too long.
Posted at Wednesday 10th July 2013 1:50 pm
(1) Comments •
Today I was mildly snarky on the Security Metrics email list when a few people suggested that instead of talking about cloud computing we should talk about shared infrastructure. In their minds, ‘shared’ = ‘cloud’. I fully acknowledge that I may be misinterpreting their point, but this is a common thread I hear. Worse yet, very frequently when I discuss security risks, other security professionals key in on multitenancy as their biggest concern in cloud computing.
To be honest it may be the least interesting aspect of the cloud from a security perspective.
Shared infrastructure and applications are definitely a concern – I don’t mean to say they do not pose any risk. But multitenancy is more an emergent property of cloud computing rather than an essential characteristic – and yes, I am deliberately using NIST terms.
In my humble opinion – please tell me if I’m wrong in the comments – the combination of resource pooling (via abstraction) and orchestration/automation creates the greatest security risk. This is primarily for IaaS and PaaS, but also can apply to SaaS when it isn’t just a standard web app.
With abstraction and automation we add a management layer that effectively network-enables direct infrastructure management. Want to wipe out someone’s entire cloud with a short
bash script? Not a problem if they don’t segregate their cloud management and harden admin systems. Want to instantly copy the entire database and make it public? That might take a little PHP or Ruby code, but well under 100 lines.
In neither of those cases is relying on shared resources a factor – it is the combination of APIs, orchestration, and abstraction.
These aren’t fully obvious until you start really spending time using and studying the cloud directly – as opposed to reading articles and research reports. Even our cloud security class only starts to scratch the surface, although we are considering running a longer version where we spend a bunch more time on it.
The good news is that these are also very powerful security enablers, as you will see later today or tomorrow when I get up some demo code I have been working on.
Posted at Tuesday 9th July 2013 1:38 am
(2) Comments •
An OpenStack Security Guide epub was released this week, and among the contributors was our friend Andrew Hay.
Trying to find this info before was like locating a piece of hay in a haystack (not an Andrew Hay – he would be considerably easier to find in a haystack). We use OpenStack for the Cloud Security Alliance training labs, and I had to figure out a lot of this myself through painful reading of barely-legible documentation.
The book was created in a 5-day sprint so it’s a little rough. Some sections are pretty light but they intend to improve it over time. The sections on hardening the Keystone identity service, picking a hypervisor, hardening core services such as the message queue, and secure networking, are pretty decent. You can’t secure OpenStack just by reading this – you need to understand the platform first – but this guide will definitely point you in the right directions.
Posted at Tuesday 2nd July 2013 2:20 pm
(0) Comments •
By Adrian Lane
It is starting to feel like summer. Both because the weather is getting warmer and because most of the Securosis team has been taking family time this week. I will keep the summary short – we have not been doing much writing and research this week.
We talk a lot about security and compliance for cloud services. It has become a theme here that, while enterprises are comfortable with SaaS (such as Salesforce), they are less comfortable with PaaS (Dropbox & Evernote, etc.), and often refuse to touch IaaS (largely Amazon AWS) … for security and compliance reasons. Wider enterprise adoption has been stuck in the mud – largely because of compliance. Enterprises simply can’t get the controls and transparency they need to meet regulations, and they worry that service provider employees might steal their $#!%. The recent Bloomberg terminal spying scandal is a soft-core version of their nightmare scenario.
As I was browsing through my feeds this week, it became clear that Amazon understands the compliance and security hurdles it needs to address, and that they are methodically removing them, one by one. The news of an HSM service a few weeks ago was very odd at first glance – it seems like the opposite of a cloud service: non-elastic, non-commodity, and not self-service. But it makes perfect sense for potential customers whose sticking point is a compliance requirement for HSM for key storage and/or generation.
A couple weeks ago Amazon announced SOC compliance, adding transparency to their security and operational practices. They followed up with a post discussing Redshift’s new transparent encryption for compute nodes, so stolen disks and snapshots would be unreadable. Last week they announced FedRAMP certification, opening the door for many government organization to leverage Amazon cloud services – probably mostly community cloud. And taking a page from the Oracle playbook, Amazon now offers training and certification to help traditional IT folks close their cloud skills gap. Amazon is doing a superlative job of listening to (potential) customer impediments and working through them.
By obtaining these certifications Amazon has made it much easier for customers to investigate what they are doing, and then negotiate a the complicated path to contract with Amazon while satisfying corporate requirements for security controls, logging, and reporting. Training raises IT’s comfort level with cloud services, and in many cases will shift detractors (IT personnel) into advocates.
But I still have reservations about security. It’s great that Amazon is addressing critical problems for AWS customers and building these critical security and compliance technologies in-house. But this makes it very difficult for customers to select non-Amazon tools for key management, encryption, logging. Amazon is on their home turf, offering real useful services optimized for their offering, with good bundled pricing. But these solutions are not necessarily designed to make you ‘secure’. They may not even address your most pressing threats because they are focused on common federal and enterprise compliance concerns. These security capabilities are clearly targeted at compliance hurdles that have been slowing AWS adoption. Bundled security capabilities are not always the best ones to choose, and compliance capabilities have an unfortunate tendency to be just good enough to tick the box.
That said, the AWS product managers are clearly on top of their game!
On to the Summary:
Webcasts, Podcasts, Outside Writing, and Conferences
Favorite Securosis Posts
Favorite Outside Posts
Research Reports and Presentations
Top News and Posts
Blog Comment of the Week
This week’s best comment goes to LonerVamp, in response to last week’s Friday Summary.
As long as Google isn’t looking at biometrics, or other ways to uniquely identify me as a product of their advertising revenues, I’m interested in what they come up with. But just like their Google+ Real Names fiasco, I distrust anything they want to do to further identify me and make me valuable for further targeted advertising.
Plus the grey market of sharing backend information with other (paying) parties. For instance, there are regulations to protect user privacy, but often the expectation of privacy is relaxed when it “appears” the a third party already knows you. For instance, if I have a set of data that includes mobile phone numbers (aka accounts) plus full real name of the owners, there can be some shady inferred trust that I am already intimate with you, and thus selling/sharing additional phone/device data with me is ok, as long as its done behind closed doors and neither of us talk about it.
Tactics like that are how these advertisers dance around regulations. Or they just plain do it until they get caught and their bottom line gets hit.
At some point I’ll just say fuck it and give up. :)
BTW, that motorist vs bicyclist article? Thanks! People suck… :( This is one reason I could argue for more full-time surveillance, since people like that will either be deterred or caught and punished as needed. Just don’t be dicks…it’s not that hard of a concept.
Posted at Thursday 30th May 2013 11:13 pm
(0) Comments •