My $500 Cloud Security Screwup—UPDATED
Update: Amazon reached out to me and reversed the charges, without me asking or complaining (or in any way contacting them). I accept full responsibility and didn’t post this to get a refund, but I’m sure not going to complain – neither is Mike.
This is a bit embarrassing to write.
I take security pretty seriously. Okay, that seems silly to say, but we all know a lot of people who speak publicly on security don’t practice what they preach. I know I’m not perfect – far from it – but I really try to ensure that when I’m hacked, whoever gets me will have earned it. That said, I’m also human, and sometimes make sacrifices for convenience. But when I do so, I try to make darn sure they are deliberate, if misguided, decisions. And there is the list of things I know I need to fix but haven’t had time to get to.
Last night, I managed to screw both those up.
It’s important to fess up, and I learned (the hard way) some interesting lessons about a new attack trend that probably needs its own post. And, as is often the case, I made three moderately small errors that combined into an epic FAIL.
I was on the couch, finishing up an episode of Marvel’s Agents of S.H.I.E.L.D. (no, it isn’t very good, but I can’t help myself; if they kill off 90% of the cast and replace them with Buffy vets it could totally rock, though). Anyway… after the show I checked my email before heading to bed. This is what I saw:
Dear AWS Customer,
Your security is important to us. We recently became aware that your AWS Access Key (ending with 3KFA) along with your Secret Key are publicly available on github.com . This poses a security risk to you, could lead to excessive charges from unauthorized activity or abuse, and violates the AWS Customer Agreement.
We also believe that this credential exposure led to unauthorized EC2 instances launched in your account. Please log into your account and check that all EC2 instances are legitimate (please check all regions - to switch between regions use the drop-down in the top-right corner of the management console screen). Delete all unauthorized resources and then delete or rotate the access keys. We strongly suggest that you take steps to prevent any new credentials from being published in this manner again.
Please ensure the exposed credentials are deleted or rotated and the unauthorized instances are stopped in all regions before 11-Jan-2014.
NOTE: If the exposed credentials have not been deleted or rotated by the date specified, in accordance with the AWS Customer Agreement, we will suspend your AWS account.
Detailed instructions are included below for your convenience.
CHECK FOR UNAUTHORIZED USAGE To check the usage, please log into your AWS Management Console and go to each service page to see what resources are being used. Please pay special attention to the running EC2 instances and IAM users, roles, and groups. You can also check “This Month’s Activity” section on the “Account Activity” page. You can use the dropdown in the top-right corner of the console screen to switch between regions (unauthorized resources can be running in any region).
DELETE THE KEY If are not using the access key, you can simply delete it. To delete the exposed key, visit the “Security Credentials” page. Your keys will be listed in the “Access Credentials” section. To delete a key, you must first make it inactive, and then delete it.
ROTATE THE KEY If your application uses the access key, you need to replace the exposed key with a new one. To do this, first create a second key (at that point both keys will be active) and modify your application to use the new key. Then disable (but not delete) the first key. If there are any problems with your application, you can make the first key active again. When your application is fully functional with the first key inactive, you can delete the first key. This last step is necessary - leaving the exposed key disabled is not acceptable.
Alex R. aws.amazon.com
I bolted off the couch, mumbling to my wife, “my Amazon’s been hacked”, and disappeared into my office. I immediately logged into AWS and GitHub to see what happened.
Lately I have been expanding the technical work I did for my Black Hat presentation: I am building a proof of concept tool to show some DevOps-style Software Defined Security techniques. Yes, I’m an industry analyst, and we aren’t supposed to touch anything other than PowerPoint, but I realized a while ago that no one was actually demonstrating how to leverage the cloud and DevOps for defensive security. Talking about it wasn’t enough – I needed to show people.
The code is still super basic but evolving nicely, and will be done in plenty of time for RSA. I put it up on GitHub to keep track of it, and because I plan to release it after the talk. It’s actually public now because I don’t really care if anyone sees it early.
The Ruby program currently connects to AWS and a Chef server I have running, and thus needs credentials. Stop smirking – I’m not that stupid, and the creds are in a separate configuration file that I keep locally. My first thought was that I screwed up the .gitignore and somehow accidentally published that file.
Nope, all good. But it took all of 15 seconds to realize that a second file, test.rb, which I used to test smaller code blocks, still had my Access Key and Secret Key in a line I commented out. When I validated my code before checking it in, I saw the section for pulling from the configuration file, but missed the commented code containing my keys.
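A pre-commit scan along the following lines might have caught the commented-out keys. This is a minimal sketch in Python, not part of the original tool: the function names `scan_text` and `scan_tree` and the list of file suffixes are hypothetical. It relies on the fact that IAM user Access Key IDs are 20 characters beginning with `AKIA`; the 40-character pattern for Secret Keys is only a heuristic and will produce false positives.

```python
import re
from pathlib import Path

# IAM user Access Key IDs are 20 uppercase alphanumerics starting with "AKIA".
ACCESS_KEY_RE = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")
# Heuristic only: any standalone 40-character base64-style token.
SECRET_KEY_RE = re.compile(r"(?<![A-Za-z0-9/+=])([A-Za-z0-9/+]{40})(?![A-Za-z0-9/+=])")


def scan_text(text):
    """Return (line_number, kind) for each suspicious line, comments included."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if ACCESS_KEY_RE.search(line):
            findings.append((lineno, "access_key_id"))
        if SECRET_KEY_RE.search(line):
            findings.append((lineno, "possible_secret_key"))
    return findings


def scan_tree(root, suffixes=(".rb", ".py", ".yml", ".json")):
    """Scan matching files under root; returns {path: findings} for files with hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Because the scan works on raw lines rather than parsed code, a key sitting in a comment is flagged just like one in live code, which is exactly the case a visual review missed here.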
Back to AWS.
I first jumped into my Security Credentials section and revoked the key. Fortunately I didn’t see any other keys or access changes, and that key isn’t embedded in anything other than my dev code so deleting it wouldn’t break anything. If this was in a production system it would have been very problematic.
Then I checked my running instances. Nothing in us-east-1 where I do most of my work, so I started from the top of the list and worked my way down.
There it was. 5 extra large instances in us-west-1. 5 more in Ireland. All had been running for 72 hours, which, working from my initial checkin of the code on GitHub, means the bad guys found the credentials within about 36 hours of creating the project and loading the files.
Time for incident response mode. I terminated all the instances and ran through every region in AWS to make sure I didn’t miss anything. The list was:
- Lock down or revoke the Access Key.
- Check all regions for running instances, and terminate anything unexpected (just the 10 I found).
- Snapshot one of the instances for forensics. I should have also collected metadata but missed that (mainly which AMI they used).
- Review IAM settings. No new users/groups/roles, and no changes to my running policies.
- Check host keys. No changes to my existing ones, and only a generic one in each region where they launched the new stuff.
- Check security groups. No changes to any of my running ones.
- Check volumes and snapshots. Clean, except for the boot volumes of those instances.
- Check CloudTrail. The jerks only launched instances in regions CloudTrail doesn’t support, so even though I have CloudTrail running I didn’t collect any activity.
- Check all other services. Fortunately, being my dev/test/teaching account, I know exactly what I’m running. So I jumped into the Billing section of my account and confirmed only the services I am actively using were running charges. This would be a serious pain if you were actively using a bunch of different services.
- Notice the $500 in accumulated EC2 charges. Facepalm. Say naughty words.
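The "check all regions" step in the list above is the easiest one to get wrong by hand, since the console shows one region at a time. A rough sketch of automating it with boto3 follows; the helper names and the `expected_ids` parameter are hypothetical, and the boto3 functions assume valid credentials are already configured. The comparison logic is kept separate from the AWS calls so it can be exercised without an account.

```python
def find_unexpected_instances(regions, list_instances, expected_ids=frozenset()):
    """Return {region: [instance dicts]} for instances whose ID is not expected."""
    unexpected = {}
    for region in regions:
        strays = [
            inst for inst in list_instances(region)
            if inst["InstanceId"] not in expected_ids
        ]
        if strays:
            unexpected[region] = strays
    return unexpected


def boto3_all_regions():
    """Names of the EC2 regions visible to this account (needs boto3 + credentials)."""
    import boto3  # imported lazily so the pure logic above has no dependencies
    ec2 = boto3.client("ec2", region_name="us-east-1")
    return [r["RegionName"] for r in ec2.describe_regions()["Regions"]]


def boto3_list_instances(region):
    """Pending or running instances in one region (needs boto3 + credentials)."""
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["pending", "running"]}]
    )
    return [
        inst
        for page in pages
        for reservation in page["Reservations"]
        for inst in reservation["Instances"]
    ]
```

Calling `find_unexpected_instances(boto3_all_regions(), boto3_list_instances, expected_ids=...)` with the IDs of instances you know about would return only the strays, grouped by region. Each returned instance dict also carries fields such as `ImageId` and `LaunchTime`, which is the metadata worth recording before terminating anything.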
I was lucky. The attackers didn’t mess with anything active I was running. That got me curious, because 10 extra large instances racking up $500 in 3 days initially made me think they were out to hurt me. Maybe even embarrass the stupid analyst. Then I went into forensics mode.
Without CloudTrail I had no way to look for the origin of the API calls using that Access Key, aside from the basic metadata (some of which I lost because I forgot to record it when I terminated the instances – a note to my future self). What I did have was a snapshot of their server. Forensics isn’t my expertise but I know a few basics.
I launched a new (micro) instance, and then created and attached a new volume based on the snapshot. I mounted the volume and started poking around. Since I still had the original snapshot, I didn’t need to worry about altering the data on the volume. I can always create and attach a new one.
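For anyone who wants to script that step, a minimal sketch with boto3 might look like the following. The function name, the default device name, and the idea of passing in an already-constructed EC2 client are assumptions for illustration; the snapshot, instance, and Availability Zone identifiers are whatever applies in your account.

```python
def attach_snapshot_copy(ec2, snapshot_id, instance_id, availability_zone,
                         device="/dev/sdf"):
    """Create a fresh volume from a snapshot and attach it to an analysis instance.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2", region_name=...).
    The volume must be created in the same Availability Zone as the instance.
    """
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]
    # A new volume is not attachable until it reaches the "available" state.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    # On the analysis instance, mounting read-only (mount -o ro ...) keeps even
    # this working copy unmodified; the device name seen by the OS may differ.
    return volume_id
```

Since the snapshot itself is never modified, the function can be run again at any time to get a clean copy of the attacker's disk.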
The first thing I did was check the logs. I saw a bunch of references to CUDA 5.5. Uh oh, that’s GPU stuff, and explains why they launched a bunch of extra larges. The bad guys didn’t care about me, and weren’t out to get me specifically, as you will see in a sec. Then I checked the user accounts. There was only one, ec2-user, which is standard on Amazon Linux instances. The home directory had everything I needed:
root@ip-10-160-125-203:/forensics/home/ec2-user# ls
cpuminer  CudaMiner  tor-0.2.4.20.tar.gz  cuda_5.5.22_linux_64.run  tor-0.2.4.20
I didn’t need to check DuckDuckGo to figure that one out (but I did to be sure). Looks like some Litecoin/Bitcoin mining. That explains the large instances. Poking around the logs I also found the IP address ‘220.127.116.11’, which shows as Latvia. Not that I can be certain of anything because they also use Tor.
I could dig in more but that told me all I needed to know. (If you want a copy of the snapshot, let me know). Here’s my conclusion:
Attackers are scraping GitHub for AWS credentials embedded in code (and probably other cloud services). I highly doubt I was specifically targeted. They then use these to launch extra large instances for Bitcoin mining, and mask the traffic with Tor.
Someone mentioned this in our internal chat room the other day, so I know I’m not the first to write it up.
Here is where I screwed up:
- I did not have billing alerts enabled. This is the one error I knew I should have dealt with earlier but didn’t get around to. I paid the price for complacency.
- I did not completely scrub my code before posting to GitHub. This was a real mistake – I tried and made an error. I blame my extreme sleep deprivation, and a lot of this work was done over the holidays, with a lot of distractions. I was sloppy.
- I used an Access Key in my dev code without enough (any) restrictions. Fixing this was also on my to-do list, but for after I finished more of the code, because a lot of what I’m building needs extensive permissions and I can’t predict them all ahead of time. Managing AWS IAM is a real pain so I was saving it for the end.
Here is how I am fixing things:
- Billing alerts are now enabled, with a threshold just above my monthly average.
- I am creating an IAM policy and Access Key that restricts the application to my main development region, and removes the ability to make IAM changes or access/adjust CloudTrail.
- All dev work using that Access Key will be in a region that supports CloudTrail.
- When I am done with development I will create a more tailored IAM policy that only grants required operations. If I had actually designed this thing up front instead of hacking on it a little every day, I could have done this from the start.
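As an illustration of the interim policy described in the list above, here is one way such a document could be generated. This is a sketch under assumptions, not the policy actually used: the broad `ec2:*` allowance, the statement IDs, and the function name are hypothetical, and the `aws:RequestedRegion` condition key is a current IAM feature that postdates this post.

```python
import json


def build_dev_policy(region):
    """Build an IAM policy document limiting a dev key to one region.

    Broad EC2 access is allowed only in `region`; IAM and CloudTrail
    actions are denied outright.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEc2InDevRegionOnly",
                "Effect": "Allow",
                "Action": "ec2:*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:RequestedRegion": region}
                },
            },
            {
                "Sid": "DenyIamAndCloudTrail",
                "Effect": "Deny",
                "Action": ["iam:*", "cloudtrail:*"],
                "Resource": "*",
            },
        ],
    }


def policy_json(region):
    """Serialize the policy for pasting into the IAM console or an API call."""
    return json.dumps(build_dev_policy(region), indent=2)
```

An explicit Deny always overrides an Allow in IAM policy evaluation, so the second statement holds even if other attached policies grant broader rights. A leaked key carrying this policy could still launch instances, but only in the one region being watched.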
In the end I was lucky. I am only out $500 (well, Securosis is), 45 minutes of investigation and containment effort, and my ego (which has plenty of deflation room). They could have mucked with my account much more deeply. They couldn’t lock me out with an Access Key, but still could have cost me many more hours, days, and cash.
Lesson learned. Not that I didn’t know it already, but I suppose a hammer to the head every now and then helps keep you frosty.
I’d like to thank “Alex R.”, who I assume is on the AWS security team. I am also impressed that AWS monitors GitHub for this, and then has a process to validate whether the keys were potentially used, to help point compromised customers in the right direction.