Securosis

Research

It’s Time for a Microsoft Trustworthy Cloud Initiative

“All cloud security failures are IAM failures, and all IAM failures are governance failures.” (me on Twitter, too many years ago to find)

CISA just released their report on the big Summer 2023 Microsoft Exchange Online Intrusion. You could call it blistering, but I call it more of a third-degree plasma burn. It’s also the kind of validation I wish never had to happen.

Like many other cloud security professionals, I have been concerned with the security of Microsoft’s cloud (Azure/Office). When I first started using Azure I noticed it tended towards more-open and less-secure defaults. For example, the default for running a VM in a VNet was… no Network Security Groups. The VM would be wide open to the Internet for both inbound and outbound traffic. In AWS and GCP you can’t even deploy anything without an SG attached. (The portal does now try to get you to deploy with an NSG.) Other examples? The Azure activity log doesn’t record Read activity, so you can’t identify reconnaissance. Then there is the series of security flaws discovered by the teams at Wiz, Orca, and others.

The report has great detail, but the structural issues and recommendations are the real highlights. Here are the ones I think stand out, which have implications (both good and bad) beyond Microsoft. It’s a governance failure: The Board concluded that Microsoft’s security culture was inadequate (page 17). Because features and innovation are prioritized over security: as written in stone by the first cave dwellers. Other CSPs have better security practices: Don’t blame me, it’s item 3 on page 17, and no surprise to those of us who do this for a living. Microsoft did not correct inaccurate information and still does not know what happened: This means multiple failures at multiple levels. Page 17, again. There has been more than one nation-state breach: We knew this, and they refer to Midnight Blizzard. The mistakes there are also… troubling.
The Board believes Microsoft has deprioritized security and risk management: Bottom of page 18. The Board recommends Microsoft slow innovation until they fix security: It’s been done before, but I’m not sure how Copilot feels about that. The report then mentions the Microsoft Secure Future Initiative. I wrote on LinkedIn when that came out that it seemed inadequate. It’s like a Band-Aid when you need a tourniquet. The report goes into more detail on some specific security practices it recommends changing; but also seems to indicate they consider other cloud providers to be doing a better job with security around keys, tokens, and credentials. I can only assume they also know about SAS tokens. I mean, this report is rough, and anyone using Azure and Office needs to read it. And yes, I do use both myself for various things, but I’m not… a bank or the United States Government. Outside Microsoft specifically, there are some things in the report that make us cloud security types scream “I KNEW IT! I TOLD YOU SO!!!” at our screens: NIST needs to update 800-53 for cloud: Page 21, and if you know me you’ve heard me complaining about that for years. M&A is a security risk: Okay, Chris Farris and I are literally days from publishing a thing which might just call M&A a threat. CSPs need to stop charging for security-relevant logs: I’m screaming religious words right now. Which is weird, since I’m an atheist. CSPs should be transparent and report incidents and ALL vulnerabilities: Another one that’s an issue beyond Microsoft. CSPs and the government should have better victim notification: This is interesting and unexpected. They straight up call for non-spoofable mobile notifications. The government is watching and should use FedRAMP and its buying power to incentivize change: The original Trustworthy Computing Initiative was largely the result of serious government… threats?… to look at alternative operating systems. It’s time for a replay. 
It’s time for a Microsoft Trustworthy Cloud Initiative. Especially if they want us to trust them to be the leading AI provider. And FREE THE LOGS!!! Adding link to Joseph Menn’s Washington Post article. He’s banned in Russia so you know you can trust him.


Resolve 90% of Cloud Incidents with RECIPE PICKS

As any long-time readers know, I constantly abuse my past experiences and hobbies to try and make my current work sound WAY more interesting than it probably is. Or maybe it’s just an ego thing, I don’t want to think too hard about it. But, on occasion, lessons from my parallel lives actually inspire some original work. As a paramedic and a pilot I have had to memorize many dozens of mnemonics, and I’ve forgotten many more. Mnemonics are proven to be highly effective memory devices even in the midst of intense stress, like flying a plane or working a 9-1-1 call. For example, I learned “SAMPLE” for taking a patient’s history probably 30 years ago and I still use it today because in the insanity that is some calls it can be easy to lose track and forget a fundamental. This I always remember to ask about Signs and Symptoms, Allergies, Medications, Prior medical history, Last oral intake, and Event (why did they call us today?). Having issues ventilating an intubated patient? Use DOPE. Accidentally put your airplane into a spin? Use PARE (Power, Aileron, Rudder, Elevator). The more you drill these the better they work. I memorized RAKETS for my private pilot checkride but I definitely need to look that one up (it’s used to figure out if you can still fly a plane with a broken part). We don’t really use these in infosec, and I think it’s time to change that. Thus I present to you RECIPE PICKS for cloud incident response. This one hit me yesterday on an internal dev review call in one window while finishing my paramedic recertification in an open browser tab. For 4 years now here is how I’ve taught what to look for first in a cloud incident: I have the students leave that one up when we start the scenarios and live fire exercises. But standing in the shower I came up with a much better way to remember what to do. 
NOTE: the order doesn’t matter; as with SAMPLE, it’s to make sure you don’t miss anything:

Resource (current config/state)
Events (API call(s) on that resource)
Changes (diff plus associated API calls)
Identity (who made the triggering change or API call)
Permissions (of the identity; informs the blast radius)
Entitlements (of the resource, e.g. its IAM role or managed identity)

Public (is it public?)
IP (all API calls from that IP address)
Caller (all other API calls from the calling identity)
tracK (look for indications of a pivot, e.g. role chaining)
forenSics (on a resource, or digging into resource logs)

These steps don’t need to be done in order, except the last two probably need to be the last two (especially the forensics). This is all based on the process I’ve figured out over the years and I estimate you can probably close 90% of incidents relatively quickly by pulling this data.

I’m definitely going to start trying to build more of these into my trainings, and I’ll do some more blog posts in the coming weeks on how to use RECIPE PICKS. I’d also be remiss if I didn’t link over to a work blog post on how our platform does most of this automatically on every incident.

Let me know what you think and if I missed anything. Just email rmogull@securosis.com since I have comments turned off due to all the ridiculous spam.
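For anyone who prefers a checklist they can script against, here is a minimal Python sketch of RECIPE PICKS as data. It is only an illustration, not the platform or training material mentioned above: the `Step` class, the `mnemonic` and `remaining` helpers, and the wording of each question are assumptions made up for this example. It pulls no cloud data; it only tracks which items an analyst has covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    letter: str    # the letter this step contributes to the mnemonic
    name: str      # the step name as listed in the post
    question: str  # paraphrased prompt for the analyst (hypothetical wording)


RECIPE_PICKS = [
    Step("R", "Resource", "What is the current config/state?"),
    Step("E", "Events", "Which API calls touched the resource?"),
    Step("C", "Changes", "What changed (diff plus associated API calls)?"),
    Step("I", "Identity", "Who made the triggering change or API call?"),
    Step("P", "Permissions", "What can that identity do (blast radius)?"),
    Step("E", "Entitlements", "What can the resource do (IAM role, managed identity)?"),
    Step("P", "Public", "Is it public?"),
    Step("I", "IP", "What other API calls came from that IP address?"),
    Step("C", "Caller", "What other API calls did the calling identity make?"),
    Step("K", "tracK", "Any sign of a pivot, e.g. role chaining?"),
    Step("S", "forenSics", "What do the resource or its logs show?"),
]


def mnemonic(steps=RECIPE_PICKS):
    """Join the step letters to confirm the list spells the mnemonic."""
    return "".join(step.letter for step in steps)


def remaining(done):
    """Return the names of steps not yet covered, in list order.

    `done` is any iterable of step names; matching ignores case.
    """
    done_lower = {name.lower() for name in done}
    return [s.name for s in RECIPE_PICKS if s.name.lower() not in done_lower]


if __name__ == "__main__":
    print(mnemonic())
    for name in remaining({"Resource", "Events"}):
        print("still to check:", name)
```

Because order mostly doesn't matter, `remaining` only reports what is missing. Keeping tracK and forenSics at the end of the list reflects the note that those two usually come last.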


Check out the shiny new Cloud Security Maturity Model 2.0!

I’m pretty excited about this one. We are finally releasing version 2.0 of the Cloud Security Maturity Model. This is the culmination of nearly 9 months of research and analysis, a massive update from the original released in 2020. The tl;dr is that this version is not only updated to reflect current cloud security practices, but it includes around 100 cloud security control objectives to use as Key Performance Indicators — each matched 1:1 (where possible) with a technical control you can assess (AWS for now— we plan to expand to Azure and GCP next). You can download it here — no registration wall, and it includes the spreadsheet and PDFs. The CSMM 2.0 was developed by Securosis (that’s us!) and IANS Research in cooperation with the Cloud Security Alliance. Version 2.0 underwent a public peer review process at the CSA and internal review at IANS. We will keep updating it based on public feedback. The model includes nearly 100 control objectives and controls, organized into 12 Categories in 3 Domains. IANS released a free diagnostic self-assessment survey tool. You can quickly and easily generate a custom maturity report. FireMon added a free CSMM dashboard to Cloud Defense, which will automatically assess, rate, and track your cloud maturity using the CSMM! It’s really cool. But I’m biased because I pushed hard to build it. Okay, that’s what it is, but here’s why you should care. When Mike and I first built the CSMM we designed it more as a discussion tool to describe the cloud journey. Then we started using it with clients and realized it also worked well as a framework to organize a cloud security program. Two of the big issues with cloud governance we’ve seen in the decade-plus we’ve been doing this are: Existing security frameworks have been extended to cloud, but not designed for cloud, which creates confusion because they lack clear direction. Those don’t tell you “do this for cloud” — they tell you “add this cloud stuff”. 
We saw a need for a cloud-centric view. Security teams quickly get tossed into cloud, and while tooling has improved immensely over time, those tools flood you with data and don’t tell you where to start. We don’t lack tools, but we do lack priorities.

Version 2.0 of the CSMM was built directly to address these issues. We reworked the CSMM to work as a cloud security framework. What does that mean? The model focuses on the 12 main categories of cloud security activities, which you can use to organize your program. The maturity levels and KPIs then help define your goals and guide your program without the minutiae of handling individual misconfigurations.

What’s the difference between the Diagnostic and the Dashboard?

The IANS diagnostic is where you should start. It’s a survey tool anyone can fill out without technical access to their deployments. The objective of the diagnostic is to help you quickly self-assess your program and then, using that, determine your maturity objectives. Let’s be realistic: not all organizations can or should be at “Level 5”. The diagnostic helps set realistic goals and timelines, based on where you are now.

The FireMon Cloud Defense CSMM Dashboard is a quantitative real-time assessment and tracking tool. Once you integrate it with your cloud accounts you’ll have a dashboard to track maturity for the entire organization, different business units, and even specific accounts. It’s the tool to track how you are meeting the goals established with the diagnostic. It’s self-service and covers as many AWS accounts as you have (Azure will be there once the CSMM adds Azure controls).

You can also just use the CSMM spreadsheet. Options are good. Free options are better.

Finally, please send me your feedback. These are living documents and tools, and we plan to keep them continuously updated. The usual disclosure: I’m an IANS faculty member and I manage the Cloud Defense product. But both of these are available absolutely free, no strings attached, as is the model itself.


I Broke the 3-2-1 Rule and Almost Paid The Price!

This post isn’t about some fancy new research. Consider it a friendly nudge to floss. I’m pretty Type A about backing up and have data going back 20+ years at this point. I’m especially particular about my family photos. Until yesterday (this is called foreshadowing) my strategy was: Time Machine running on a Drobo for my main Mac Drobo as a company is dead, but this is a direct attached 5D, which has worked well and has enough capacity that I can lose drives and recover (which has happened). The Drobo as mass storage for the large files I don’t store on my SSD. Archives, VMs, videos. A WD MyBook with 12 TB, also directly connected to my Mac. Data replicated from there using Carbon Copy Cloner. Backblaze for cloud backups. With a personal encryption key. iCloud (I’m on the 6TB plan) for all my photos and related iCloud stuff. iCloud is synced across multiple systems. Box for Securosis corporate documents. Some older S3/Glacier archives. Probably more. I’m old and forget things. My entire house could burn down and I shouldn’t lose anything. But I broke the 3-2-1 rule. The 3-2-1 rule of backups is 3 copies of everything, at least 2 of them local, and 1 offsite. My Drobo died. Completely and suddenly. Not a single drive, but the entire thing. And the moment it happened I couldn’t remember whether I was backing up ALL of the Drobo anywhere else. It was RAID — what were the odds of losing the entire device? I knew I needed to replace it soon because the drivers weren’t being updated, but I kept putting it off. Well okay, I should be fine with my CCC backups… except that wasn’t set as a scheduled job, and I was only replicating one of the Drobo partitions. The other partitions? Well, one of them had my in-progress CloudSLAW video for next week and a demo video for the new CSMM feature we are releasing at work (remember, foreshadowing). Two time-sensitive things I REALLY didn’t want to recreate. 
Cloud/Backblaze to the Rescue and My New Strategy

It turns out I really was sending everything from every drive to the cloud, and keeping versions for a year. It cost me just over $100 (for a single machine). I’ve never thought much about it, but all the data was there. The clincher was fast, selective restore. I was able to directly select what I needed, including the video files, and download a .zip in less than an hour. Then I ordered a Synology, and I’ll go through the longer restore process once that arrives.

Does this mean I can skip keeping 2 local versions on separate devices? And doesn’t RAID count as 2 devices? Nope and nope. But here’s my strategy and reasoning, an evolution of the 3-2-1 rule:

Family photos and things I never want to lose are stored on 2-4 local devices and at least 2 different cloud providers, with occasional archives to a third provider. My iCloud Photos sync to my Mac. That’s backed up via Time Machine and to the (soon to arrive) Synology. It also goes to Backblaze, and a couple times a year I archive to S3.

All critical business documents are in 2 cloud services. That’s Box, and since I sync the files locally, they also land in my cloud backups of my local drive. Code and other documents are in places like GitHub and OneDrive, depending on which hat I’m wearing. I just make sure there are 2 of everything at 2 different services.

A bootable image of my working Mac. I use Carbon Copy Cloner for this. I’m not as religious about it because I can fully work off my laptop when needed.

Archived and media files are single copies on the RAID, but the RAID is backed up to cloud, from where I can selectively restore. These are the things I am okay with not having right away.

UPDATE: I will now keep my working video files on a second local drive. This will be directly attached to my Mac, and backed up to both the cloud and the new RAID (Synology), which will be network attached instead of directly connected.

So, 3-5 copies of all files: 1-3 local based on priority, and 1-3 in cloud, also based on priority.

Baby pics are 3 local, 3 in different cloud services.
Full system is 2 local, 1 bootable.
Work documents at 2 cloud services, at least one with versioning.
Large “working” (media) files are 2 local, one on fast storage and the other RAID.
Mass storage is 1 local (RAID) and 1 versioned copy in cloud.
All critical work applications should be on 2 systems (laptop/desktop, and for me I do a ton on iPad).

I lucked out this time. I really did not remember sending the Drobo files to Backblaze, and had a brief panic attack. And I hadn’t used selective restore previously, which helped me rapidly find and download the working files I needed.

I’m gonna go floss now.
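The tiered copy counts above are easy to forget to verify, which is roughly what bit me. A small Python sketch of checking them might look like the following. Everything here is illustrative: the tier keys, the `Policy` and `DataSet` classes, and the `gaps` function are hypothetical names. The minimums are my paraphrase of the tiers in this post, with two assumptions: working media needs at least one cloud copy, and work documents need no local copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    min_local: int  # minimum copies on separate local devices
    min_cloud: int  # minimum copies at different cloud services


@dataclass(frozen=True)
class DataSet:
    name: str
    local_copies: int
    cloud_copies: int


# Minimums paraphrased from the tiers in the post; keys are hypothetical labels.
POLICIES = {
    "baby_pics": Policy(min_local=3, min_cloud=3),
    "work_documents": Policy(min_local=0, min_cloud=2),  # assumption: cloud-only minimum
    "working_media": Policy(min_local=2, min_cloud=1),   # assumption: 1 cloud copy
    "mass_storage": Policy(min_local=1, min_cloud=1),
}


def gaps(tier, data_set):
    """Return human-readable shortfalls for one data set against its tier."""
    policy = POLICIES[tier]
    problems = []
    if data_set.local_copies < policy.min_local:
        problems.append(
            f"{data_set.name}: {data_set.local_copies} local, need {policy.min_local}"
        )
    if data_set.cloud_copies < policy.min_cloud:
        problems.append(
            f"{data_set.name}: {data_set.cloud_copies} cloud, need {policy.min_cloud}"
        )
    return problems


if __name__ == "__main__":
    # Roughly the situation before the Drobo died: one local copy, one in cloud.
    for problem in gaps("working_media", DataSet("CloudSLAW video", 1, 1)):
        print(problem)
```

The point of the sketch is the habit, not the tool: write the minimums down once, then compare them against what is actually configured instead of relying on memory.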


Regression to the Fundamentals

After 25 years in technology, mostly in security, I recently realized I’m regressing. No, not in terms of my mental acuity or health (although all of you would be better judges of my brain function), but more in terms of my career. And no, I don’t mean I’m going back to the Helpdesk… and according to my children and most of my family I never really left anyway. Not that I’m paid for it. Well, sometimes with some cookies. But never enough cookies.

It’s just that the longer I do this the more I realize that it’s the fundamentals that really matter. That as much as I love all the fun advanced research, all that work really only addresses and helps a relatively small percentage of the world. The hard problems aren’t the hard problems; the hard problems are solving the easy problems consistently. We mostly suck at that.

What’s fascinating is that this isn’t a problem limited to security. I really noticed it recently when I was working on my paramedic recertification. As a paramedic I can do all sorts of advanced things that involve drugs, electricity, and tubes. In some cases, especially cardiac arrest, the research now shows that you, the bystander, starting good-quality CPR early is far more important than me injecting someone with epinephrine. In fact, studies seem to indicate that epi in cardiac arrest does not improve long-term patient outcomes. CPR and electricity (AEDs) for the win. Advanced clinicians like myself? Useful and necessary, but useless without the fundamentals before we get there.

Back to security. As researchers (and vendors) we are drawn to the hard problems. I’m not saying they don’t matter; they very much do. As much as AI is in the hype machine right now it’s there for a reason and we need experts engaged early, even if most of what they’ll do will fail because AI is a truly disruptive innovation. If you don’t believe me just re-read this sentence after the 2024 election.
And some basic problems need new innovations instead of banging our heads against the wall. Passwordless is a great example of attacking an intractable problem with hard engineering that is invisible to users. As much as I’d like to be doing more leading-edge research, I keep finding myself focusing on the basics, and trying to help other people do the basics better. Let’s take cloud incident response, my current bread and butter. Will Bengtson and I keep coming up with all sorts of cool, advanced cloud attacks to include in our IR training at Black Hat. The reality is those are mostly there so people think we are smart and to keep the rare advanced students interested. Nearly all cloud attacks a student working on a real IR team will encounter are the same two or three “simple” things. Lost or stolen credentials used for crypto, ransomware, or data exfiltration, or hacking a vulnerable public-facing instance for… crypto,  ransomware, or data exfiltration. Instead of spending my time on leading-edge research I’m building training for people with zero experience. I’m working on simple models which hopefully help people focus better. On the product side I’m focusing more on basic problems that seem to slip through the gaps. Chris Farris and I are working on a new talk and threat modeling approach to focus consistently on the fundamentals which really matter, not all the crazy advanced stuff in your inbox every day. Researchers and research teams mostly publish on the fun, interesting and advanced things because that’s more intellectually interesting and gets the headlines. There’s nothing wrong with that — we need it — but never forget that the basics matter more. I still get FOMO from time to time, but in the end I can do a lot more good at a much larger scale focusing on helping with fundamentals. Simple isn’t sexy, but without plumbers we’re all covered in shit pretty damn quickly. 
As paramedics, the one thing we are exceptional at is facing utter chaos, identifying what will kill you, and keeping things from getting worse. Maybe I biased my career from the start.

Chris says he objects to being called a simple problem. Please humor him. Will just asked that I spell his name correctly.


Is This Thing Still On?

I started a blog in 2006. This blog, to be precise. I kinda just wanted a blog. Blogs were cool. Twitter wasn’t really a thing yet. YouTube was only like a year old. The iPhone was hiding in an engineering and design lab. I didn’t expect securosis.com to be around 18 years later. I certainly didn’t expect it would become my full time job for 15 of those years. I most definitely didn’t expect to take on partners, spin out a product startup, have kids, lose my hair, grow… other hair, lose a partner (to a bank, not the grave, if there’s a difference), and, as of last weekend, migrate the entire site to our fourth hosting provider and third new software stack without losing any significant content. And most embarrassing of all, I didn’t expect to not write on my own site for… 3 years. But that’s what happens when you build a startup that gets acquired (and I still work there full time), your consulting customers keep you super busy with hands-on technical projects, and you spend a chunk of the pandemic running around playing paramedic. Oh, and when your kids hit the age where you and your wife effectively become unpaid ride share drivers. Now it’s time to come home. I’m still working and writing at FireMon and other places, but thanks to the success of CloudSLAW (my lab a week newsletter/blog/YouTube channel) I have the itch to just start blogging about random non-day-job security stuff again. I also have some new research on the way, and maybe some friends will be dropping in. Securosis (the company) is just for side projects now, and weirdly I think that gives me a freedom in my writing I forgot about. We just moved the site and I’m slowly updating things. In the coming weeks I also plan to pull some old posts from the 18-year history of this site and rip them to shreds with my modern knowledge and sensibilities. I hope some of you stick around for the ride, but I plan to have fun no matter what. Share:


Understanding COVID, ARDS, and Mechanical Ventilation

April 7 Update: some research is emerging since I posted this that COVID-related ARDS is not typical ARDS. Here’s the medical reference for providers, but it’s very early evidence that we should keep an eye on: COVID-19 Does Not Lead to a “Typical” ARDS. This was further validated by an article in MedScape that previews some emerging peer-reviewed research. Thus while my explanations of ARDS and ventilators are accurate, the ties to COVID-19 are not, and new treatment protocols are emerging.

Although this is a security blog, this post has absolutely nothing to do with security. No parallels from medicine, no mindset lessons, just some straight-up biology.

As many readers know I am a licensed Paramedic. I first certified in the early 1990s, dropped down to EMT for a while, and bumped back up to full medic two years ago. Recently I became interested in flight and critical care and completed an online critical care and flight medic course from the great team at FlightBridgeED. Paramedics don’t normally work with ventilators – it is an add-on skill specific to flight and critical care (ICU) transports. I’m a neophyte to ventilator management, with online and book training but no practice, but I understand the principles, and thanks to molecular biology back in college, have a decent understanding of cellular processes.

COVID-19 dominates all our lives now, and rightfully so. Ventilators are now a national concern and one the technology community is racing to help with. Because of my background I’ve found myself answering a lot of questions on COVID-19, ARDS, and ventilators. While I’m a neophyte at running vents, I’m pretty decent at translating complex technical subjects for non-experts. Here’s my attempt to help everyone understand things a bit better.

The TL;DR is that COVID-19 damages the lungs, which for some people triggers the body to overreact with too much inflammation.
This extra fluid interferes with gas exchange in the lungs, and oxygen can’t as easily get into the bloodstream. You don’t actually stop breathing, so we use the ventilators to change pressure and oxygen levels, in an attempt to diffuse more oxygen through this barrier and into the lungs without causing more damage by overinflating them.

We start with respiration

Before we get into COVID and ventilators we need to understand a little anatomy and physiology. Cells need oxygen to convert fuel into energy. Respiration is the process of getting oxygen into cells and removing waste products – predominantly CO2. We get oxygen from our environment and release CO2 through ventilation: air moving in and out of our lungs. Those gases are moved around in our blood, and the actual gas exchange occurs in super-small capillaries which basically wrap around our cells. The process of getting blood to tissues is called perfusion.

This is all just some technical terminology to say: our lungs take in oxygen and release carbon dioxide, we move the gases around using our circulatory system, and we exchange gases in and out of cells in super-small capillaries. Pure oxygen is a toxin, and CO2 diffused in blood is an acid, so our bodies have all sorts of mechanisms to keep things running. Everything works thanks to diffusion and a few gas laws (Graham’s, Henry’s, and Dalton’s are the main ones).

Our lungs have branches and end in leaves called alveoli. Alveoli are pretty wild – they have super-thin walls to allow gases to pass through, and are surrounded by capillaries to transfer gases into and out of our blood. They look like clumps of bubbles, because they maximize surface area to facilitate the greatest amount of gas exchange in the smallest amount of space. Healthy alveoli are covered in a thin liquid called surfactant, which keeps them lubricated so they can open and close and slide around each other as we breathe. Want to know one reason smokers and vapers have bad lungs?
All those extra chemicals muck up surfactant, thicken cell walls, and cause other damage. In smokers a bunch of the alveoli clump together, losing surface area, in a process called atelectasis (remember that word).

Our bodies try to keep things in balance, and have a bunch of tools to nudge things in different directions. The important bit for our discussion today is that ventilation is managed through how much we breathe in for a given breath (tidal volume), and how many times a minute we breathe (respiratory rate). This combination is called our minute ventilation and is normally about 6-8 liters per minute. This is linked to our circulation (cardiac output), which is around 5 liters per minute at rest. The amount of oxygen delivered to our cells is a function of our cardiac output and the amount of oxygen in our blood.

We need good gas exchange with our environment, good gas exchange into our bloodstream, and good gas exchange into our cells. COVID-19 screws up the gas exchange in our lungs, and everything falls apart from there.

Acute Respiratory Distress Syndrome

ARDS is basically your body’s immune system gone haywire. It starts with lung damage – which can be an infection, trauma, or even metabolic. One of the big issues with ventilators is that they can actually cause ARDS with the wrong settings. The damage triggers an inflammatory response. A key aspect of inflammation is various chemical mediators altering cell walls, especially those capillaries – and then they start leaking fluid. In the lungs this causes a nasty cascade:

Fluid leaks from the capillaries and forms a barrier/buffer of liquid between the alveoli and the capillaries, and separates them. This reduces gas exchange.

Fluid leaks into the alveoli themselves, further inhibiting gas exchange.

The cells are damaged by all this inflammation, triggering another stronger immune response. Your body is now in a negative reinforcement cycle and making things worse by trying to make them better.
This liquid and a bunch of the inflammation chemicals dilute the surfactant and damage the alveolar walls, causing atelectasis. In later stages of ARDS your


Mastering the Journey—Building Network Manageability and Security for your Path

This is the third post in our series, “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys”, which we will release as a white paper after we complete the draft and have some time for public feedback. You might want to start with our first and second posts. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology.

Learning cloud adoption patterns doesn’t just help us identify key problems and risks – we can use them to guide operational decisions to address the issues they consistently raise. This research focuses on managing networks and network security, but the patterns include broad security and operational implications which cover all facets of your cloud journey. Governance issues aside, we find that networking is typically one of the first areas of focus for organizations, so it’s a good target for our first focused research. (For the curious, IAM and compliance are two other top areas organizations focus on, and struggle with, early in the process.)

Recommendations for a Safe and Smooth Journey

Developer Led

Mark sighed with relief and satisfaction as he validated the VPN certs were propagated and approved the ticket for firewall rule change. The security group was already in good shape and they managed to avoid having to add any kind of direct connect to the AWS account for the formerly-rogue project. He pulled up their new cloud assessment dashboard and all the critical issues were clear. It would still take the IAM team and the project’s developers a few months to scale down unneeded privileges but… not his problem. The federated identity portal was already hooked up and he would get real time alerts on any security group changes.

“Now onto the next one,” he mumbled after he glanced at his queue and lost his short-lived satisfaction.
“Hey, stop complaining!” remarked Sarah. “We should be clear after this backlog now that accounting is watching the credit cards for cloud charges; just run the assessment and see what we have before you start complaining.”

Having your entire organization dragged into the cloud thanks to the efforts of a single team is disconcerting, but not unmanageable. The following steps will help you both wrangle the errant project under control, and build a base for moving forward. This was the first adoption pattern we started to encounter a decade ago as cloud started growing, so there are plenty of lessons to pull from. Based on our experiences, a few principles really help manage the situation:

Remember that to meet this pattern you should be new to either the cloud in general, or to this cloud platform specifically. These are not recommendations for unsanctioned projects covered by your existing experience and footprint.

Don’t be antagonistic. Yes, the team probably knew better and shouldn’t have done it… but your goal now is corrective actions, not punitive. Your goal is to reduce urgent risks while developing a plan to bring the errant project into the fold.

Don’t simply apply your existing policies and tooling from other environments to this one. You need tooling and processes appropriate for this cloud provider.

In our experience, despite the initial angst, these projects are excellent opportunities to learn your initial lessons on this platform, and to start building out for a larger supported program. If you keep one eye on immediate risks and the other on long-term benefits, everything should be fine.

The following recommendations go a long way towards reducing risks and increasing your chances of success. But before the bullet points we have one overarching recommendation: As you gain control over the unapproved project, use it to learn the particulars of this cloud provider and build out your core cloud management capabilities.
When you assess, set yourself up to support your next ten assessments. When you enable monitoring and visibility, do so in a way which supports your next projects. Wherever possible build a core service rather than a one-off.

Step one is to figure out what you are dealing with:

  • How many environments are involved? How many accounts, subscriptions, or projects?
  • How are the environments structured? This involves mapping out the application, the PaaS services offered by the provider (such as load balancers and serverless capabilities), the IAM, the network(s), and the data storage.
  • How are the services configured?
  • How are the networks structured and connected? The Software Defined Networks (SDN) used by all major cloud platforms only look the same on the surface – under the hood they are quite a bit different.
  • And, most importantly, where does this project touch other enterprise resources and data?!? This is essential for understanding your exposure. Are there unknown VPN connections? Did someone peer through an existing dedicated network pipe? Is the project talking to an internal database over the Internet? We’ve seen all these and more.

Then prioritize your biggest risks:

Internet exposures are common and one of the first things to lock down. We commonly see resources such as administrative servers and jump boxes exposed to the Internet at large. In nearly every single assessment we find at least one instance or container with port 22 exposed to the world. The quick fix for these is to lock them down to your known IP address ranges. Cloud providers’ security groups simply drop traffic which doesn’t meet the rules, so they are an extremely effective security control and a better first step than trying to push everything through an on-premises firewall or virtual appliance.

Identity and Access Management is the next big piece to focus on.
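To make the port 22 quick fix described above concrete, here is a minimal sketch of the check and the lockdown. It deliberately avoids any real provider API: the rule dictionaries, the group names, and the `KNOWN_RANGES` list are all hypothetical, and the address range is a documentation-only block standing in for your own ranges. Treat it as an illustration of the logic, not as a tool.

```python
from ipaddress import ip_network

# Assumption: a stand-in for your organization's known address ranges.
# 203.0.113.0/24 is reserved for documentation, not a real corporate range.
KNOWN_RANGES = [ip_network("203.0.113.0/24")]


def is_world_open(cidr: str) -> bool:
    """Return True if the CIDR covers the entire IPv4 or IPv6 Internet."""
    return ip_network(cidr).prefixlen == 0


def find_exposed_ssh(rules):
    """Return inbound rules that allow port 22 from anywhere."""
    return [
        rule for rule in rules
        if rule["from_port"] <= 22 <= rule["to_port"] and is_world_open(rule["cidr"])
    ]


def lock_down(rule, known_ranges=KNOWN_RANGES):
    """Replace one world-open rule with one rule per known range."""
    return [{**rule, "cidr": str(net)} for net in known_ranges]


# Hypothetical inventory, in a provider-neutral format invented for this sketch.
rules = [
    {"group": "sg-jump", "from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
    {"group": "sg-web", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    {"group": "sg-db", "from_port": 3306, "to_port": 3306, "cidr": "10.0.0.0/8"},
]

exposed = find_exposed_ssh(rules)
fixed = [new_rule for rule in exposed for new_rule in lock_down(rule)]

for rule in fixed:
    print(f'{rule["group"]}: allow {rule["from_port"]} from {rule["cidr"]}')
```

In a real assessment the `rules` list would come from your provider's API or an assessment tool, and the same check is worth repeating for other administrative ports. In keeping with the overarching recommendation, a check like this is more useful built once as a shared service than rerun by hand for each project.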
This research is focused more on networking, so we won’t spend much time on this here. But when developers build out environments they almost always over-privilege access to themselves and application components. They also tend to use static credentials, because unsanctioned projects are unlikely to integrate into your federated identity management. Sweep out static credentials, enable federation, and turn


Defining the Journey—the Four Cloud Adoption Patterns

This is the second post in our series, “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys”, which we will release as a white paper after we complete the draft and have some time for public feedback. You might want to start with our first post. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology. Understanding Cloud Adoption Patterns Cloud adoption patterns represent the most common ways organizations move from traditional operations into cloud computing. They contain the hard lessons learned by those who went before. While every journey is distinct, hands-on projects and research have shown us a broad range of consistent experiences, which organizations can use to better manage their own projects. The patterns won’t tell you exactly which architectures and controls to put in place, but they can serve as a great resource to point you in the right general direction and help guide your decisions. Another way to think of cloud adoption patterns is as embodying the aggregate experiences of hundreds of organizations. To go back to our analogy of hiking up a mountain, it never hurts to ask the people who have already finished the trip what to look out for. Characteristics of Cloud Adoption Patterns We will get into more descriptive detail as we walk through each pattern, but we find this grid useful to define the key characteristics. 
Characteristics | Developer Led | Data Center Transformation | Snap Migration | Native New Build
Size | Medium/Large | Large | Medium/Large | All (project-only for mid-large)
Vertical | All (minus financial and government) | All, including Financial and Government | Variable | All
Speed | Fast then slow | Slow (2-3 years or more) | 18-24 months | Fast as DevOps
Risk | High | Low(er) | High | Variable
Security | Late | Early | Trailing | Mid to late
Network Ops | Late | Early | Early to mid | Late (developers manage)
Tooling | New + old when forced | Culturally influenced; old + new | Panic (a lot of old) | New, unless culturally forced to old
Budget Owner | Project based/no one | IT, Ops, Security | IT or poorly defined | Project-based, some security for shared services

  • Size: The most common organization sizes. For example developer-led projects are rarely seen in small startups, because they can skip directly to native new builds, but common in large companies.
  • Vertical: We see these patterns across all verticals, but in highly-regulated ones like financial services and government, certain patterns are less common due to tighter internal controls and compliance requirements.
  • Speed: The overall velocity of the project, which often varies during the project lifetime. We’ll dig into this more, but an example is developer-led, where initial setup and deployment are very fast, but then wrangling in central security and operational control can take years.
  • Risk: This is an aggregate of risk to the organization and of project failure. For example in a snap migration everything tends to move faster than security and operations can keep up, which creates a high chance of configuration error.
  • Security: When security is engaged and starts influencing the project.
  • Network Ops: When network operations becomes engaged and starts influencing the project. While the security folks are used to being late to the party, since developers can build their own networks with a few API calls, this is often a new and unpleasant experience for networking professionals.
  • Tooling: The kind of tooling used to support the project. “New” means new, cloud-native tools. “Old” means the tools you already run in your data centers.
  • Budget Owner: Someone has to pay at some point. This is important because it represents potential impact on your budget, but also indicates who tends to have the most control over a project.

Characteristics of Cloud Adoption Patterns

In this section we will describe what the patterns look like, and identify some key risks. In our next section we will offer some top-line recommendations to improve your chances of success.

One last point before we jump into the patterns themselves: while they focus on the overall experiences of an organization, patterns also apply at the project level, and an organization may experience multiple patterns at the same time. For example it isn’t unusual for a company with a “new to cloud” policy to also migrate existing resources over time as a long-term project. This places them in both the data center transformation and native new build patterns.

Developer Led

Mark was eating a mediocre lunch at his desk when a new “priority” ticket dropped into his network ops queue. “Huh, we haven’t heard from that team in a while… weird.” He set the microwaved leftovers to the side and clicked open the request…

Request for firewall rule change: Allow port 3306 from IP 52.11.33.xxx/32. Mission critical timeline.

“What the ?!? That’s not one of our IPs?” Mark thought as he ran a lookup. “amazonaws.com? You have GOT to be kidding me? We shouldn’t have anything up there.”

Mark fired off emails to his manager and the person who sent the ticket, but he had a bad feeling he was about to get dragged into the kind of mess that would seriously ruin his plans for the next few months.
Developer-led projects are when a developer or team builds something in a cloud on their own, and central IT is then forced to support it. We sometimes call this “developer tethering”, because these often unsanctioned and/or uncoordinated projects anchor an organization to a cloud provider, and drag the rest of the organization in after them. These projects aren’t always against policy – this pattern is also common in mergers and acquisitions. This also isn’t necessarily a first step into the cloud overall – it can also be a project which pulls an enterprise into a new cloud provider, rather than their existing preferred cloud provider. This creates a series of tough issues. To meet the definition of this pattern we assume you can’t just shut the project down, but actually need to support it. The project has been developed and deployed without the input of security or


Your Cloud Journey is Unique, but Not Unknown

This is the first post in a new series, our “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys”, which we will release as a white paper after we complete the draft and have some time for public feedback. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology.

Cloud computing is different, disruptive, and transformative. It has no patience for traditional practices or existing architectures. The cloud requires change, and there is a growing body of documentation on end states you should strive for, but a lack of guidance on how to get there. Cloud computing may be a journey, but it’s one with many paths to what is often an all-too-nebulous destination.

Although every individual enterprise has different goals, needs, and capabilities for their cloud transition, our experience and research have identified a series of fairly consistent patterns. You can think of moving to cloud as a mountain with a single peak, with everyone starting from the same trailhead. But this simplistic view, which all too often underlies conference presentations and tech articles, fails to capture the unique opportunities and challenges you will face. At the other extreme, we can think of the journey as involving a mountain range with innumerable peaks, starting points, and paths… and a distinct lack of accurate maps. This is the view that tends to end with hands thrown up in the air, expressions of impossibility, and analysis paralysis.

But our research and experience guide us between those extremes. Instead of a single constrained path which doesn’t reflect individual needs, or totally individualized paths which require you to build everything and learn every lesson from scratch, we see a smaller set of common options, with consistent characteristics and experiences.
Think of it as starting from a few trailheads, landing on a few peaks, and only dealing with a handful of well-marked trails. These won’t cover every option, but can be a surprisingly useful way to help structure your journey, move up the hill more gracefully, and avoid falling off some REALLY sharp cliff edges.

Introducing Cloud Adoption Patterns

Cloud adoption patterns represent a consolidated set of cloud adoption journeys, compiled through discussions with hundreds of enterprises and dozens of hands-on projects. Less concrete than specific cloud controls, they are a more general way of predicting and understanding the problems facing organizations when moving to cloud, based on starting point and destination. These patterns have different implications across functional teams, and are especially useful for network operations and network security, because they tend to fairly accurately predict many architectural implications, which then map directly to management processes.

For example there are huge differences between a brand-new startup or cloud project without any existing resources, a major data center migration, and a smaller migration of key applications. Even a straight-up lift and shift migration is extremely different if it’s a one-off vs. a smaller project vs. wrapped up in a massive data center move with a hard cutoff deadline (often thanks to a hosting contract which is not being renewed). Each case migrates an existing application stack into the cloud, but the different scope and time constraints dramatically affect the actual migration process.

We’ll cover them in more detail in our next post, but the four patterns we have identified are:

  • Developer led: A development team starts building something in the cloud outside normal processes, and then pulls the rest of the organization behind them.
  • Data center transformation: An operations-led process defined by an organization planning a methodical migration out of existing data centers and into the cloud, sometimes over a decade or more.
  • Snap migration: An enterprise is forced out of some or all their data centers on a short timeline, due to contract renewals or other business drivers.
  • Native new build: The organization plans to build a new application or several completely in the cloud using native technologies.

You likely noticed we didn’t mention some common terms like “refactor” and “new to cloud”. Those are important concepts but we consider them options on the journey, not definitions of the journey. Our four patterns are about the drivers for your cloud migration and your desired end state.

Using Cloud Adoption Patterns

The adoption patterns offer a framework for thinking about your upcoming (or in-process) journey, and help identify both strategies for success and potential failure points. These aren’t prescriptive like the Cloud Security Maturity Model or the Cloud Controls Matrix; they won’t tell you exactly which controls to implement, but are more helpful when choosing a path, defining priorities, mapping architectures, and adjusting processes. Going back to our mountain-climbing analogy, the cloud adoption patterns point you down the right path and help you decide which gear to take, but it’s still up to you to load your pack, know how to use the gear, plan your stops, and remember your sunscreen.

These patterns represent a set of characteristics we consistently see based on how organizations move to cloud. Any individual organization might experience multiple patterns across different projects. For example a single project might behave more like a startup, even while you concurrently run a larger data center migration. Our next post will detail the patterns with defining characteristics.
You can use it to determine your overall organizational journey, as well as to plan out individual projects with their own characteristics. To help you better internalize these patterns, we will offer fictional examples based on real experiences and projects. Once you know which path you are on, our final sections will include top-line recommendations for network operations and security, and tie back to our examples to show how they play out in real life. We will also highlight the most common pitfalls and their potential consequences. This research should help you better understand what approaches will work best for your project in your organization. We are focusing this first round on networking, but future work will build on this basis to


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.