We have a major problem. It isn’t really getting better, and a critical window of opportunity that we can’t afford to lose will soon close. I don’t say this lightly, and I think anyone who has read my prior work knows I am not prone to FUD.
No one can possibly know the actual percentage of enterprise workloads and applications that have moved to cloud, but every statistic I could find estimates that, at most, it is somewhere in the range of 25% (here’s one Gartner take). I think under 25% is likely accurate, but I estimate that well over 90% of organizations have some production workloads in cloud, including SaaS and PaaS/IaaS. The lake is wide but only deep for a relatively small number of enterprises. This is natural and expected; it takes decades to transition existing workloads, especially when they are running happily in datacenters and there’s no major driver to move them out.
This is our window. Most organizations are in the shallow end of the pool, staring wistfully at the adventurous kids jumping off the high dive and frolicking around in the deep end. We have a choice — wait, learn to swim, or strap on some floaties and hope for the best. Oh, and there’s no lifeguard and there are most definitely some sharks. With lasers.
If organizations don’t improve their cloud governance, they have no chance of meaningfully improving their cloud security. That’s bad enough with today’s relatively limited cloud adoption, but as we gradually move more and more workloads to the cloud, without effective governance the problem will increase exponentially.
Nearly every single cloud security issue and breach is the direct result of a governance failure, not a technology failure.
Cloud Governance Anti-Patterns
I started working hands-on in cloud security in 2010. In any given year I probably talk with hundreds of organizations, if you include training classes and webinars. As an IANS faculty member I take, on average, 3-5 advisory calls a week, mostly with large enterprises. Each year I run multiple cloud security assessments, advisory and consulting engagements (most with larger organizations, including some of the largest in the world). I also provide advisory calls through the Cloud Security Alliance. This post is based on consistent trends I see throughout these calls, projects, and other relationships.
In many calls the customer starts to describe a narrow problem which I quickly recognize is a larger governance issue. I often stop them, describe the anti-pattern, and it’s almost like I just magically described their entire childhood. It’s like my work as a paramedic: using key symptoms to identify the larger problem.
This is a big dataset, and some of these issues directly contradict each other as different organizations make different mistakes on opposite sides of the spectrum:
- “We can’t slow down developers.” Security may be allowed to put in some basic requirements, but is often not allowed to install any significant preventative controls. They are often forced to rely on a CSPM/CNAPP and, at best, get to escalate only critical and high issues. IAM is a disaster with teams making major use of static credentials, like AWS IAM Users (which cause 66% of all AWS customer security incidents, according to AWS).
- “We don’t trust cloud and have to comply with our existing security policies and processes.” Security does get to slow things down, but typically lacks a sufficient technical understanding of cloud and tries to shoehorn it into existing processes. The organization tries to rely on existing security tools, and focuses too much on the network and too little on IAM. Cloud usage is so constrained that teams may just give up and keep deploying into the datacenter.
- “Cloud is just another datacenter.” There is little acceptance that cloud computing is a fundamentally different technology, which requires a different skillset. Neither infrastructure, development, nor security teams are effectively trained and tooled; instead they are expected to learn as they go. Many projects are just rehosted into the cloud, which reduces reliability and security while increasing costs. There are two subtypes of this pattern:
  - “We must migrate n% of workloads by x date.” Usually driven by datacenter contract renewal dates.
  - “We have $n credits from our vendor (usually Microsoft or Oracle), so we need to use those.”
- “We are going multicloud.” These organizations usually haven’t finished establishing an effective security program for one cloud, but they are going into other clouds. This is often tied to “we can’t slow down developers.” Multicloud isn’t inherently wrong, but it’s horrifically wrong without proper governance and investment in tools and people. The security teams in these organizations almost entirely rely on CSPM tools for blocking and tackling, and there is almost never investment in having at least one security subject matter expert for each cloud. There are four subtypes I often see:
  - “We are going to be cloud agnostic and run everything in containers.” The expectation is that everything will work in containers wherever you want to deploy it, because the enterprise either thinks they can save money and dynamically move workloads wherever they are cheaper, or because developers want to use their favorite toys. For the record, I am as likely to see a living unicorn as a truly cloud-agnostic workload.
  - “We need to back up our workloads in case our cloud provider has an outage.” If you want to completely rebuild your entire application stack on multiple platforms which don’t share any fundamental technology characteristics, be prepared to pay up.
  - “We got some credits from $provider we need to use.” So either you lose credits, or you pay to up-skill your teams, or you… do neither, and have a poorly supported workload running on a platform on which nobody is expert.
  - “We need to go multicloud in case $provider has an outage.” Have you tracked outages? Do you architect within your existing provider to handle outages?
- Executive leadership is disengaged and doesn’t set the ground rules. This one isn’t in quotes because that’s never how the client words it, and it is often combined with one of the other anti-patterns. In these organizations decisions are made based on the political strengths and skills of individuals. For the record, we in security are often bad at politics.
There are more, but that covers most of what I see.
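To make the static-credential symptom from the first anti-pattern concrete, here is a minimal sketch of how a team might surface long-lived IAM User access keys. It assumes the input is a list of records shaped like the `AccessKeyMetadata` entries AWS returns when listing a user's access keys (`UserName`, `AccessKeyId`, `Status`, `CreateDate`). The function name and the 90-day threshold are hypothetical choices for illustration, not a recommendation from AWS or from this post.

```python
from datetime import datetime, timedelta, timezone


def stale_active_keys(key_metadata, max_age_days=90, now=None):
    """Return (user, key_id, age_in_days) for active keys older than max_age_days.

    key_metadata: iterable of dicts with UserName, AccessKeyId, Status and a
    timezone-aware CreateDate (assumed shape, see the paragraph above).
    max_age_days: hypothetical threshold; set it from your own control objectives.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = timedelta(days=max_age_days)
    findings = []
    for key in key_metadata:
        # Inactive keys cannot be used to authenticate, so skip them here.
        if key["Status"] != "Active":
            continue
        age = now - key["CreateDate"]
        if age > cutoff:
            findings.append((key["UserName"], key["AccessKeyId"], age.days))
    return findings
```

A report like this only shows the symptom. Deciding who is allowed to create static credentials at all, and who has to fix the findings, is the governance question.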
Key Symptoms
Within 5-10 minutes on a call I can usually diagnose an organization suffering from these anti-patterns using key indicators:
- Security relies on a CSPM/CNAPP and either creating tickets or emailing teams to fix misconfigurations — which tend to linger for days, weeks, or longer.
- Security is either not in charge of managing IAM for deployments, or is in charge but takes days or weeks to approve and/or implement changes.
- IAM is decoupled from the IaC that builds the services; in other words, it isn’t an integrated process where the dev team and security work together to meet application needs while minimizing risk.
- The organization does not have at least one security subject matter expert per cloud platform (for IaaS — SaaS is different).
- I hear any of the above quotes within the first 15 minutes of a call, especially “multicloud” or “we can’t slow down developers.”
- All cloud workloads must be on the one true cloud network which connects back to the datacenter, and there are a ton of virtual firewalls everywhere that are decoupled from Security Groups (which are usually set to be overly permissive), because that’s the only control the existing security team is familiar with.
  - Not all virtual firewall usage is bad; the problem is when it isn’t coordinated with the cloud-native capabilities and is used instead of them due to lack of knowledge or control. The two should always complement each other, not replace one another.
- There are no established cloud security control objectives. At best there is a 1-3 page cloud security policy, which often just focuses on CIS benchmarks.
- There is no clear organizational hierarchy, and no clarity about who is responsible for what in cloud.
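As one illustration of the “overly permissive Security Groups” symptom above, the following sketch flags ingress rules that are open to the whole internet. It assumes the input is a list of dicts shaped like the `SecurityGroups` entries in an EC2 describe-security-groups response (`GroupId`, `IpPermissions`, `IpRanges`, `Ipv6Ranges`). The function name and the output fields are hypothetical, and the check is deliberately narrow: it treats only `0.0.0.0/0` and `::/0` as “open”.

```python
# CIDR blocks that mean "anyone on the internet" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_groups):
    """Return one finding per ingress rule that allows traffic from any address.

    security_groups: list of dicts in the assumed describe-security-groups shape.
    """
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append({
                        "group_id": group["GroupId"],
                        "protocol": perm.get("IpProtocol"),
                        # Port fields are absent for "all traffic" rules, so
                        # these may be None.
                        "from_port": perm.get("FromPort"),
                        "to_port": perm.get("ToPort"),
                        "cidr": cidr,
                    })
    return findings
```

If the only thing that happens with these findings is a ticket that lingers for weeks, the check has confirmed the diagnosis rather than treated it.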
Why It Matters Now and Where to Start
As I mentioned, we are still in the early days of cloud adoption in terms of the number of workloads running in the cloud. Smaller usage, even at today’s large scale, means smaller problems. Considering everything you read in the headlines, imagine how much worse things will be as we grow to 50% or greater adoption.
If we do not fix governance today, security will be unable to effectively protect us, our security debt will grow, and we will pay more later — in terms of both breaches and enormous expenses to fix deep-rooted problems.
This is why we added Governance as the top category in the Cloud Security Maturity Model. I suspect the issue is broader than cloud security, since NIST added governance to NIST CSF 2.0.
So where do you start? Nearly all the effective best practices I’ve learned over the past 14 years are in the Cloud Security Alliance Security Guidance v5, Domain 2. Changing governance is hard — sometimes impossible — but the combination of the Maturity Model and the Guidance can set you on the right path. It requires a clear organizational structure and executives willing to make decisions, combined with an interlocking set of approved control objectives and control specifications. All backed by people who understand cloud, with enough subject matter expertise to actually know how things work and how to improve them — instead of making decisions based on what a tool tells you.