DLP in the Cloud
It’s been quite a while since we updated our Data Loss Prevention (DLP) research. It’s not that DLP hasn’t continued to be an area of focus (it has), but a bunch of other shiny things have been demanding our attention lately. Yeah, like the cloud. Well, it turns out a lot of organizations are using this cloud thing now, so they inevitably have questions about whether and how their existing controls (including DLP) map into the new world. As we update our Understanding and Selecting DLP paper, we’d be remiss if we didn’t discuss how to handle potential leakage in cloud-based environments. But let’s not put the cart ahead of the horse. First we need to define what we mean by cloud with applicable use cases for DLP. We could bust out the Cloud Security Alliance guidance and hit you over the head with a bunch of cloud definitions. But for our purposes it’s sufficient to say that in terms of data access you are most likely dealing with: SaaS Software as a Service (SaaS) is the new back office. That means whether you know about it or not, you have critical data in a SaaS environment, and it must be protected. Cloud File Storage: These services enable you to extend a device’s file system to the cloud, replicating and syncing between devices and facilitating data sharing. Yes, these services are a specific subtype of SaaS (and PaaS, Platform as a Service), but the amount of critical data they hold, along with how differently they work than a typical SaaS application, demands that we treat them differently. IaaS: Infrastructure as a Service (IaaS) is the new data center. That means many of your critical applications (and data) will be moving to a cloud service provider – most likely Amazon Web Services, Microsoft Azure, or Google Cloud Platform. And inspection of data traversing a cloud-based application is, well… different, which that means protecting that data is also… different. DLP is predicated on scanning data at rest and inspecting and enforcing policies on data in motion, which is a poor fit for IaaS. You don’t really have endpoints suitable for DLP agent installation. Data is in either structured (like a database) or unstructured (filesystem) datastores. Data protection for structured datastores defaults to application-centric methods, will unstructured cloud file systems are really just cloud file storage (which we will address later). So inserting DLP agents into an application stack isn’t the most efficient or effective way to protect an application. Compounding the problem, traditional network DLP don’t fit IaaS well either. You have limited visibility into the cloud network; to inspect traffic, you would need to route it through an inspection point, which is likely to be expensive and/or lose key cloud advantages – particularly elasticity and anywhere access. Further, cloud network traffic is encrypted more often, so even with access to full traffic, inspection at scale presents serious implementation challenges. So we will focus our cloud DLP discussion on SaaS and cloud file storage. Cloud Versus Traditional Data Protection The cloud is clearly different, but what exactly does that mean? If we boil it down to its fundamental core, you still need to perform the same underlying functions – whether the data resides in a 20-year-old mainframe or the ether of a multi-cloud SaaS environment. To protect data you need to know where it is (discover), understand how it’s being used (monitor), and then enforce policies to govern what is allowed and by whom – along with any additional necessary security controls (protect). When looking at cloud DLP many users equate protection with encryption but that’s a massive topic with a lot of complexity, especially in SaaS. A good start is our recent research on Multi-Cloud Key Management. There is considerable detail in that paper, but managing keys across cloud and on-premise environments is significantly more complicated; you’ll need to rely more heavily on your provider, and architect data protection and encryption directly into your cloud technology stack. Thinking about discovery, do you remember the olden days – back as far as 7 years ago – when your critical data was either in your data centers or on devices you controlled? To be fair, even then it wasn’t easy to find all your critical data, but at least you knew where to look. You could search all your file servers and databases for critical data, profile and/or fingerprint it, and then look for it across your devices and your network’s egress points. But as critical data started moving to SaaS applications and cloud file storage (sometimes embedded within SaaS apps), controlling data loss became more challenging because data need not always traverse a monitored egress point. So we saw the emergence of Cloud Access Security Brokers (CASB), to figure out which cloud services were in use, so you could understand (kind of) where your critical data might be. At least you had a place to look, right? Enforcement of data usage policies is also a bit different in the cloud – you don’t completely control SaaS apps, nor do you have an inspection/enforcement point on the network where you can look for sensitive data and block it from leaving. We keep hearing about lack of visibility in the cloud, and this is another case where it breaks the way we used to do security. So what’s the answer? It’s found in 3 letters you should be familiar with. A. P. I. API Are Your Friends Fortunately many SaaS apps and cloud file storage services provide APIs which allow you to interact with their environments, providing visibility and some degree of enforcement for your data protection policies. Many DLP offerings have integrated with the leading SaaS and cloud file storage vendors to offer you the ability to: Know when files are uploaded to the cloud and analyze them. Know who is doing what with the files. Encrypt or otherwise protect the files. With this access you don’t need to see the data pass