DLP in the Cloud

By Mike Rothman

It’s been quite a while since we updated our Data Loss Prevention (DLP) research. It’s not that DLP hasn’t continued to be an area of focus (it has), but a bunch of other shiny things have been demanding our attention lately. Yeah, like the cloud. Well, it turns out a lot of organizations are using this cloud thing now, so they inevitably have questions about whether and how their existing controls (including DLP) map into the new world.

As we update our Understanding and Selecting DLP paper, we’d be remiss if we didn’t discuss how to handle potential leakage in cloud-based environments. But let’s not put the cart before the horse. First we need to define what we mean by cloud, along with the applicable use cases for DLP.

We could bust out the Cloud Security Alliance guidance and hit you over the head with a bunch of cloud definitions. But for our purposes it’s sufficient to say that in terms of data access you are most likely dealing with:

  • SaaS: Software as a Service (SaaS) is the new back office. That means whether you know about it or not, you have critical data in a SaaS environment, and it must be protected.
  • Cloud File Storage: These services enable you to extend a device’s file system to the cloud, replicating and syncing between devices and facilitating data sharing. Yes, these services are a specific subtype of SaaS (and PaaS, Platform as a Service), but the amount of critical data they hold, along with how differently they work than a typical SaaS application, demands that we treat them differently.
  • IaaS: Infrastructure as a Service (IaaS) is the new data center. That means many of your critical applications (and data) will be moving to a cloud service provider – most likely Amazon Web Services, Microsoft Azure, or Google Cloud Platform. And inspection of data traversing a cloud-based application is, well… different, which means protecting that data is also… different.

DLP is predicated on scanning data at rest and inspecting and enforcing policies on data in motion, which is a poor fit for IaaS. You don’t really have endpoints suitable for DLP agent installation. Data is in either structured (like a database) or unstructured (filesystem) datastores. Data protection for structured datastores defaults to application-centric methods, while unstructured cloud file systems are really just cloud file storage (which we will address later). So inserting DLP agents into an application stack isn’t the most efficient or effective way to protect an application.

Compounding the problem, traditional network DLP doesn’t fit IaaS well either. You have limited visibility into the cloud network; to inspect traffic, you would need to route it through an inspection point, which is likely to be expensive and/or lose key cloud advantages – particularly elasticity and anywhere access. Further, cloud network traffic is encrypted more often, so even with access to full traffic, inspection at scale presents serious implementation challenges.

So we will focus our cloud DLP discussion on SaaS and cloud file storage.

Cloud Versus Traditional Data Protection

The cloud is clearly different, but what exactly does that mean? If we boil it down to its fundamental core, you still need to perform the same underlying functions – whether the data resides in a 20-year-old mainframe or the ether of a multi-cloud SaaS environment. To protect data you need to know where it is (discover), understand how it’s being used (monitor), and then enforce policies to govern what is allowed and by whom – along with any additional necessary security controls (protect).

When looking at cloud DLP, many users equate protection with encryption, but that’s a massive topic with a lot of complexity, especially in SaaS. A good start is our recent research on Multi-Cloud Key Management. There is considerable detail in that paper, but managing keys across cloud and on-premise environments is significantly more complicated; you’ll need to rely more heavily on your provider, and architect data protection and encryption directly into your cloud technology stack.

Thinking about discovery, do you remember the olden days – back as far as 7 years ago – when your critical data was either in your data centers or on devices you controlled? To be fair, even then it wasn’t easy to find all your critical data, but at least you knew where to look. You could search all your file servers and databases for critical data, profile and/or fingerprint it, and then look for it across your devices and your network’s egress points.
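To make fingerprinting a bit more concrete, here is a minimal Python sketch of the idea, not how any particular DLP product does it. It assumes fingerprinting means hashing overlapping runs of words (shingles) from a known sensitive document, then checking how many of those hashes show up in other text. The function names and the five-word window are illustrative assumptions.

```python
import hashlib
import re


def shingles(text, size=5):
    """Yield overlapping runs of `size` lowercased words from `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    for i in range(len(words) - size + 1):
        yield " ".join(words[i:i + size])


def fingerprint(text, size=5):
    """Return the set of SHA-256 hashes of every word shingle in `text`."""
    return {
        hashlib.sha256(s.encode("utf-8")).hexdigest()
        for s in shingles(text, size)
    }


def overlap(known, candidate_text, size=5):
    """Fraction of the candidate's shingles that appear in the known fingerprint."""
    candidate = fingerprint(candidate_text, size)
    if not candidate:
        return 0.0  # too short to produce a single shingle
    return len(candidate & known) / len(candidate)


# Hypothetical example: fingerprint a sensitive sentence, then score an email.
known = fingerprint(
    "The quarterly revenue forecast for project falcon is forty two million dollars"
)
score = overlap(
    known,
    "FYI: revenue forecast for Project Falcon is forty two million, keep quiet",
)
print(f"overlap score: {score:.3f}")
```

A score near 1.0 suggests the candidate is mostly lifted from the protected document. Exact hashes of word runs break as soon as someone edits a word, so treat this as an illustration of the concept only.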

But as critical data started moving to SaaS applications and cloud file storage (sometimes embedded within SaaS apps), controlling data loss became more challenging because data need not always traverse a monitored egress point. So we saw the emergence of Cloud Access Security Brokers (CASB), to figure out which cloud services were in use, so you could understand (kind of) where your critical data might be. At least you had a place to look, right?

Enforcement of data usage policies is also a bit different in the cloud – you don’t completely control SaaS apps, nor do you have an inspection/enforcement point on the network where you can look for sensitive data and block it from leaving. We keep hearing about lack of visibility in the cloud, and this is another case where it breaks the way we used to do security.

So what’s the answer? It’s found in 3 letters you should be familiar with. A. P. I.

APIs Are Your Friends

Fortunately many SaaS apps and cloud file storage services provide APIs which allow you to interact with their environments, providing visibility and some degree of enforcement for your data protection policies. Many DLP offerings have integrated with the leading SaaS and cloud file storage vendors to offer you the ability to:

  1. Know when files are uploaded to the cloud and analyze them.
  2. Know who is doing what with the files.
  3. Encrypt or otherwise protect the files.

With this access you don’t need to see the data pass by, so long as the API reliably tells you new data has moved into the environment, with sufficient flexibility to monitor and manage it. The key to DLP in the cloud is integration with the APIs of the services you use.
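Here is a rough Python sketch of that flow, under some loud assumptions: the UploadEvent fields, the fetch_content callback, and the action names are all hypothetical stand-ins for whatever a given service’s API actually exposes. The sketch reacts to an upload notification, pulls the content, looks for payment card numbers (validated with the Luhn checksum) and US Social Security number patterns, and picks an action.

```python
import re
from dataclasses import dataclass

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class UploadEvent:
    """Hypothetical shape of an upload notification from a cloud service."""
    user: str
    file_id: str
    shared_externally: bool


def find_sensitive(text):
    """Return labels for the kinds of sensitive data found in `text`."""
    findings = []
    if SSN_RE.search(text):
        findings.append("ssn")
    for match in CARD_RE.finditer(text):
        if luhn_ok(re.sub(r"\D", "", match.group())):
            findings.append("card_number")
            break
    return findings


def handle_upload(event, fetch_content):
    """Decide what to do with a new file; `fetch_content` maps file_id to text."""
    findings = find_sensitive(fetch_content(event.file_id))
    if not findings:
        return "allow"
    # Assumed policy: protect sensitive files shared outside, alert otherwise.
    return "encrypt" if event.shared_externally else "alert"


# Hypothetical usage with an in-memory stand-in for the service.
files = {"f1": "Card 4111 1111 1111 1111 on file", "f2": "Lunch menu for Friday"}
print(handle_upload(UploadEvent("alice", "f1", True), files.get))
```

In practice the handler would be driven by whatever notification or polling mechanism the service provides, and the content analysis would come from your DLP engine rather than a pair of regular expressions.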

But what happens when you don’t (or can’t) get adequate integration with cloud environments via their APIs? You need to see the data somehow, so that’s where a Cloud Access Security Broker comes into play.

Coexistence with CASB

CASBs offer many functions, including providing visibility into cloud service usage within your environment. A CASB can also inspect traffic directed to cloud services by running it through a proxy. Of course this normally adds inefficiency by routing traffic unnaturally through the proxy, but the impact is highly dependent on the latency and response time requirements of the application and the network architecture. Many CASB tools can also connect to cloud providers directly via their APIs to evaluate activity without a proxy. This depends more on the cloud provider offering an API with the needed capabilities than on the CASB product itself, which is why proxy mode is often needed.

Because CASBs inspect traffic, vendors claim to provide DLP-like functions for traffic they see heading for cloud environments. Of course DLP on your CASB cannot provide visibility or enforcement for on-premise data. So your decision depends on whether you want or need consistent policy across both on-premise and cloud environments, or whether separate solutions to monitor content are sufficient.

There is no right or wrong answer for this decision – it depends heavily on whether the policies you implement on internal networks map well enough to data moving to SaaS and cloud file storage.
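One way to reason about that mapping is to write your policies down as data and check where they leave holes. The Python sketch below is purely illustrative: the Policy fields, the channel names, and the coverage_gaps helper are assumptions, not taken from any DLP or CASB product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: what to protect, how, and on which channels."""
    name: str
    data_type: str
    action: str
    channels: frozenset


def coverage_gaps(policies, channels):
    """Map each data type to the channels where no policy covers it."""
    gaps = {}
    for data_type in {p.data_type for p in policies}:
        covered = set()
        for p in policies:
            if p.data_type == data_type:
                covered |= p.channels
        missing = set(channels) - covered
        if missing:
            gaps[data_type] = sorted(missing)
    return gaps


# Hypothetical policy set: on-premise rules plus one SaaS rule.
policies = [
    Policy("pci-egress", "card_number", "block",
           frozenset({"network_egress", "endpoint"})),
    Policy("pci-saas", "card_number", "encrypt", frozenset({"saas_api"})),
    Policy("pii-egress", "ssn", "alert", frozenset({"network_egress"})),
]
channels = ["network_egress", "endpoint", "saas_api", "cloud_file_storage"]

for data_type, missing in sorted(coverage_gaps(policies, channels).items()):
    print(f"{data_type}: no policy on {', '.join(missing)}")
```

If the gaps cluster on the cloud channels, that is a hint your internal policies do not yet map to SaaS and cloud file storage, whichever tool ends up enforcing them.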

Workflow Consistency

Once an alert triggers, where the data resides doesn’t impact the processes your internal folks use to verify the potential leak and then assess the damage. So any workflow you have in place to handle data leakage should extend to wherever the data resides. Of course the tools for these processes differ, and your access to potentially compromised systems is radically different. For SaaS you simply have no access, as a rule. Either way, once you have a verified leak it’s time for your incident response process.

So preventing data leaks in SaaS and cloud file storage can be very challenging. That said, as with most cloudy things, the place to start is by revisiting your processes and technologies to see whether your existing environment is ready for the cloud.

But one thing we know is that there will be more cloud use tomorrow than today, so the sooner you get your arms around protecting your content – regardless of where it resides – the better for your organization.
