Understanding and Selecting a DLP Solution: Part 3, Data-In-Motion Technical Architecture
Welcome to part 3 of our series on Data Loss Prevention/Content Monitoring and Filtering. You should go read Part 1 and Part 2 before digging into this one. In this episode we're going to spend some time looking at the various architectures we typically see in DLP products. This is a bit of a tough one, since we tend to see a bunch of different technical approaches. There's no way to cover all the options in a little old blog post, so I'll focus on the big-picture things you should look for. To structure things a bit, we'll look at DLP for data-in-motion (network monitoring/filtering), data-at-rest (storage), and data-in-use (endpoint). For space reasons, this post will focus on data-in-motion, and the next post will drill into at-rest and in-use.

Network Monitor

At the heart of most DLP solutions lies a passive network monitor. This is where DLP was born, and it's where most of you will start your data protection adventure. The network monitoring component is typically deployed at or near the gateway on a SPAN port (or a similar tap). It performs full packet capture, session reconstruction, and content analysis in real time.

Performance numbers tend to be a little messy. First, on the client expectation side, everyone wants full gigabit Ethernet performance. That's pretty unrealistic, since I doubt many of you fine readers are really running that high a level of communications traffic. Remember, you don't use DLP to monitor your web applications, but to monitor employee communications. Realistically, we find that small enterprises run well under 50 Mbps of relevant traffic, medium enterprises run closer to 50-200 Mbps, and large enterprises around 300 Mbps (maybe as high as 500 in a few cases).

Because of the content analysis overhead, not every product runs full packet capture. You might have to choose between pre-filtering (and thus missing non-standard traffic) and buying more boxes to load balance across. Also, some products lock monitoring into predefined port and protocol combinations, rather than identifying the service/channel based on packet content. Even if full application channel identification is included, you want to make sure it's enabled; if it's off by default, you might miss non-standard communications such as traffic tunneled over an unusual port.

Most of the network monitors are just dedicated servers with DLP software installed; a few vendors deploy true specialized appliances. While some products have their management, workflow, and reporting built into the network monitor, most offload this to a separate server or appliance. This is where you'll want the big hard drives to store policy violations, and this central management server should be able to handle distributed, hierarchical deployments.
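Before moving on to email, here's a minimal sketch of the content-based channel identification mentioned above: classifying a reassembled stream by what the bytes look like rather than by the port it arrived on. The signature set and the identify_channel function are illustrative assumptions on my part, not any product's detection logic; a real monitor covers far more protocols and reconstructs complete sessions first.

```python
import re

# Illustrative signatures only; a real DLP monitor covers far more protocols
# and reassembles complete sessions before classifying them.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|PUT|HEAD|OPTIONS|DELETE) \S+ HTTP/1\.[01]\r\n"),
    "smtp": re.compile(rb"^(EHLO|HELO|MAIL FROM:)", re.IGNORECASE),
    "ftp":  re.compile(rb"^(USER |PASS |220[ -])", re.IGNORECASE),
}

def identify_channel(payload: bytes) -> str:
    """Classify a reassembled TCP stream by its content, ignoring the port."""
    for name, pattern in SIGNATURES.items():
        if pattern.match(payload):
            return name
    return "unknown"

# HTTP tunneled over an unusual port is still identified as HTTP:
print(identify_channel(b"POST /upload HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # http
print(identify_channel(b"\x16\x03\x01\x00\x05hello"))                           # unknown
```

The point is simply that HTTP tunneled over, say, port 8081 still looks like HTTP once you inspect the payload, which is exactly the traffic a port-locked monitor misses.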
Email

The next major component is email integration. Since email is store-and-forward, you gain a lot of capabilities, like quarantine, encryption integration, and filtering, without the complexity of blocking synchronous traffic. Most products embed an MTA (Mail Transfer Agent), allowing you to just add the DLP solution as another hop in the email chain. Quite a few also integrate directly with some of the major existing MTAs/email security solutions for better performance. One weakness of this approach is that it doesn't give you access to internal email; if you're on an Exchange server, internal messages never make it through the MTA, since there's no reason to send that traffic out. To monitor internal mail you'll need direct Exchange/Notes integration, which is surprisingly rare in the market. We're also talking true integration, not just scanning logs/libraries after the fact, which is what a few vendors consider internal mail support. Good email integration is absolutely critical if you ever want to do any filtering, as opposed to just monitoring. Actually, this is probably a good time to drill into filtering a bit…

Filtering/Blocking and Proxy Integration

Nearly anyone deploying a DLP solution will eventually want to start blocking traffic. There's only so long you can take watching all your juicy sensitive data running to the nether regions of the Internet before you start taking some action. But blocking isn't the easiest thing in the world, especially since we're trying to allow good traffic, block only bad traffic, and make the decision using real-time content analysis.

Email, as we just mentioned, is pretty easy. It's not really real-time, and it's proxied by its very nature; adding one more analysis hop is a manageable problem in even the most complex environments. Outside of email, most of our communications traffic is synchronous: everything runs in real time. Thus, if we want to filter it, we either need to bridge the traffic, proxy it, or poison it from the outside.

With a bridge we just have a system with two network cards, and we perform content analysis in the middle. If we see something bad, the bridge closes the connection. Bridging isn't the best approach for DLP, since it might not stop all the bad traffic before it leaks out. It's like sitting in a doorway watching everything go past with a magnifying glass: by the time you've seen enough traffic to make an intelligent decision, you may have missed the really good (bad) stuff. Very few products take this approach, although it does have the advantage of being protocol agnostic.

Our next option is a proxy. A proxy is protocol/application specific and queues up traffic before passing it on, allowing for deeper analysis (get over it, Hoff, I'm simplifying on purpose here). We mostly see gateway proxies for HTTP, FTP, and IM. Almost no DLP solutions include their own proxies; they tend to integrate with existing gateway/proxy vendors such as Blue Coat/Cisco/Websense instead. Integration is typically through the ICAP protocol, allowing the proxy to grab the traffic, send it to the DLP product for analysis, and cut communications if there's a violation. This means you don't have to add another piece of hardware in front of your network traffic, and the DLP vendors can avoid the difficulties of building dedicated network hardware for inline analysis. A couple of gateways, like Blue Coat and Palo Alto (I may be missing
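To make that ICAP-style integration a bit more concrete, here's a rough sketch of the decision step a DLP engine might perform when the proxy hands it an outbound request for inspection. This is not a working ICAP server, and the single card-number regex and Verdict structure are illustrative assumptions rather than any vendor's API; in ICAP terms, an allow verdict maps to responding 204 No Content (pass the request unmodified), while a block verdict maps to returning a modified/blocked response so the proxy cuts the communication.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: 16-digit sequences that look like payment card numbers.
CARD = re.compile(r"\b(?:\d[ -]?){15}\d\b")

@dataclass
class Verdict:
    allow: bool
    reason: str = ""

def analyze_outbound(body: bytes) -> Verdict:
    """Content-analysis step a proxy could invoke (e.g., via ICAP REQMOD)
    before forwarding an outbound HTTP request body."""
    text = body.decode("utf-8", errors="replace")
    match = CARD.search(text)
    if match:
        # Block: the proxy serves an error page or drops the connection.
        return Verdict(allow=False, reason=f"possible card number: {match.group()[:4]}...")
    # Allow: the proxy forwards the request unchanged (ICAP 204 No Content).
    return Verdict(allow=True)

print(analyze_outbound(b"order notes: call me tomorrow"))        # allowed
print(analyze_outbound(b"card 4111 1111 1111 1111 exp 12/29"))   # blocked
```

In a real product the verdict would come from the full content analysis engine (partial document matching, database fingerprinting, and so on) rather than a single regular expression.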