Welcome to part 3 of our series on Data Loss Prevention/Content Monitoring and Filtering. You should go read Part 1 and Part 2 before digging into this one.

In this episode we’re going to spend some time looking at the various architectures we typically see in DLP products. This is a bit of a tough one since we tend to see a bunch of different technical approaches. There’s no way to cover all the options in a little old blog post, so I’ll focus on the big-picture things you should look for. To structure things a bit, we’ll look at DLP for data-in-motion (network monitoring/filtering), data-at-rest (storage), and data-in-use (endpoint). For space reasons, this post will focus on data-in-motion, and the next post will drill into at-rest and in-use.

Network Monitor

At the heart of most DLP solutions lies a passive network monitor. This is where DLP was born, and it’s where most of you will start your data protection adventure. The network monitoring component is typically deployed at or near the gateway on a SPAN port (or a similar tap). It performs full packet capture, session reconstruction, and content analysis in real time.

Performance numbers tend to be a little messy. First, on the client expectation side, everyone wants full gigabit Ethernet performance. That’s pretty unrealistic, since I doubt many of you fine readers are really running that high a level of communications traffic. Remember, you don’t use DLP to monitor your web applications, but to monitor employee communications. Realistically we find that small enterprises run well under 50 Mbps of relevant traffic, medium enterprises closer to 50-200 Mbps, and large enterprises around 300 Mbps (maybe as high as 500 in a few cases). Because of the content analysis overhead, not every product runs full packet capture. You might have to choose between pre-filtering (and thus missing non-standard traffic) or buying more boxes and load balancing. Also, some products lock monitoring into pre-defined port and protocol combinations, rather than identifying services/channels based on packet content. Even if full application channel identification is included, make sure it’s not off by default; otherwise you might miss non-standard communications such as tunneling over an unusual port.
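To make the port-versus-content distinction concrete, here’s a minimal Python sketch of content-based channel identification. The signature patterns and example payloads are purely illustrative, not taken from any shipping product:

```python
import re

# Illustrative signatures: identify a protocol by the first bytes of a
# session's payload, not by the TCP port it happens to run on.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|PUT|HEAD|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    "smtp": re.compile(rb"^(EHLO|HELO|MAIL FROM:)", re.IGNORECASE),
    "ftp":  re.compile(rb"^(USER|PASS) \S+\r\n", re.IGNORECASE),
}

def identify_channel(payload: bytes) -> str:
    """Return the protocol implied by a session's opening bytes."""
    for name, sig in SIGNATURES.items():
        if sig.match(payload):
            return name
    return "unknown"

# HTTP tunneled over an unusual port still gets identified, because the
# decision is based on content, not the port number:
print(identify_channel(b"GET /secret.docx HTTP/1.1\r\nHost: evil.example\r\n"))  # http
print(identify_channel(b"EHLO mail.example.com\r\n"))  # smtp
```

A port-locked monitor, by contrast, would classify that first payload as “whatever usually runs on this port” and could miss it entirely.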

Most of the network monitors are just dedicated servers with DLP software installed. A few vendors deploy as a true specialized appliance.

While some products have their management, workflow, and reporting built into the network monitor, most offload this to a separate server or appliance. This is where you’ll want the big hard drives to store policy violations, and this central management server should be able to handle distributed hierarchical deployments.


The next major component is email integration. Since email is store and forward, you can gain a lot of capabilities, like quarantine, encryption integration, and filtering, without the complexity of blocking synchronous traffic. Most products embed an MTA (Mail Transfer Agent) into the product, allowing you to just add it as another hop in the email chain. Quite a few also integrate directly with some of the major existing MTAs/email security solutions for better performance. One weakness of this approach is that it doesn’t give you access to internal email. If you’re on an Exchange server, internal messages never make it through the MTA since there’s no reason to send that traffic out. To monitor internal mail you’ll need direct Exchange/Notes integration, which is surprisingly rare in the market. We’re also talking true integration, not just scanning logs/libraries after the fact, which is what a few vendors pass off as internal mail support. Good email integration is absolutely critical if you ever want to do any filtering, as opposed to just monitoring. Actually, this is probably a good time to drill into filtering a bit…
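The decision that DLP hop makes is conceptually simple: parse the message, run content analysis, and pick an action. A toy sketch using Python’s standard email module; the SSN-matching policy and the quarantine/deliver actions are illustrative stand-ins for a real policy engine:

```python
import re
from email import message_from_string

# Illustrative policy: flag anything that looks like a US Social Security number.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_message(raw: str) -> str:
    """Decide what an MTA hop should do with a message:
    'quarantine' on a policy violation, otherwise 'deliver'."""
    msg = message_from_string(raw)
    for part in msg.walk():                       # handles multipart messages too
        if part.get_content_type() == "text/plain":
            body = part.get_payload()
            if isinstance(body, str) and SSN.search(body):
                return "quarantine"
    return "deliver"

raw = (
    "From: payroll@example.com\n"
    "To: someone@gmail.example\n"
    "Subject: salary data\n"
    "\n"
    "Employee SSN: 078-05-1120\n"
)
print(scan_message(raw))  # quarantine
```

Because the hop sits in a store-and-forward chain, “quarantine” can simply mean holding the message in a local queue pending review; nothing about the user’s connection has to be interrupted in real time.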

Filtering/Blocking and Proxy Integration

Nearly anyone deploying a DLP solution will eventually want to start blocking traffic. There’s only so long you can stand watching all your juicy sensitive data running to the nether regions of the Internet before you start taking action. But blocking isn’t the easiest thing in the world, especially since we want to allow good traffic, block only the bad traffic, and make the decision using real-time content analysis. Email, as we just mentioned, is pretty easy. It’s not really real-time, and it’s proxied by its very nature; adding one more analysis hop is a manageable problem in even the most complex environments. Outside of email, most of our communications traffic is synchronous: everything runs in real time. Thus if we want to filter it, we either need to bridge the traffic, proxy it, or poison it from the outside.

With a bridge we just have a system with two network cards, and we perform content analysis in the middle. If we see something bad, the bridge closes the connection. Bridging isn’t the best approach for DLP since it might not stop all the bad traffic before it leaks out. It’s like sitting in a doorway watching everything go past with a magnifying glass: by the time you’ve seen enough traffic to make an intelligent decision, the really good (bad) stuff may already be through. Very few products take this approach, although it does have the advantage of being protocol agnostic.

Our next option is a proxy. A proxy is protocol/application specific and queues up traffic before passing it on, allowing for deeper analysis (get over it Hoff, I’m simplifying on purpose here). We mostly see gateway proxies for HTTP, FTP, and IM. Almost no DLP solutions include their own proxies; they tend to integrate with existing gateway/proxy vendors such as Blue Coat/Cisco/Websense instead. Integration is typically through the ICAP protocol, which lets the proxy grab the traffic, send it to the DLP product for analysis, and cut communications if there’s a violation. This means you don’t have to add another piece of hardware in front of your network traffic, and the DLP vendors can avoid the difficulties of building dedicated network hardware for inline analysis. A couple of gateways, like Blue Coat and Palo Alto (I may be missing a few, drop ‘em in the comments), can man-in-the-middle SSL connections to let you sniff encrypted traffic. You need to make changes on your endpoints to deal with all the certificate alerts, but you can now peer inside encrypted sessions. For instant messaging you’ll need an IM tool like FaceTime, Symantec, or Akonix, and you’ll have to choose a DLP product that specifically partners with whatever IM tool you’re using.
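For the curious, an ICAP REQMOD message is just text framing the original HTTP request so the DLP box can inspect it. A minimal sketch of how a proxy might wrap an outbound request (the `dlp.internal` host name is hypothetical, and a real client also handles previews, response parsing, and message bodies per RFC 3507):

```python
def build_icap_reqmod(icap_host: str, http_headers: bytes) -> bytes:
    """Build a minimal ICAP REQMOD message (RFC 3507) wrapping the headers
    of an outbound HTTP request. The Encapsulated header gives the byte
    offsets of the embedded sections; null-body says there is no body."""
    icap = (
        f"REQMOD icap://{icap_host}/reqmod ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: req-hdr=0, null-body={len(http_headers)}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap + http_headers

# Hypothetical outbound request the proxy intercepted:
http_req = b"POST /upload HTTP/1.1\r\nHost: files.example.com\r\n\r\n"
msg = build_icap_reqmod("dlp.internal", http_req)
print(msg.decode("ascii"))
```

The DLP server’s ICAP response then tells the proxy to pass the request through unchanged or to block it, which is why the proxy, not the DLP product, stays in the data path.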

The last method of filtering is TCP poisoning. You monitor the traffic, and when you see something bad, you inject a TCP reset packet to kill the connection. This works on every protocol but isn’t very efficient. For one thing, some protocols will keep trying to get their traffic through. If you TCP poison a single email message, the server may keep trying to send it for 4 days, as often as every 15 minutes. Yeah, that’s what I thought too. The other problem is the same as bridging: since you don’t queue the traffic at all, by the time you notice something bad it might be too late. It’s a good stop-gap to cover non-standard protocols, but you’ll want to proxy as much as possible.
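Under the hood, a reset injector just forges a TCP segment with the RST flag set and a sequence number matching the live connection. A bare-bones sketch of the header construction (a real tool would also build the IP header, compute the checksum, and write the packet to a raw socket):

```python
import struct

def build_tcp_rst(src_port: int, dst_port: int, seq: int) -> bytes:
    """Build a bare 20-byte TCP header with the RST flag set."""
    offset_flags = (5 << 12) | 0x04   # data offset = 5 words; 0x04 = RST bit
    return struct.pack(
        "!HHIIHHHH",
        src_port, dst_port,
        seq,            # must fall in the victim connection's window to be accepted
        0,              # ack number (unused without the ACK flag)
        offset_flags,
        0,              # window size
        0,              # checksum (left zero in this sketch)
        0,              # urgent pointer
    )

pkt = build_tcp_rst(443, 52100, 123456789)
assert len(pkt) == 20 and pkt[13] == 0x04   # byte 13 carries the flag bits
```

The sequence-number requirement is exactly why this approach races the traffic it’s trying to stop: the monitor has to observe enough of the session to forge a believable reset, and by then some data has already left.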

Internal Network

Although most products are technically capable of monitoring internal networks, we’re not really seeing DLP used on internal traffic other than email. Gateways provide convenient choke points, whereas internal monitoring is daunting because of cost, performance, policy management, and false positives. A few DLP vendors have lined up partnerships, but I haven’t talked with anyone doing internal monitoring beyond email and IM.

Distributed Deployments

The last part of the architecture we’ll look at is support for distributed deployments. Unless you’re a one-man shop like me, you’ll have multiple gateways and locations. This is where the central management server comes in; it should support multiple monitoring points, including a mix of passive network monitoring, proxy points, email servers, and remote locations. While processing/analysis can be offloaded to remote enforcement points, you’ll want all events sent back to the central management server for workflow, reporting, investigation, and archiving. Remote offices are usually easy to support since you can just push policies down and pull reports back, but not every product has these features.

The more advanced products support hierarchical deployments. Imagine you’re in a large multinational corporation with different business units and legal monitoring requirements spread around the world. You might need a big honking server in the middle but want to support local policies and enforcement in different regions, running on their own management servers. Early products only supported one management server but now we have options to deal with these distributed situations, with a mix of corporate-regional-business unit policies, reporting, and workflow.
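The corporate-regional-business unit layering amounts to policy inheritance with local overrides. A toy sketch of the merge logic (the policy keys and values are hypothetical, just to show precedence):

```python
def effective_policy(*layers: dict) -> dict:
    """Merge policy layers; later (more local) layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

# Hypothetical layers: global defaults, a regional legal override, and a
# business-unit exception, applied in order of increasing locality.
corporate = {"block_ssn": True, "retain_days": 365, "monitor_im": True}
regional_eu = {"retain_days": 90, "monitor_im": False}   # e.g. local privacy law
business_unit = {"monitor_im": True}                      # local exception

print(effective_policy(corporate, regional_eu, business_unit))
```

In a real product the interesting questions are around which layers are *allowed* to override which settings, and where the resulting events are reported; the merge itself is the easy part.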

This pretty much covers things for network monitoring and filtering. Basically, just picture a policy and management server in the middle, with different, specialized, enforcement points scattered out where you need them.

That’s about it for the network side of DLP; in the next post we’ll focus on data at rest and endpoint control.