Understanding and Selecting Data Masking: Use Cases
As we approach the end of this series, it has become clear that I should really have started with use cases. Not only because they are the primary driver of interest in masking products, but also because many advanced features and deployment models really only make sense in terms of particular use cases. The critical importance of clustered servers, and the necessity for post-masking validation for some applications, are really only clear in light of particular usage scenarios. I will sort this out in the final paper, putting use cases first, which will help with the more complex later discussions. But here they are.

Use Cases

Test Data Management: This is, by far, the most important reason customers gave for masking. When polled, most customers say their #1 use for masking technologies is to produce test data. They want to make sure employees don’t do something stupid with corporate data, like making private data sets public, or moving production data to insecure test environments. That is technically true as far as it goes, but fails to capture the essence of what customers look for in masking products. In actuality, masking data for testing and sharing is almost a trivial subset of the full customer requirement; tactical production of test data is just a feature.

The real goal is administration of the entire data security lifecycle – including locating, moving, managing, and masking data. The mature version of today’s simpler use case is a set of enterprise data management capabilities which control the flow of data to and from hundreds of different databases. This capability answers many of the most basic security questions we hear customers ask, such as “Where is my sensitive data?” “Who is using it?” and “How can we effectively reduce the risks to that information?”

Companies understand that good data makes employees’ jobs easier. And employees are really crafty at procuring data to help with their day jobs, even if it’s against the rules.
If salespeople can get the entire customer database to help meet their quotas, or quality assurance personnel think they need production data to test web applications, they usually find ways to get it. The same goes for decentralized organizations where regional offices need to be self-sufficient, or companies need to share data with partners.

The mental shift we see in enterprise environments is to stop fighting these internal user requirements and instead find a way to satisfy this demand safely. In some cases this means automated production of test data on a regular schedule, or self-service interfaces to produce masked content on demand. These platforms are effectively implementing a data security strategy for fast and efficient production of test data.

Compliance: Compliance is the second major reason customers cite for buying masking products. Unlike most of today’s emerging security technologies, it’s not just the Payment Card Industry’s Data Security Standard (PCI-DSS) driving sales – many different regulatory controls, across various industry verticals, are driving broad interest in masking. Early customers came specifically from finance, but adoption is now well distributed across different segments, particularly retail, telecom, health care, energy, education, and government.

The diversity of customer requirements makes it difficult to pinpoint any one regulatory concern that stands out from the rest. During discussions we hear about all the usual suspects – including PCI, NERC, GLBA, FERPA, HIPAA, and in some cases multiple requirements at the same time. These days we hear about masking being deployed as a more generic control – customers cite protection of Personally Identifiable Information (PII), health records, and general customer records, among other concerns; but we no longer see every customer focused on one specific regulation or requirement.
Now masking is perceived as addressing a general need to avoid unwanted data access, or to reduce exposure as part of an overall compliance posture.

For compliance, masking is used to protect data with minimal modification to the systems or processes which use the (now masked) data. Masking provides consistent coverage across files and databases with very little adjustment. Many customers layer masking and encryption, using encryption to secure data at rest and masking to secure data in use. Customers find masking better at maintaining relationships within databases; they also appreciate that it can be applied dynamically and causes fewer application side effects. In some cases encryption is deployed as part of the infrastructure, while other customers employ encryption as part of the data masking process – particularly to satisfy regulations that prescribe encryption. But the key difference is that masking offers full control over the data lifecycle from discovery to archival, whereas encryption is used in a more focused manner, often at multiple different points, to address specific risks. Masking platforms manage the compliance controls, including which columns of data are to be protected, how they are protected, and where the data resides.

Production Database Protection: The first two use cases drive the vast majority of market demand for masking. While replacement of sensitive data – specifically through ETL-style deployments – is by far the dominant model, it is not the only way to protect data in a database. At some firms protection of the production database is the primary goal for masking, with test data secondary. Masking can do both, which makes it attractive in these scenarios. Production data generally cannot be fully removed, so this model redirects requests to masked data where possible.
This use case centers on protecting information with finer control over user access, and on dynamically determining whether or not to provide access – something roles and credentials are not designed to support. Dynamic masking effectively redirects suspect queries to a masked view of the real data, in a handful of cases in combination with reverse proxy servers. These customers appreciate the dual benefits of dynamically detecting misuse while also monitoring database usage; they find it useful to have a log of which view of information has been presented to which users, and when. It is worth mentioning a few use cases I