Securosis

Research

Understanding and Selecting Data Masking: Series Introduction

Data masking has been around a long time. I have been masking since the early ’90s to create test data from production copies of customer insurance records, as well as to alter database columns before sending database exports out for “data cleansing”. At the time masking was little more than UNIX shell scripts or homegrown Perl scripts to alter particular columns in .csv files. A few years later I was giddy with excitement to have my first masking ‘program’, running on a paleolithic version of Windows, which actually had a ‘wizard’ for walking through the process. No, it did not help with extraction of information from a database, but it identified the columns to be altered, provided a list of masks to apply, and dumped an error file when it ran into trouble. That saved a lot of tweaking scripts and manually reviewing dump files. And all this was several years before I heard anyone mention ‘ETL’ (Extract, Transform, Load) because ODBC and JDBC drivers to connect to databases were just arriving on the scene, and nobody had automated bulk loads back into another database. That was still science fiction.

Masking products don’t look like that any longer; now they are full-blown data security and management platforms. It feels a bit nostalgic to review data masking technologies, and somewhat surprising to find how far they have evolved into full production-quality enterprise platforms. I have been following data masking for almost two decades, and have seen more evolution in the last couple of years than over the first dozen.

These advancements have come in two forms. First, evolution of the technology in recent years, building the capability to handle just about any type of database or data source, full automation, workflow integration, and a dozen or so data obfuscation techniques. Second, in response to substantial market demand from IT security and compliance departments, the way these tools are used has changed.
Increased demands from new buying centers have forced changes in workflow, user interface, and how core capabilities are packaged. It only took a couple of public breaches, where production data was easily exfiltrated from unsecured test databases, to drive masking into companies’ production data flows. Compliance requirements such as PCI-DSS cemented the need and are now a principal driver for adoption. The upshot is that most of these tools have seen significant advancement, and now include multiple robust user interfaces to support both technical and non-technical users, as well as pre-packaged solutions for different compliance mandates. Somewhere along the way, masking grew up!

I started following this vertical again because we received a number of customer questions, specifically around compliance. We have been seeing steady growth in adoption of masking over the last four years (perhaps 20% YoY) as more customers use masking to reduce information risk. In some ways it’s a more elegant solution than encryption, and for several deployment models masking is cheaper and easier than surrounding sensitive data with layers of security controls such as user rights management, encryption, database security, and various firewall technologies. When you think about securing Big Data, data analytics systems, HIPAA compliance, and using public cloud computing resources, there is plenty of reason to believe masking’s rapid adoption will continue.

I have written a lot about masking on the blog, but never a focused research paper; it seems to be time for a thorough explanation of what masking does and how it helps security. So I am excited to launch a new series: Understanding and Selecting Data Masking Solutions. I have designed this series to help would-be buyers understand what to look for in a product, and show existing customers how to leverage their investments to solve emerging problems.
I’ll delve into the technology, deployment models, data flow, and management capabilities. I will discuss the four principal use cases and how the technology solves certain compliance and security issues, and close out with a brief buyers’ guide on what features to look for based upon your criteria. The outline follows:

  • Core Features: We’ll define masking, introduce the basic technology, and discuss how it’s applied to data. We will also define the major masking options (shuffling, averaging, substitution, field nulling/redaction, and mathematical transposition) and de-identification methods. And we’ll explain the need for data type & format preservation, uniqueness, and semantic & referential integrity.
  • How It Works: We will examine how masking works, focusing on how data flows through it and how information is secured. We’ll describe different options for sources, destinations, extraction methods, loading options, and where & how masking is performed. We will contrast masking against encryption and tokenization to frame advantages of particular techniques for specific use cases later.
  • Technical Architecture: We will cover deployment models (ETL, in-place, and the various options for dynamic masking), along with the issues and concerns with each. We will discuss support for files and databases, and how masking integrates with these platforms. We’ll include diagrams to compare and contrast the models.
  • Advanced Features: We’ll cover current trends in data discovery, risk & criticality assessment, and mask validation. We will talk about centralized policy management, data set management, and secure data transfer. We’ll discuss integration with other systems such as trouble ticketing, encryption, tokenization, and DLP for automated workflow.
  • Use Cases: We will outline both traditional and new use cases, bringing together the evolving requirements with ongoing changes to masking technologies, along with how these use cases prompt new deployment models. This section will focus on specific customer requirements that have come up in our research; we’ll also evaluate specific masking alternatives to meet security and compliance mandates. We will cover automated workflows and scripting, as well as use of pre-defined templates for defining masks. We’ll discuss compliance masks and pre-built regulatory options, as well as control reporting.
  • Evaluate Your Needs: We’ll wrap up by mapping out evaluation criteria and a process to guide customers’ buying decisions. We will distinguish between “must-have” and “nice-to-have” requirements, compliance, integration, setup, and management.

As with all Securosis research projects, we are focused on
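
To make a few of the masking options named in the outline a little more concrete before the series digs in, here is a minimal Python sketch of redaction, shuffling, and substitution applied to a handful of rows. Everything in it is an assumption made for illustration: the field names (`name`, `card`, `salary`), the stand-in name list, and the helper functions are hypothetical, and it is not how any particular masking product works. It only shows the general idea that a mask can hide a value while preserving its length, keep a column's real values while breaking their link to a row, and map repeated inputs to the same stand-in so joins still line up.

```python
import random

# Hypothetical stand-in values used for substitution.
FAKE_NAMES = ["Pat Doe", "Sam Roe", "Lee Poe"]

# Hypothetical sample data; the field names are assumptions for this sketch.
SAMPLE_ROWS = [
    {"name": "Alice Jones", "card": "4111111111111111", "salary": 52000},
    {"name": "Bob Smith", "card": "5500000000000004", "salary": 61000},
    {"name": "Alice Jones", "card": "340000000000009", "salary": 47000},
]


def redact(value, keep_last=4, fill="X"):
    """Redaction: hide all but the last few characters, preserving length."""
    if len(value) <= keep_last:
        return fill * len(value)
    return fill * (len(value) - keep_last) + value[-keep_last:]


def shuffle_column(values, seed=0):
    """Shuffling: keep the real values but reorder them across rows."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def substitute(value, mapping, replacements=FAKE_NAMES):
    """Substitution: swap a value for a stand-in, reusing the same stand-in
    for repeated inputs. Note: stand-ins repeat once the list is exhausted,
    so this sketch does not guarantee uniqueness."""
    if value not in mapping:
        mapping[value] = replacements[len(mapping) % len(replacements)]
    return mapping[value]


def mask_rows(rows, seed=0):
    """Apply one mask per column and return new, masked rows."""
    name_map = {}
    salaries = shuffle_column([r["salary"] for r in rows], seed)
    return [
        {
            "name": substitute(r["name"], name_map),
            "card": redact(r["card"]),
            "salary": salary,
        }
        for r, salary in zip(rows, salaries)
    ]


if __name__ == "__main__":
    for row in mask_rows(SAMPLE_ROWS):
        print(row)
```

The per-column split is deliberate: which mask fits depends on what the downstream user needs from that column, such as a recognizable format, realistic aggregates, or consistent keys.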


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments and input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.