Understanding and Selecting Data Masking: Technical Architecture

Today we will discuss platform architectures and deployment models. Before I jump into the architectural models, it’s worth mentioning that these architectures are designed in response to how enterprises use data. Data is valuable because we use it to support business functions. Data has value in use. The more places we can leverage data to make decisions, the more valuable it is. However, as we have seen over the last decade, data propagation carries many risks. Masking architectures are designed to fit within existing data management frameworks and mitigate risks to information without sacrificing usefulness. In essence we are inserting controls into existing processes, using masking as a guardian, to identify risks and protect data as it migrates through the enterprise applications that automate business processes.

As I mentioned in the introduction, we have come a long way from masking as nothing more than a set of scripts run by an admin or database administrator. Back then you connected directly to a database, or ran scripts from the console, and manually moved files around. Today’s platforms proactively discover sensitive data and manage policies centrally, handling security and data distribution across dozens of different types of information management systems, automatically generating masked data as needed for different audiences. Masking products can stand alone, serving disparate data management systems simultaneously, or be embedded as a core function of a dedicated data management service.

Base Architecture

Single Server/Appliance: A single appliance or software installation that performs static ‘ETL’ data masking services. The server is wholly self-contained – performing all extraction, masking, and loading from a single location. This model is typically used in small and mid-sized enterprises.
It can scale geographically, with independent servers in regional offices to handle masking functions, usually in response to specific regional regulatory requirements.

Distributed: This option consists of a central management server with remote agents/plug-ins/appliances that perform discovery and masking functions. The central server distributes masking rules, directs endpoint functionality, catalogs locations and nature of sensitive data, and tracks masked data sets. Remote agents periodically receive updates with new masking rules from the central server, and report back sensitive data that has been discovered, along with the results of masking jobs. Scaling is by pushing processing load out to the endpoints.

Centralized Architecture: Multiple masking servers, centrally located and managed by a single management server, are used primarily for production and management of masked data for multiple test and analytics systems.

Proxy/Bridge Cluster: One or more appliances or agents that dynamically mask streamed content, typically deployed in front of relational databases, to provide proxy-based data masking. This model is used for real-time masking of non-static data, such as database queries or loading into NoSQL databases. Multiple appliances provide scalability and failover capabilities. This may or may not be used in a two-tier architecture.

Appliances, software, and virtual appliance options are all available. But unlike most security products, where appliances dominate the market, masking vendors generally deliver their products as software. Windows, Linux, and UNIX support is all common, as is support for many types of files and relational databases. Support for virtual appliance deployment is common among the larger vendors but not universal, so inquire about availability if that is key to your IT service model.
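To make the Distributed model a little more concrete, here is a minimal Python sketch of the loop described above: a central server publishes versioned masking rules, agents pull them on check-in, mask locally, and report back. Every name here (`CentralServer`, `Agent`, `MaskingRule`, the `"last4"` and `"redact"` method labels) is invented for illustration and does not come from any vendor's product. "Discovery" is reduced to noticing which rule columns exist in the local rows, which is far simpler than what real platforms do.

```python
from dataclasses import dataclass, field


@dataclass
class MaskingRule:
    column: str   # name of the column the rule applies to
    method: str   # hypothetical method label: "last4" or "redact"


@dataclass
class CentralServer:
    """Holds the current rule set and a catalog of what agents report back."""
    version: int = 0
    rules: list = field(default_factory=list)
    catalog: dict = field(default_factory=dict)      # agent name -> sensitive columns found
    job_results: list = field(default_factory=list)  # (agent name, rows masked)

    def publish(self, rules):
        """Replace the rule set and bump the version so agents notice the change."""
        self.rules = list(rules)
        self.version += 1

    def report(self, agent_name, discovered_columns, rows_masked):
        """Record an agent's discovery findings and masking job result."""
        self.catalog[agent_name] = sorted(discovered_columns)
        self.job_results.append((agent_name, rows_masked))


@dataclass
class Agent:
    name: str
    rules_version: int = 0
    rules: list = field(default_factory=list)

    def sync(self, server):
        """Periodic check-in: pull rules only if the server has a newer version."""
        if server.version > self.rules_version:
            self.rules = list(server.rules)
            self.rules_version = server.version

    def run_job(self, server, rows):
        """Mask local rows with the current rules, then report back."""
        masked = [self._mask_row(row) for row in rows]
        # Naive stand-in for discovery: rule columns that appear in the data.
        found = {r.column for r in self.rules if any(r.column in row for row in rows)}
        server.report(self.name, found, len(masked))
        return masked

    def _mask_row(self, row):
        out = dict(row)
        for rule in self.rules:
            if rule.column in out:
                value = str(out[rule.column])
                if rule.method == "last4":
                    out[rule.column] = "*" * max(len(value) - 4, 0) + value[-4:]
                else:  # anything else is treated as full redaction
                    out[rule.column] = "*" * len(value)
        return out
```

The point of the sketch is the division of labor: processing happens at the endpoint, while the server only distributes rules and keeps the catalog, which is how this model scales by pushing load outward.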
A key masking evolution is the ability to apply masking policies across different data management systems (file management, databases, document management, etc.) regardless of platform type (Windows vs. Linux vs. …). Modern masking platforms are essentially data management systems, with policies set at a central location and applied to multiple systems through direct connection or remote agent software. As data is collected and moved from point A to point B, one or more data masks are applied to one or more ‘columns’ of the data.

Deployment and Endpoint Options

While masking architecture is conceptually simple, there are many different deployment options, each particularly suited to protecting one or more data management systems. And given masking technologies must work on static data copies, live database repositories, and dynamically generated data (streaming data feeds, application generated content, ad hoc data queries, etc.), a wide variety of deployment options are available to accommodate the different data management environments. Most companies deploy centralized masking servers to produce safe test and analytics data, but vendors offer the flexibility to embed masking directly into other applications and environments where large-footprint masking installations or appliances are unsuitable. The following is a sample of the common deployments used for remote data collection and processing.

Agents: Agents are software components installed on a server, usually the same server that hosts the data management application. Agents have the option of being as simple or advanced as the masking vendor cares to make them. They can be nothing more than a data collector, sending data back to a remote masking server for processing, or might provide masking as data is collected. In the latter case, the agent masks data as it is received, either completely in memory or from a temporary file.
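As a rough illustration of masks being applied to ‘columns’ as data moves from point A to point B, the way an agent might do it entirely in memory, consider the sketch below. The policy format, the column names, and the two mask functions are assumptions made up for this example, not any product's API. The hash mask is deterministic but unsalted, so treat it as a placeholder rather than a recommendation for low-entropy values.

```python
import csv
import hashlib
import io


def mask_last4(value):
    """Keep the last four characters and replace the rest with '*'."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def mask_hash(value):
    """Replace the value with a short, deterministic SHA-256 digest prefix."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


# Hypothetical policy: column name -> mask function.
POLICY = {"card_number": mask_last4, "email": mask_hash}


def mask_stream(source, target, policy):
    """Copy CSV rows from source to target, masking policy columns in memory."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        for column, mask in policy.items():
            if row.get(column):          # skip missing or empty values
                row[column] = mask(row[column])
        writer.writerow(row)
        count += 1
    return count
```

Because rows are read, masked, and written one at a time, the unmasked values never touch a temporary file in this sketch; `source` and `target` can be any file-like objects, including `io.StringIO` buffers.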
Agents can be managed remotely by a masking server or directly by the data management application, effectively extending data management and collaboration system capabilities (e.g., MS SharePoint, SAP). One of the advantages of using agents at the endpoint rather than in-database stored procedures – which we will describe in a moment – is that all traces of unmasked data can be destroyed. Either by masking in ‘ephemeral’ memory, or by ensuring temporary files are overwritten, sensitive data is not leaked through temporary storage. Agents do consume local processor, memory, and storage – a significant issue for legacy platforms – but only a minor consideration for virtual machines and cloud deployments.

Web Server Plug-ins: Technically a form of agent, these plug-ins are installed as web application services, as part of an Apache/web application stack used to support the local application which manages data. Plug-ins are an efficient way to transparently implement masking within existing application environments, acting on the data stream before it reaches the application or extending the application’s functionality
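To give a feel for how a plug-in can sit transparently in the data stream, here is a small sketch written as Python WSGI middleware. This is an analogy, not how any particular Apache plug-in or masking product works: the class name, the card-number pattern, and the choice to mask the outbound response are all assumptions for illustration. It masks in place so the body length (and any Content-Length header) stays valid, it will miss a number split across two chunks, and it does not forward the wrapped iterable's `close()` call.

```python
import re

# Illustrative pattern: exactly 16 ASCII digits, not part of a longer digit run.
CARD_RE = re.compile(rb"(?<!\d)\d{12}(\d{4})(?!\d)")


def mask_cards(chunk):
    """Replace the first 12 digits of each 16-digit run with 'X' (length preserved)."""
    return CARD_RE.sub(lambda m: b"X" * 12 + m.group(1), chunk)


class MaskingMiddleware:
    """WSGI middleware that masks card-like numbers in each response chunk."""

    def __init__(self, app):
        self.app = app  # the wrapped WSGI application, left unmodified

    def __call__(self, environ, start_response):
        body = self.app(environ, start_response)
        return (mask_cards(chunk) for chunk in body)
```

The wrapped application needs no changes, which is the appeal of this deployment style: wrapping it as `MaskingMiddleware(app)` is the only integration step in this sketch.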

Friday Summary: June 1, 2012

It’s the first of June, and I’m sure most of you are thinking about vacation, if not actually on vacation at this point. I’m here holding down the fort while the rest of Securosis is visiting places cooler and more fun. I’m taking time to reflect on security topics and my research agenda.

I have been mulling over the topic of IT buying security products for the sake of security. Sounds irrational, right? We have known for years that people only buy security products to help satisfy compliance requirements, and then only grudgingly, to meet the minimum requirements. But people buying security to help secure things keeps popping up here and there, and I have been waiting for better evidence before blogging about it. Just before the RSA conference I decided to bring it up in an internal meeting, and the conversation went a bit like this:

Me: “I think I should mention buying security for the sake of security as a trend.”
Partner #1: “Why?”
Me: “The number of security driven inquiries has doubled.”
Partner #1: “Twice nothing is nothing. Move on.”
Me: “Agreed, but twice 3-5% is something to take notice of.”
Partner #2: “Where are you getting your data from?”
Me: “Customer conversations and anecdotal vendor evidence. At least a dozen, maybe 15 references, since January, mostly in the area of data and database security.”
Partner #2: “Meh. Not a great sample pool, or sample size. It’s so small in comparison to compliance it’s an afterthought. It’s really not worth mentioning.”
Me: “Yeah, OK, agreed. But the customer questions seem to be driven by risk analysis, and the conversations just seem different. I think we could keep our eyes open on this.”

So it’s not really worth talking about, but here I am mentioning it because it keeps popping up. I figured I’d open it up for discussion with our readers, to see what others are seeing. It’s not an actual trend, but it’s interesting – to me, at least.
The evidence clearly shows that security is a compliance-driven market, and there is not enough evidence to say we see a real change. But the conversations are a bit different than they used to be. More often focused on security, more focused on data, with some understanding of risk and a bit of a six-sigma-esque approach to security roadmaps. So maybe it’s not security at all – maybe it’s sophistication of buyers and their internal processes. And why do I care? Because if security or risk is the driver, it changes who buys the products and what features they focus on and ask about – because the use cases differ between security and compliance buyers. I am thinking out loud, but I’d love to hear what’s driving your product selection today.

The other issue to talk about is my research agenda. It’s been hectic here since a month before RSA and it’s only just starting to let up. So it’s time to take a breath and look at the topics you want to hear about. Since Mike joined we have really filled out endpoint and network security; and we have continued to do a lot in analytics, data security, and security management. But despite the amount of expertise we have in house, we have done very little with application security, cloud, and access management. WAF management has been among the top 4 items on my research agenda for 2.5 years now, but has yet to percolate to the top. Identity and Access Management for cloud computing is an incredibly confusing topic which I think we could really shed some light on. And there are plenty of interesting technologies for application security we should delve into as well. We will reset the research agenda again soon, so now is a good time to weigh in on the areas you’re most interested in.

Oh, and if you visit Arizona in the coming weeks, stay away from flashlights. Apparently they’re dangerous. Yikes!
On to the Summary:

Webcasts, Podcasts, Outside Writing, and Conferences

  • The Macalope consults The Mogull.
  • Adrian presents on selecting a tokenization strategy.
  • We missed Rich’s TidBITS article on hardening Mac OS X.

Favorite Securosis Posts

  • Adrian Lane: Low Hanging Fruit. When my encrypted tunnel failed the other day and email immediately decided to synch, I prayed no one was listening. Made me change all my passwords just in case.
  • Mike Rothman: Pragmatic Key Management: Introduction. Rich had me at Pragmatic. I look forward to this series – crypto is integral to the cloud and we all need to revisit our Bob & Alice flowcharts.

Other Securosis Posts

  • White Paper: Understanding and Selecting a Database Security Platform.
  • White Paper: Vulnerability Management Evolution.
  • Security, Metrics, Martial Arts, and Triathlon: a Meandering Friday Summary.
  • Evolving Endpoint Malware Detection: Control Lost.
  • Continuous Learning.
  • Friday Summary: May 18, 2012.
  • Understanding and Selecting Data Masking: How It Works.
  • Understanding and Selecting Data Masking: Defining Data Masking.

Favorite Outside Posts

  • Adrian Lane: The Cost of Fixing Vulnerabilities vs. Antivirus Software. Jeremiah asks whether our security investment dollars can be spent better. Most firms I speak with keep metrics to determine whether security programs are helping, improve over time, and provide some hints about the relative cost/benefit tradeoffs of different security investments. The data supports Jeremiah’s assertion.
  • Mike Rothman: E-Soft ( Uses Bogus Copyright Claims to Stifle Research. I guess some companies never learn from others. Security by obscurity is not a winning strategy. How about actually fixing the damn bug? Yeah, that’s too radical.

Project Quant Posts

  • Malware Analysis Quant: Index of Posts.
  • Malware Analysis Quant: Metrics – Monitor for Reinfection.
  • Malware Analysis Quant: Metrics – Remediate.
  • Malware Analysis Quant: Metrics – Find Infected Devices.
  • Malware Analysis Quant: Metrics – Define Rules and Search Queries.
  • Malware Analysis Quant: Metrics – The Malware Profile.
  • Malware Analysis Quant: Metrics – Dynamic Analysis.

Research Reports and Presentations

  • Report: Understanding and Selecting a Database Security Platform.
  • Vulnerability Management Evolution: From Tactical Scanner to Strategic Platform.
  • Watching the Watchers:

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.