After a short hiatus we are back with the next installment of our Data Centric Security series. This post will discuss why customers are interested in this approach, and the specific use cases they are looking to address. It should be no surprise that all these use cases are driven by security or compliance. What’s interesting is why other tools and technologies do not meet their needs. What prompts people to look for a different approach to data security? Those are the questions we will address in today’s post.

NoSQL / Big Data Security

The single biggest reason we are asked about data centric security models is “Big Data”: moving information into NoSQL analytics clusters. Big data systems are simply a new type of database that facilitates fast analysis and lookup on much larger data sets, at a dramatically lower cost, than previously possible. To get the most out of these databases, organizations collect lots of data from dozens of sources. The problem is that many of those sources contain sensitive data and fall under one or more regulatory controls, but big data projects are typically started without regulatory or IT oversight. As the custodians become aware of their responsibility for the NoSQL data and services, they realize they are unable to adequately secure the cluster, or even to know exactly what it contains. To aggravate the problem, reporting and data controls within NoSQL databases are often deficient or completely unavailable. Still, NoSQL databases have proven themselves, offering analytics at a previously unavailable scale and delivering genuine value to the organization. Unfortunately they are often too immature for enterprises to fully trust.
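A common way to make data safe to load into a cluster like this is de-identification: replacing each sensitive value with a realistic surrogate so the data set keeps its shape for analysis. The Python below is a minimal sketch under stated assumptions, not a production tool. The field names (`name`, `ssn`, `birth_date`, `visits`), the tiny name directory, and the 30-day date window are hypothetical choices for illustration. Fake SSNs skip the area, group, and serial values the Social Security Administration never issues, so they look plausible without being derived from the real number.

```python
import random
from datetime import date, timedelta

# Hypothetical replacement-name directory; a real system would draw from a
# much larger list so surrogate names do not repeat conspicuously.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Taylor", "Casey"]
LAST_NAMES = ["Smith", "Garcia", "Chen", "Okafor", "Novak"]


def fake_ssn(rng: random.Random) -> str:
    """Random string in SSN format (AAA-GG-SSSS) that could be a real SSN.

    Skips values never issued: area 000, 666 and 900-999, group 00,
    serial 0000.
    """
    area = rng.randint(1, 899)
    while area == 666:
        area = rng.randint(1, 899)
    group = rng.randint(1, 99)
    serial = rng.randint(1, 9999)
    return f"{area:03d}-{group:02d}-{serial:04d}"


def fake_name(rng: random.Random) -> str:
    """Assemble a surrogate name from the directory lists above."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def nearby_date(d: date, rng: random.Random, max_days: int = 30) -> date:
    """Shift a date by 1..max_days days in a random direction (never 0)."""
    offset = rng.randint(1, max_days) * rng.choice((-1, 1))
    return d + timedelta(days=offset)


def deidentify(record: dict, rng: random.Random) -> dict:
    """Return a copy of `record` with sensitive fields replaced by surrogates.

    Fields not handled here (e.g. `visits`) pass through unchanged, so the
    record keeps its analytical value. The input record is not modified.
    """
    out = dict(record)
    out["name"] = fake_name(rng)
    out["ssn"] = fake_ssn(rng)
    out["birth_date"] = nearby_date(record["birth_date"], rng)
    return out


if __name__ == "__main__":
    rng = random.Random(7)  # seeded only so the demo is repeatable
    patient = {"name": "Jane Doe", "ssn": "123-45-6789",
               "birth_date": date(1980, 5, 17), "visits": 12}
    print(deidentify(patient, rng))
```

Because each surrogate is drawn at random rather than computed from the original value, the output alone cannot be reversed. Python’s `random` module is not cryptographically secure; since `deidentify` accepts any `random.Random`, a `random.SystemRandom()` instance could be passed in instead when predictability matters.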
Data centric security provides critical protection for systems which process sensitive data but cannot themselves be fully trusted. That makes the approach very attractive, either for protecting data before it moves into a big data repository or for transforming existing data into something non-sensitive, which can be analyzed but does not need to be secured. The term for this transformation is “data de-identification”. Examples include replacing an individual’s Social Security Number with a random number that could be a valid SSN, a person’s name with a name randomly chosen or assembled from a directory, or a date with a random nearby date. In this way the original sensitive data is removed entirely, but the value of the data set for analysis is retained. We will detail how later in this series.

Cloud and Data Governance

Most countries have laws on how citizen data must be secured, outlining custodial responsibilities for the companies which store and manage it. These laws differ on which data must be secured, which controls are acceptable, and what is required in case of a breach of sensitive data. If your IT systems are all within a single data center, in a single location under your control, you only need to worry about your local laws. But cloud computing makes compliance much more complex, especially in public clouds. First, cloud service providers are legally third parties, with deliberately opaque controls and limited access for tenants (customers like you). Second, for reliability and performance, cloud providers often run data centers in multiple geographic locations, each subject to different laws. This means multiple, possibly conflicting, regulations apply to your sensitive data, and you share responsibility for it with your cloud service providers. The legal issues break down into three types: functional, jurisdictional, and contractual.
Functional issues include how legal discovery is performed, what happens in the event of a subpoena or legal hold, proof of data guardianship, and legal seizure in multi-tenant environments. Jurisdictional issues require you to understand which legislation applies, under what circumstances it applies, and how legal processes differ between jurisdictions. Contractual issues cover access to data, data lifecycle management, audit rights, contract termination, and a whole heap of other issues including security and vulnerability management. Addressing data governance and legal issues requires substantial research and expertise to implement policies, often at great expense. Many firms want to leverage low-cost, on-demand cloud computing resources, but balk at the huge burden of data governance in and across cloud providers. This is a case where data centric security can reduce compliance burdens and resolve many legal issues, because data transformed into a non-sensitive form before it reaches the cloud may fall outside many of these obligations. This typically means fewer reports, fewer controls, and less complexity to manage.

PHI

Questions about how to address HIPAA and Protected Health Information (PHI) were almost non-existent a couple of years ago, but we now hear them with increasing frequency. Health care data encompasses many different kinds of sensitive data, and the surrounding issues are complex. A patient’s name is sensitive data in some contexts. Medical history, medications, age, and just about every other piece of data is critical to some audiences but too sensitive to share with others. Some patients’ data can be shared in certain limited cases, but not in others. And there are many audiences for PHI: state and federal governments, hospitals, insurance companies, employers, organizations conducting clinical trials, pharmaceutical companies, and many more. Each audience has its own relevant data subset and restrictions on access.
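Per-audience restrictions like these can be enforced by generating a separate view of each record for each recipient. Below is a minimal Python sketch under stated assumptions: the audience names, field names, and policy table are hypothetical, and the keys are hard-coded only for the demo (a real system would fetch them from a key management service). Each policy is an allow-list: a field is passed through, replaced with a surrogate because the recipient needs it but may not see the real value, or dropped because it is not listed at all.

```python
import hashlib
import hmac
from typing import Any, Dict, Mapping

# Hypothetical allow-list policies: field -> "keep" or "surrogate".
# Any field not listed for an audience is dropped from that audience's view.
POLICIES: Dict[str, Dict[str, str]] = {
    "insurer": {"patient_id": "surrogate", "age": "keep", "procedure_code": "keep"},
    "researcher": {"patient_id": "surrogate", "age": "keep",
                   "diagnosis": "keep", "medications": "keep"},
}


def surrogate(value: Any, key: bytes) -> str:
    """Keyed pseudonym: the same value and key always yield the same token."""
    mac = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def view_for(audience: str, record: Mapping[str, Any],
             keys: Mapping[str, bytes]) -> Dict[str, Any]:
    """Build the subset of `record` that `audience` is permitted to receive."""
    policy = POLICIES[audience]  # unknown audience raises KeyError: deny by default
    view: Dict[str, Any] = {}
    for field, action in policy.items():
        if field not in record:
            continue
        if action == "keep":
            view[field] = record[field]
        elif action == "surrogate":
            view[field] = surrogate(record[field], keys[audience])
    return view


if __name__ == "__main__":
    # Demo keys only; real keys would come from a key management service.
    keys = {"insurer": b"insurer-demo-key", "researcher": b"researcher-demo-key"}
    patient = {"patient_id": "MRN-0042", "name": "Jane Doe", "age": 47,
               "diagnosis": "E11.9", "medications": ["metformin"],
               "procedure_code": "99213"}
    for audience in POLICIES:
        print(audience, view_for(audience, patient, keys))
```

The surrogate here is a keyed hash (HMAC-SHA-256, truncated to 16 hex characters), a swapped-in choice rather than anything this series prescribes. It is deterministic, so one audience can still link all records for the same patient, while a different key per audience keeps two recipients from joining their data sets on that identifier.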
Data centric security is in use today, providing carefully selected subsets of the complete original data to different audiences, with surrogate data for elements which are required but not permitted. As data storage and management systems have become cheaper, faster, and more powerful, providing a unique subset to each audience has become feasible. Each recipient can securely access its own copy, containing only the data it is permitted to see. Data centric security enables organizations to provide just the data elements each partner needs, without exposing data that partner is not allowed to access. And this can all be done in real time, on demand, by applying appropriate controls to transform the original data into the secured subset. Many tools and techniques developed over the last several years for test data management are now used to generate custom data sets for individual partners on an ongoing basis.

Payment Card Security

Tokenization for credit card security was the first data centric security approach to be widely accepted. Hundreds of thousands of organizations replace credit card numbers with data surrogates. Some