Trends in Data Centric Security: Tools
The three basic data centric security tools are tokenization, masking, and data element encryption. Now we will discuss what they are, how they work, and which security challenges they best serve.

Tokenization: You can think of tokenization like a subway or arcade token: it has no cash value but can be used to ride the train or play a game. In data centric security, a token is provided in lieu of sensitive data. The most common use case today is in credit card processing systems, as a substitute for credit card numbers. A token is basically just a random number – that's it. The token can be made to look just like the original data type; in the case of credit cards the tokens are typically 16 digits long, they usually preserve the last four original digits, and they can even be generated to pass the Luhn validation check. But a token is a random value, with no mathematical relationship to the original, and no value other than as a reference to the original in some other (more secure) database. Users may choose to maintain a "token database" which associates each original value with its token, in case they need to look up the original at some point in the future, but this is optional.

Tokenization has advanced far beyond simple value replacement, and is lately being applied to more complex data types. Tokens are no longer just for credit card and Social Security numbers, but also for JSON & XML files and web pages. Some tokenization solutions replace data stored within databases, while others work on data streams – such as replacing unique cell IDs embedded in cellphone tower data streams. Both simple and complex data can be tokenized, at rest or in motion – and tokens can look like anything you want. Very versatile and very secure – you can't steal what's not there!

Tokenization provides very strong security by completely removing the original sensitive values from secured data. Random values cannot be reverse engineered back to the original data. For example, given a database where the primary key is a Social Security number, tokenization can generate unique random tokens that fit the receiving database. Some firms use the token merely as a placeholder and never need the original value – in fact some discard (or never receive) it, keeping tokens only because downstream applications might break without an SSN or a compatible surrogate. Users who occasionally need to reference the original values use token vaults or equivalent technologies, which are designed to allow only credentialed administrators access to the original sensitive values under controlled conditions – though a vault compromise would expose all the original values. Vaults are commonly used for PHI and financial data, as mentioned in the last post.
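To make the mechanics concrete, here is a minimal Python sketch of credit card tokenization. The function names and the dictionary-based vault are illustrative assumptions rather than any product's API; the point is that the token preserves the last four digits, passes Luhn validation, and has no mathematical relationship to the original number.

```python
import secrets

def luhn_sum(number: str) -> int:
    """Luhn sum of a digit string (the number is valid when sum % 10 == 0)."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total

def tokenize_pan(pan: str, vault: dict) -> str:
    """Replace a 16-digit card number with a random token that keeps the
    last four digits and still passes the Luhn check. The optional vault
    is the only path back to the real number."""
    last_four = pan[-4:]
    while True:
        body = "".join(secrets.choice("0123456789") for _ in range(11))
        # Digit 12 (index 4 from the right) is never doubled by Luhn,
        # so we can set it directly to force the sum to a multiple of 10.
        fix = (10 - luhn_sum(body + "0" + last_four) % 10) % 10
        token = body + str(fix) + last_four
        if token != pan and token not in vault:   # avoid collisions
            vault[token] = pan                    # optional token vault
            return token

vault = {}
token = tokenize_pan("4539148803436467", vault)
assert token.endswith("6467") and luhn_sum(token) % 10 == 0
```

Note that nothing about the token itself leaks the original value; the vault lookup is the only way back.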
Masking: This is another very popular tool for protecting data elements while retaining the aggregate value of a data set. For example, we might substitute an individual's Social Security number with a random number (as in tokenization), or replace their name with one randomly selected from a phone book, but retain gender. We might replace the date of birth with a random value within X days of the original to effectively preserve age. This way the original (sensitive) values are removed entirely without randomizing the aggregate values of the data set, which can still support later analysis. Masking is the principal method of creating useful new values without exposing the original, and it is ideally suited for creating data sets that allow meaningful analysis without exposing the original data. This is important when you don't have sufficient resources to secure every system in your enterprise, or don't fully trust the environment where the data is stored. Different masks can be applied to the same data fields to produce different masked data for different use cases. This flexibility preserves much of the value of the original with minimal risk. Masking is very commonly used with PHI, test data management, and NoSQL analytics databases. (A short masking sketch appears at the end of this post.)

That said, there are potential downsides as well. Masking does not offer quite as strong security as tokenization or encryption (which we discuss below). The masked data does in fact bear some relationship to the original – while individual fields are anonymized to some degree, preserving specific attributes of a person's health record (age, gender, zip code, race, DoB, etc.) may provide more than enough information to reverse engineer the masked data back to the original individual. Masking can be very secure, but that requires selecting good masking tools and applying a well-reasoned mask, to achieve security goals while supporting the desired analytics.

Element/Data Field Encryption / Format Preserving Encryption (FPE): Encryption is the go-to security tool for the majority of IT and data security challenges we face today. Properly implemented, encryption produces obfuscated data that cannot be reversed into the original value without the encryption key. What's more, encryption can be applied to any type of data, such as first and last names, or to entire data structures such as a file or database table. And encryption keys can be provided to select users, keeping data secret from those not entrusted with keys.

But not all encryption solutions are suitable for a data centric security model. Most forms of encryption take human-readable data and transform it into a binary format. This is a problem for applications which expect text strings, or databases which require properly formatted Social Security numbers. These binary values create unwanted side effects and often cause applications to crash. So most companies considering data centric security need an encryption cipher that preserves at least format, and often data type as well. Typically these algorithms are applied to specific data fields (e.g., name, Social Security number, or credit card number), and they can be used on data at rest or applied to data streams as information moves from one place to the next. These encryption variants are commercially available, and provide the strength of conventional encryption while producing ciphertext that fits existing applications and database schemas.
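To illustrate how format preservation works, here is a toy Python sketch of the underlying construction: a balanced Feistel network over decimal strings, using HMAC-SHA256 as the round function. This is an illustration only, not NIST FF1/FF3-1 – it assumes an even-length digit string, and a real deployment should use a vetted standard implementation rather than hand-rolled code like this.

```python
import hashlib
import hmac

def _prf(key: bytes, rnd: int, half: str, width: int) -> int:
    """Round function: HMAC-SHA256 reduced to a width-digit integer."""
    mac = hmac.new(key, bytes([rnd]) + half.encode(), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big") % (10 ** width)

def fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Encrypt an even-length decimal string into another decimal string
    of the same length (toy balanced Feistel network; NOT NIST FF1)."""
    w = len(digits) // 2
    left, right = digits[:w], digits[w:]
    for rnd in range(rounds):
        f = _prf(key, rnd, right, w)
        left, right = right, str((int(left) + f) % 10 ** w).zfill(w)
    return left + right

def fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Invert fpe_encrypt by running the Feistel rounds backwards."""
    w = len(digits) // 2
    left, right = digits[:w], digits[w:]
    for rnd in reversed(range(rounds)):
        f = _prf(key, rnd, left, w)
        left, right = str((int(right) - f) % 10 ** w).zfill(w), left
    return left + right

key = b"demo key - use a real KMS in production"
ciphertext = fpe_encrypt(key, "4539148803436467")
assert len(ciphertext) == 16 and ciphertext.isdigit()
assert fpe_decrypt(key, ciphertext) == "4539148803436467"
```

The ciphertext is still 16 decimal digits, so applications and database schemas that expect a card number keep working – exactly the property that makes FPE suitable for data centric security.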
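Finally, the masking sketch promised above: a minimal Python example that masks a hypothetical patient record. The field names, the tiny substitute-name list, and the default 30-day birth date jitter are illustrative assumptions, not any product's defaults; note how the quasi-identifiers that keep the data useful (gender, zip code) are exactly what create re-identification risk.

```python
import random
from datetime import date, timedelta

# Stand-in names; a real masking tool would draw from large dictionaries
# (the "phone book" mentioned above).
FAKE_NAMES = ["Alice Kim", "Rajiv Patel", "Maria Lopez", "John Smith"]

def mask_record(record: dict, dob_jitter_days: int = 30) -> dict:
    """Return a masked copy of a record: direct identifiers are replaced,
    date of birth is shifted within +/- N days to roughly preserve age,
    and analytic fields (gender, zip) are retained."""
    masked = dict(record)
    # Replace the SSN with a random 9-digit value (as in tokenization).
    # `random` is fine for a sketch; production masking should use a CSPRNG.
    masked["ssn"] = "".join(random.choice("0123456789") for _ in range(9))
    # Swap the real name for one picked from the substitute list.
    masked["name"] = random.choice(FAKE_NAMES)
    # Jitter the date of birth within the allowed window.
    offset = timedelta(days=random.randint(-dob_jitter_days, dob_jitter_days))
    masked["dob"] = record["dob"] + offset
    # Gender and zip are kept to preserve the data set's analytic value,
    # the same attributes that can enable re-identification.
    return masked

original = {"name": "Jane Doe", "ssn": "123456789",
            "dob": date(1980, 5, 17), "gender": "F", "zip": "94107"}
print(mask_record(original))
```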