Tokenization Architecture: The Basics
Fundamentally, tokenization is fairly simple. You are merely substituting a marker of limited value for something of greater value. The token isn’t completely valueless – it is important within its application environment – but that value is limited to the environment, or even a subset of that environment. Think of a subway token or a gift card. You use cash to purchase the token or card, which then has value in the subway system or a retail outlet. That token has a one to one relationship with the cash used to purchase it (usually), but it’s only usable on that subway or in that retail outlet. It still has value, we’ve just restricted where it has value. Tokenization in applications and databases does the same thing. We take a generally useful piece of data, like a credit card or Social Security Number, and convert it to a local token that’s useless outside the application environment designed to accept it. Someone might be able to use the token within your environment if they completely exploit your application, but they can’t then use that token anywhere else. In practical terms, this not only significantly reduces risks, but also (potentially) the scope of any compliance requirements around the sensitive data. Here’s how it works in the most basic architecture: Your application collects or generates a piece of sensitive data. The data is immediately sent to the tokenization server – it is not stored locally. The tokenization server generates the random (or semi-random) token. The sensitive value and the token are stored in a highly-secured and restricted database (usually encrypted). The tokenization server returns the token to your application. The application stores the token, rather than the original value. The token is used for most transactions with the application. When the sensitive value is needed, an authorized application or user can request it. The value is never stored in any local databases, and in most cases access is highly restricted. This dramatically limits potential exposure. For this to work, you need to ensure a few things: That there is no way to reproduce the original data without the tokenization server. This is different than encryption, where you can use a key and the encryption algorithm to recover the value from anywhere. All communications are encrypted. The application never stores the sensitive value, only the token. Ideally your application never even touches the original value – as we will discuss later, there are architectures and deployment options to split responsibilities; for example, having a non-user-accessible transaction system with access to the sensitive data separate from the customer facing side. You can have one system collect the data and send it to the tokenization server, another handle day to day customer interactions, and a third for handling transactions where the real value is needed. The tokenization server and database are highly secure. Modern implementations are far more complex and effective than a locked down database with both values stored in a table. In our next posts we will expand on this model to show the architectural options, and dig into the technology itself. We’ll show you how tokens are generated, applications connected, and data stored securely; and how to make this work in complex distributed application environments. But in the end it all comes down to the basics – take something of wide value and replacing it with a token with restricted value. Understanding and Selecting a Tokenization Solution: Part 1, Introduction Part 2, Business Justification Share: