Data Security Lifecycle- Technologies, Part 1

A week or so ago I published the Data Security Lifecycle, and so far the feedback has been very positive. The lifecycle is a high-level list of controls, but now we need to dig into the technologies to support those controls.

The Data Security Lifecycle is designed to be useful today while still being visionary- it's important to keep in mind that not all these technologies are at the same maturity level. Most data security technologies are only in an adolescent stage of development- they provide real value, but are not necessarily mature. Some technologies, especially Enterprise DRM, aren't yet suitable for widespread deployment and work best for smaller teams or business units. Others, like logical controls, are barely productized, if at all. As we go through these tools, I will try to clearly address maturity level and suitability for deployment of each one. Over time I'll be digging into each of these technologies, as I've started doing with DLP, and will be able to discuss some of the more detailed implementation and maturity issues.

In today's post we'll focus on the first two stages- Create and Store. Since we'll be delving into each technology in more detail down the road, these posts will just give a high-level overview. There are also technologies used for data security, such as data-in-motion encryption and enterprise key management, that fall outside the lifecycle and will be covered separately.

Create

Classify: Eventually, in this stage the content-aware combination of DLP/CMF/CMP and Enterprise DRM will classify content at the time of creation and apply rights, based on enterprise policies. Today, classification at the time of creation is a manual process. Structured data is somewhat classified based on where it's stored in the database, but since this isn't a content-aware decision and still relies on manual controls, there's no real technology to implement. In both cases I expect technology advancements over the next 1-3 years to provide classification-on-creation capabilities.

Assign Rights: Currently a manual process, but implemented through two technologies:

  1. Label Security: A feature of some database management systems that adds a label to a database row, column, or table, classifying the content in that object. The DBMS can then implement access and logical controls based on the data label.
  2. Enterprise Digital Rights Management (EDRM): Content is encrypted, and access and use rights are controlled by metadata embedded with the content. The EDRM market has been somewhat self-limiting due to the complexity of enterprise integration and assigning and managing rights. Eventually it will combine with CMF/CMP (notice I dropped DLP on purpose here) for content and policy-based rights assignment.
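
To make label security a little more concrete, here is a minimal sketch in Python of the access decision a DBMS might make once rows carry labels. The label names, the `Row` type, and `visible_rows` are all hypothetical and invented for illustration; real products implement this inside the database engine, not in application code.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


@dataclass(frozen=True)
class Row:
    label: str     # classification label attached to the row
    payload: dict  # the actual column values


def visible_rows(rows, clearance):
    """Return only the rows whose label is at or below the caller's clearance."""
    limit = LEVELS[clearance]
    return [row for row in rows if LEVELS[row.label] <= limit]


ROWS = [
    Row("public", {"name": "Acme Corp"}),
    Row("internal", {"name": "Acme Corp", "account_manager": "jdoe"}),
    Row("confidential", {"name": "Acme Corp", "card_last4": "1111"}),
]

if __name__ == "__main__":
    # A user cleared for "internal" never sees the confidential row.
    for row in visible_rows(ROWS, "internal"):
        print(row.label, row.payload)
```

The point of the sketch is that the decision depends only on the label, not on who owns the row or what it contains, which is why labels still have to be assigned manually today.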

Store

Access Controls: One of the most fundamental data security technologies, built into every file and management system, and one of the most poorly used.

  1. DBMS Access Controls: Access controls within a database management system, including proper use of Views vs. direct table access. Use of these controls is often complicated by connection pooling, which tends to anonymize the user between the application and the database.
  2. Administrator Separation of Duties: Newer technologies implemented in databases to limit database administrator access. On Oracle this is called Database Vault, and on IBM DB2 I believe you use the Security Administrator role and Label Based Access Controls.
  3. File System Access Controls: Normal file access controls, applied at the file or repository level. Eventually I expect to see tools to help manage these more centrally.
  4. Document Management System Access Controls: For content in a document management system (e.g., Documentum, SharePoint), the access controls built into the management system.
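
As a rough illustration of Views vs. direct table access, the sketch below uses Python's built-in `sqlite3` module. The table, view, and column names are made up for the example. SQLite has no privilege system, so this only shows the shape of the view; on a full DBMS you would additionally grant application accounts access to the view and not to the underlying table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a sensitive table and a restricted view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, card_number TEXT)"
    )
    conn.execute(
        "INSERT INTO customers (name, card_number) VALUES (?, ?)",
        ("Alice", "4111111111111111"),
    )
    # The view exposes only the non-sensitive columns.
    conn.execute("CREATE VIEW customer_directory AS SELECT id, name FROM customers")
    return conn


def directory_columns(conn):
    """Return the column names an application sees when it queries the view."""
    cursor = conn.execute("SELECT * FROM customer_directory")
    return [column[0] for column in cursor.description]


if __name__ == "__main__":
    conn = build_demo_db()
    print(directory_columns(conn))
```

An application that only ever queries `customer_directory` has no way to select the card number, even if its own code is sloppy.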

Encryption: The most overhyped technology for protecting data, but still the most important. More often than not encryption is used incorrectly and doesn't provide the expected level of security, but that's fodder for a future discussion.

  1. Field-Level Encryption: Encrypting fields within a database, normally at the column level. Can take 2-3 years to implement in large, legacy systems. A feature of all DBMSs, but many people look to third party solutions that are more manageable. Long-term this will just be a feature of the DBMS with third-party management tools, but that's still a few years out.
  2. Application-Level Encryption: Encrypting a piece of data at the application on collection. Better security than encrypting at the database level, but needs to be coded into the application. Can create complexities when the encrypted data is needed outside of the application, e.g., for batch jobs or other back-end processing. Tools do exist to encrypt at the application layer using keys available to other applications and systems, but that market is still very young.
  3. File/Media Encryption: In the context of databases, this is the encryption of the database files or the media they're stored on. Only protects data from physical theft and certain kinds of system-level intrusions. Can be very effective when used in combination with Database Activity Monitoring.
  4. Media Encryption: Encryption of an entire hard drive, CD/DVD, USB stick, tape, or other media. Encrypting the entire hard drive is particularly useful for protecting laptops.
  5. File Encryption: Encryption of individual files and/or directories on a system using software on that system and typically managed on a system-by-system basis by users.
  6. Distributed Encryption: Distributed encryption consists of two parts- a central policy server for key management and access control lists, and distributed agents on systems with the data. When a user attempts to access a file, the agent on the local system checks with the server and retrieves the keys if access is approved (in reverse, it can encrypt data using individual or group keys assigned by the server). Distributed encryption provides file-level granularity, while maintaining central control and easing management difficulties.
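
The distributed encryption flow can be sketched roughly as follows. `PolicyServer`, `Agent`, and the file and user names are hypothetical, and the sketch deliberately stops at the key-release decision: a real agent would hand the key to a vetted cipher library and talk to the server over an authenticated channel.

```python
import secrets


class PolicyServer:
    """Central store for keys and access control lists (hypothetical)."""

    def __init__(self):
        self._keys = {}  # file_id -> key bytes
        self._acl = {}   # file_id -> set of users allowed to access the file

    def register_file(self, file_id, allowed_users):
        """Generate a key for a file and record who may use it."""
        self._keys[file_id] = secrets.token_bytes(32)
        self._acl[file_id] = set(allowed_users)

    def request_key(self, user, file_id):
        """Release the key only if the ACL approves this user."""
        if user not in self._acl.get(file_id, set()):
            raise PermissionError(f"{user} may not access {file_id}")
        return self._keys[file_id]


class Agent:
    """Runs on the system holding the data and keeps no keys of its own."""

    def __init__(self, server):
        self._server = server

    def open_file(self, user, file_id):
        # Ask the central server for the key; a real agent would then
        # decrypt the file with it instead of returning the key.
        return self._server.request_key(user, file_id)


if __name__ == "__main__":
    server = PolicyServer()
    server.register_file("forecast.xls", ["alice"])
    agent = Agent(server)
    print(len(agent.open_file("alice", "forecast.xls")), "byte key released")
```

Because every key request goes through the server, revoking a user is a single ACL change rather than a hunt across individual systems.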

Rights Management: The enforcement of rights assigned during the Create stage.

  1. Row-Level Security: Non-label based row-level access controls. Capable of deeper logic than label security.
  2. Label Security: Described in Create
  3. Enterprise DRM: Described in Create
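
To show what "deeper logic than label security" can mean for row-level security, here is a small Python sketch where the decision is a predicate over both the user and the row. The roles, regions, and the `row_policy` function are assumptions made up for this example, not any vendor's syntax.

```python
def row_policy(user, row):
    """Hypothetical policy: managers see everything, reps see only
    open orders in their own region."""
    if user["role"] == "manager":
        return True
    return row["region"] == user["region"] and row["status"] == "open"


def select_rows(user, rows):
    """Apply the policy to every row before returning results."""
    return [row for row in rows if row_policy(user, row)]


ORDERS = [
    {"id": 1, "region": "east", "status": "open"},
    {"id": 2, "region": "east", "status": "closed"},
    {"id": 3, "region": "west", "status": "open"},
]

if __name__ == "__main__":
    rep = {"role": "rep", "region": "east"}
    print([row["id"] for row in select_rows(rep, ORDERS)])
```

A static label on the row could not express "your region and still open", since the answer changes with both the user and the current state of the row.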

Content Discovery: Content-aware scanning of files, databases, and other storage repositories to identify sensitive content and take protective actions based on enterprise policies.

  1. Database Content Discovery: Use of a database-specific tool to scan for sensitive content outside of expected fields. For example, searching for credit card numbers stored outside of the encrypted credit card column. This is a very early market and a feature we sometimes see in a Database Activity Monitoring tool.
  2. Data Loss Prevention/Content Monitoring and Filtering Discovery: A feature of most top-tier products to scan data-at-rest for sensitive data outside of approved repositories. More details here.
  3. Storage/Data Classification Tools: Non-security tools used in ILM to classify content. While focused on a non-security buying center, and often with limited content analysis capabilities, these tools are capable of dealing with very large storage environments and we expect to see increasing overlap, if not outright merger, with DLP/CMF Content Discovery.
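
For a feel of what content-aware scanning involves, here is a minimal Python sketch of the credit card example: find digit runs that look like card numbers, then use the Luhn checksum to discard most false positives. The regular expression and function names are my own assumptions; commercial tools use far richer analysis than this.

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return candidate card numbers in the text that pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = "notes: 4111 1111 1111 1111, ref 1234 5678 9012 3456"
    print(find_card_numbers(sample))
```

Run against a free-text notes column, this is the kind of check that finds card numbers stored outside the encrypted credit card column.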

Again, this is all still in an early stage of development so I'm extremely interested in your feedback. I know one technology for sure I've left off here, and I'll be interested to see who picks up on it first...

—Rich

Comments:

By Adrian Lane  on  10/04  at  08:58 PM

Wow, there is a lot here to digest, and I have a lot of comments, so I will be as succinct as possible:
1. This looks like a ‘sensitive data security lifecycle’, as opposed to a ‘data security lifecycle’, the subtle distinction being that the data has a lifecycle, not the process.
2. It is implied that you know where the sensitive data is prior to the process. That is not always the case; in fact, a complete picture of data and location may be the exception rather than the rule. I understand the model is to set policy at the time of data creation, but I have never been lucky enough to avoid carrying legacy data forward. So are you advocating a migration approach where you push all existing data through the new process or systems over time? Are you suggesting that the lifecycle is to be applied over existing systems and processes?
3. I have never been comfortable with classification and setting rights prior to discovery. You may not have a complete picture of the sensitive data that is present, and you may not understand its use. Some peers have stated that they believe in a ‘Monitor First’ approach to discover, in essence, what you don’t know. Then you set policy. I tend to prefer the assessment or discovery first approach prior to setting policy. Still, I think it has to be one or the other, otherwise your policies are quickly obsolete.
4. On the subject of Policy Management, where would this fit in?
5. On the subject of Administrator Separation of Duties, your point about the need for products to mature in the next 1-3 years really hits the mark, and this is one of the biggest problems. Today all of the claims are that the products provide this, but few even come close. Setting and implementing policy, collecting data and statistics, verifying that the data meets the policy, and (even tougher) verifying that the policy is appropriate, all with a reasonable expectation of catching fraud, is next to impossible. To hammer this point home, Oracle’s Database Vault does provide the separation of duties within the credentials, but how many auditors could describe what label security is? How many auditors are going to be able to discover the table identifying number and the event code for a failed login to implement the policy? Some form of tool needs to take a policy and implement it, allowing the setter of the policy not to have deep database knowledge while still providing separation of duties.
6. Content discovery in and of itself is good, but in terms of structured data a ‘structural discovery’ has bearing on data security and usage.
7. Audit of events, especially in the area of policy change? Is this meant to be covered by monitoring and implicit to access controls?

By rmogull  on  10/04  at  09:56 PM

Adrian, some responses and some follow up questions to make sure I’m reading you right:

1. I think I’m missing your point here? We need to apply controls of some sort to pretty much all data, not just the sensitive stuff, and this is centered on the data, not the process. Maybe I just need more coffee.
2. That’s what the Discovery controls are for in a couple of the phases- since this will never be perfect, and classification will even change during the life of the data depending on changing contexts or business policies, Discovery is the control to sweep the environment, find violations based on current policy, and make necessary changes. Thus we can apply the lifecycle to existing systems and slowly migrate data into it. For example, when I write up the process for implementing data classification it will work in manageable stages, focusing on a few critical policies at a time, rather than trying to fix everything at once. Some of this will realistically take YEARS to roll out, but we have to start someplace.
3. That’s the bit about automatic classification (which doesn’t really exist yet) in the Create phase. We can have a mix of mandatory and discretionary policies to help guide the initial classification. It won’t be 100%, but that’s why we have the layers and multiple places where we can reclassify (via Discovery) the data. We know for sure there’s some stuff that needs to be protected at a certain level. For other things we need to make changes over time. I’m realizing this isn’t reflected well in the lifecycle and I think I need to figure out a way to manage changing classification over time better, which ties in with your business context comments from the other day. I’m open to suggestions.
4. Outside the cycle- I haven’t been publishing everything in order, but that falls in with enterprise key management and a couple other things that support the cycle but aren’t focused on an individual data element.
5. Absolutely agree, and it’s a big market opportunity for a database security company (nudge nudge). Oracle/IBM are basically just patching this on, they aren’t really solving the problem at the fundamental level. It’s also why I’m so hot on Database Activity Monitoring.
6. Yep, totally agree and want to see more products for this. As I think you know I’ve been pushing for it for at least 2 years.
7. Auditing is part of Activity Monitoring, so it fits in a few places. I haven’t gotten to those controls yet, but will in the next post. There’s also auditing outside of the lifecycle, which is part of that "management backplane" I keep referencing and need to write up.

Reasonable responses? Or you think the model needs to be changed?

By Adrian Lane  on  10/04  at  11:46 PM

1. The problem is having too many control ‘frameworks’ buzzing in my head, because I tend to think of the entire process being assessed on some cyclic basis, so the misunderstanding was mine. In this case, the data has a cycle that it goes through. I love the idea that certain types of data have a shelf life, which is a concept I had not even heard discussed until the last 6 months.
2. Both.  Got it.

I think the rest are reasonable responses. The model seems to be flexible enough conceptually to allow for most considerations. 

I do have a question that may simplify the model, based upon a bias of mine. The bias is that I do not believe there is such a thing as DRM on unstructured data; rather, the process of cataloging, quantifying, assigning ownership rights to, storing, encrypting and (possibly) placing in context to an application (an attribute list) in fact produces a structural reference. At that point is a media file any different than a row in a database? If you buy that assertion, does this model lose anything if you treat everything as structured data outside of a formal ‘database’?

By rmogull  on  10/05  at  01:40 AM

Damn- you’re forcing me to reveal some cards before I’m ready.

I think we’re moving into a world where all data is broken into discrete elements with labeling/metadata to support lifecycle (security and otherwise) controls. Those labels will flow as data moves between structured and unstructured data (think back to my old slide on "there’s no such thing as data at rest" I know you’ve seen).

For example, if I run a query in a database, the data returned is labeled, and if I put that data into a spreadsheet the spreadsheet is modified to meet the classification label of the most sensitive data element. Longer term, we’ll even become granular about data elements *within* structured data, but that’s easily 5-10 years out.

Right now I feel I need to break it apart between structured and unstructured just to make it more practical when securing operational environments. There’s a lot of momentum to keep them separate, with different teams managing different parts of the problem, and that’s a fight I can’t win.

Long term, as you point out, that distinction goes away. We’ll talk about "data elements", and be able to apply controls on an element basis no matter where they are or how they’re used.

We won’t be out of jobs for a very long time.

By rybolov  on  10/08  at  08:53 PM

Hi Rich, sorry it took me awhile to comment on this.

I do notice a DLP slant on your lifecycle.  Not a bad thing, but you need to consider that as a design bias.

I have a theory that data is like matter and never created or destroyed, just altered.  In this case, where you have create I would say something along the lines of "aggregate" or "collect".

Classification also goes hand-in-hand with determining which regulations that data falls under. Most policies, standards, regulations, and compliance requirements are based on the type and quantity of data. For example, if you take credit cards, you just signed up for PCI. Classification tells you that.

Use or share should include derivative data.  A good example is the US Census.  They collect little bits of data from everyone and then use that data to derive a whole bunch of statistical data.  The statistical data has less protection needs than the raw data, but in some cases it happens the opposite way (think intelligence work and the aggregate and analysis being more valuable than the individual points).  Point is, the derivative data needs to be forked back into the beginning of the cycle and be classified to determine how to protect it.

"Share" *could* include something like non-disclosure or other managerial controls.  Typically what you do is take your level of protection for that data type and try to hold your partners to a similar level.  Ideally, we hand them a list of protections and they take the data and spin off their own data security lifecycle.

Archive should contain something about retention periods, escrow, and off-siting.  Not sure how it all fits in, but maybe mull it over a bit.


For fun, contrast your data security lifecycle with the Apple marketing from a couple years ago: "Rip, Mix, Burn." =)

By rmogull  on  10/10  at  03:13 AM

You’re the second person to bring up 2 issues- the retention side, and the change in context over the life of the data.

I need to figure out a better way of reflecting both in the lifecycle. Retention is easy, but changing business/use context is more difficult. I’m open to suggestions…
