Cloud Data Security Cycle: Create (Rough Cut)

Last week I started talking about data security in the cloud, and I referred back to our Data Security Lifecycle from 2007. Over the next couple of weeks I’m going to walk through the cycle and adapt the controls for cloud computing. After that, I will dig deep into implementation options for each of the potential controls. I’m hoping this will give you both practical advice you can implement today and a taste of potential options that may develop down the road.

We do face a bit of a chicken-and-egg problem with this series, since some of the technical details of controls implementation won’t make sense without the cycle, but the cycle won’t make sense without the details of the controls. I decided to start with the cycle, and will pepper in specific examples where I can to help it make sense. Hopefully it will all come together at the end.

In this post we’re going to cover the Create phase:

Definition

Create is defined as generation of new digital content, either structured or unstructured, or significant modification of existing content. In this phase we classify the information and determine appropriate rights. This phase consists of two steps – Classify and Assign Rights.

Steps and Controls


Control	Structured/Application	Unstructured
Classify	Application Logic, Tagging/Labeling	Tagging/Labeling
Assign Rights	Label Security	Enterprise DRM

Classify

Classification at the time of creation is currently either a manual process (most unstructured data), or handled through application logic. Although the potential exists for automated tools to assist with classification, most cloud and non-cloud environments today classify manually for unstructured or directly-entered database data, while application data is automatically classified by business logic. Bear in mind that these are controls applied at the time of creation; additional controls such as access control and encryption are managed in the Store phase. There are two potential controls:

Application Logic: Data is classified based on business logic in the application. For example, credit card numbers are classified as such based on field definitions and program logic. Generally this logic is based on where data is entered, or on automated analysis (keyword or content analysis).
Tagging/Labeling: The user manually applies tags or labels at the time of creation, e.g., manually tagging via drop-down lists or open fields, manual keyword entry, suggestion-assisted tagging, and so on.
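
To make the Application Logic control a little more concrete, here is a minimal Python sketch of the two approaches described above: classifying by where data is entered (field definition) and classifying by content analysis. Everything named here is an assumption for illustration only: the field names, the classification strings, and the choice of a digit pattern plus the Luhn checksum to spot likely card numbers. A real application would use its own schema and far more robust detection.

```python
import re

# Hypothetical field-to-classification map; a real application would derive
# this from its own schema or data dictionary.
FIELD_CLASSIFICATION = {
    "card_number": "cardholder-data",
    "ssn": "pii",
}

# Candidate card numbers: 13 to 19 digits, optionally separated by single
# spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_field(field_name: str) -> str:
    """Classify by where the data is entered (field definition)."""
    return FIELD_CLASSIFICATION.get(field_name, "unclassified")


def classify_content(text: str) -> str:
    """Classify by content analysis: look for Luhn-valid card numbers."""
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return "cardholder-data"
    return "unclassified"
```

Field-based classification is cheap and predictable, while content analysis catches sensitive data that lands in free-text fields. In practice the two are usually combined, with content analysis as the fallback.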

Assign Rights

This is the process of converting the classification into rights applied to the data. Not all data necessarily has rights applied, in which case security is provided through additional controls during later phases of the cycle. (Technically rights are always applied, but in many cases they are so broad as to be effectively non-existent.) These are rights that follow the data, as opposed to access controls or encryption which, although they protect the data, are decoupled from its creation. There are two potential technical controls here:

Label Security: A feature of some database management systems and applications that adds a label to a data element, such as a database row, column, or table, or file metadata, classifying the content in that object. The DBMS or application can then implement access and logical controls based on the data label. Labels may be applied at the application layer, but only count as assigning rights if they also follow the data into storage.
Enterprise Digital Rights Management (EDRM): Content is encrypted, and access and use rights are controlled by metadata embedded with the content. The EDRM market has been somewhat self-limiting due to the complexity of enterprise integration and assigning and managing rights.
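
As a rough illustration of label security, the sketch below converts a classification into a label that is stored with the data element, then checks reads against that label. This is not how any particular DBMS implements the feature. The level names, the classification-to-label mapping, and the `LabeledRecord`, `assign_rights`, and `can_read` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ordered sensitivity levels; a higher number is more sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical mapping from a classification (the Classify step) to a label.
CLASSIFICATION_TO_LABEL = {
    "cardholder-data": "restricted",
    "pii": "confidential",
    "unclassified": "internal",
}


@dataclass(frozen=True)
class LabeledRecord:
    """A data element that carries its label with it into storage."""
    payload: str
    label: str


def assign_rights(payload: str, classification: str) -> LabeledRecord:
    """Convert a classification into a label stored alongside the data.

    Unknown classifications fall back to the most restrictive label.
    """
    label = CLASSIFICATION_TO_LABEL.get(classification, "restricted")
    return LabeledRecord(payload=payload, label=label)


def can_read(clearance: str, record: LabeledRecord) -> bool:
    """Allow a read only when the clearance is at or above the record label."""
    return LEVELS[clearance] >= LEVELS[record.label]
```

The point of the sketch is that the label is part of the record itself, so any layer that later handles the record can make the same access decision.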

Cloud SPI Tier Implications

Software as a Service (SaaS)

Classification and rights assignment are completely controlled by the application logic implemented by your SaaS provider. Typically we see Application Logic, since that’s a fundamental feature of any application – SaaS or otherwise. When evaluating your SaaS provider you should ask how they classify sensitive information and then later apply security controls, or if all data is lumped together into a single monolithic database (or flat files) without additional labels or security controls to prevent leakage to administrators, attackers, or other SaaS customers.

In some cases, various labeling technologies may be available. You will, again, need to work with your potential SaaS provider to determine if these labels are used only for searching/sorting data, or if they also assist in the application of security controls.

Platform as a Service (PaaS)

Implementation in a PaaS environment depends completely on the available APIs and development environment. As with internal applications, you will maintain responsibility for how classification and rights assignment are managed.

When designing your PaaS-based application, identify potential labeling/classification APIs you can integrate into program logic. You will need to work with your PaaS provider to understand how they can implement security controls at both the application and storage layers – for example, it’s important to know if and how data is labeled in storage, and if this can be used to restrict access or usage (business logic).

Infrastructure as a Service (IaaS)

Classification and rights assignments depend completely on what is available from your IaaS provider. Here are some specific examples:

Cloud-based database: Work with your provider to determine if data labels are available, and with what granularity. If they aren’t provided, you can still implement them as a manual addition (e.g., a row field or segregated tables), but understand that the DBMS will not be enforcing the rights automatically, and you will need to program management into your application.
Cloud-based storage: Determine what metadata is available. Many cloud storage providers don’t modify files, so anything you define in an internal storage environment should work in the cloud. The limitation is that the cloud provider won’t be able to tie access or other security controls to the label, which is sometimes an option with document management systems. Enterprise DRM, for example, should work fine with any cloud storage provider.
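
For the cloud-based database case where the provider offers no data labels, a manual label might look something like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for whatever database your provider offers. The table name, column names, and level values are assumptions for illustration. Note that the filter lives in application code: the DBMS stores the label but does nothing to enforce it.

```python
import sqlite3

# Hypothetical ordered sensitivity levels; a higher number is more sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with a manually added label column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_notes ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " label_level INTEGER NOT NULL)"
    )
    return conn


def insert_note(conn: sqlite3.Connection, body: str, label: str) -> None:
    """Store the row together with its label at creation time."""
    conn.execute(
        "INSERT INTO customer_notes (body, label_level) VALUES (?, ?)",
        (body, LEVELS[label]),
    )


def read_notes(conn: sqlite3.Connection, clearance: str) -> list:
    """Application-side enforcement: the DBMS does not check the label."""
    rows = conn.execute(
        "SELECT body FROM customer_notes WHERE label_level <= ? ORDER BY id",
        (LEVELS[clearance],),
    )
    return [body for (body,) in rows]
```

Anyone who queries the table directly bypasses `read_notes` entirely, which is exactly the limitation described above. The label only protects data accessed through code paths that honor it.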

This should give you a good idea of how to manage classification and rights assignment in various cloud environments. One exciting aspect is that use of tags, including automatically generated tags, is a common concept in the Web 2.0 world, and we can potentially tie this into our security controls. Users are better “trained” to tag content during creation with web-based applications (e.g., photo sharing sites & blogs), and we can take advantage of these habits to improve security.

5 Replies to “Cloud Data Security Cycle: Create (Rough Cut)”

Fernando Medrano September 9, 2009 at 7:44 pm

Is the use of data labeling in the cloud any more effective than when it’s used internally? I ask primarily because of your thoughts on a previous post:

http://securosis.com/blog/comments/data-labels-suck/

Zach Lanier September 8, 2009 at 7:59 pm

Marv,

Though those concepts were put forth 20+ years ago, they’re still being applied today — and really only *just* now are they hitting the mainstream (e.g. outside of niche, MAC-enabled platforms).

Cheers.

Marv Shaffer September 8, 2009 at 7:52 pm

Wow, label-based access control. Innovative. Check the TCSEC (aka Orange Book) for some pointers from 1985. The problem isn’t with the theory, it’s with the practice. It turns out that managing all those tags and permissions on everything is ridiculously complex. Try using Trusted Solaris sometime. Go fool around with the label-encodings file. Even just try to set up a Java security.policy file and limit the privilege on things. Until we figure out how to make security rules easier to manage, then just forget it.

Zach Lanier September 8, 2009 at 7:08 pm

Is there ever a problem with consistency in metadata and/or tagging when storing content with a cloud provider?

I see you have “determine what metadata is available”. I’m just wondering if there are ever issues with limitations imposed by the provider wherein it could be difficult to “attach” certain metadata or labels to data when moving them around, or even if you had different chunks of data spread across different providers.

Adrian Lane September 8, 2009 at 4:23 am

@Zach – With the exception of DLP, it’s the application that has to maintain that. With DRM applications, some of the metadata is part of the stored information and is always available. And logical elements are not divisible. DRM by design does not trust its infrastructure or storage, and in some cases does not trust the application either, so it distributes into the cloud nicely. The downside of these types of solutions is the need for some authority to act as an arbiter on record state. As we can copy data infinitely, which copy is the right one? Then it comes down to the application to provide version control. None of these concerns are specific to the cloud per se, but can pop up with virtualization as well.
…
Labeling may be a little different, especially if the application was using SimpleDB to store data and metadata; the application would be required to define the tuples/relationships and enforce them. But these flat file ‘databases’ are not so great at data consistency, integrity and relationships. Nothing inherently enforces uniqueness in key values, for example. Really not sure how labeling would work in that type of environment.
…
Anyway, you posed an interesting question.
