Login  |  Register  |  Contact

Information-centric Security

Wednesday, August 10, 2011

Data Security Lifecycle 2.0: Functions, Actors, and Controls

By Rich

In our last post we added location and access attributes to the Data Security Lifecycle. Now let’s start digging into the data flow and controls.

To review, so far we’ve completed our topographic map for data:


This illustrates, at a high level, how data moves in and out of different environments, and to and from different devices. It doesn’t yet tell us which controls to use or where to place them. That’s where the next layer comes in, as we specify locations, actors (‘who’), and functions:

Functions and Controls


There are three things we can do with a given datum:

  • Access: View/access the data, including copying, file transfers, and other exchanges of information.
  • Process: Perform a transaction on the data: update it, use it in a business processing transaction, etc.
  • Store: Store the data (in a file, database, etc.).

The table below shows which functions map to which phases of the lifecycle:

Functions Table

Each of these functions is performed in a location, by an actor (person).


Essentially, a control is what we use to restrict a list of possible actions down to allowed actions. For example, encryption can be used to restrict access to data, application controls to restrict processing via authorization, and DRM storage to prevent unauthorized copies/accesses.

To determine the necessary controls; we first list out all possible functions, locations, and actors; and then which ones to allow. We then determine what controls we need to make that happen (technical or process). Controls can be either preventative or detective (monitoring), but keep in mind that monitoring controls that don’t tie back into some sort of alerting or analysis merely provide an audit log, not a functional control.

This might be a little clearer for some of you as a table:

Controls Table

Here you would list a function, the actor, and the location, and then check whether it is allowed or not. Any time you have a ‘no’ in the allowed box, you would implement and document a control.

Tying It together

Functions and Cycle

In essence what we’ve produced is a high-level version of a data flow diagram (albeit not using standard programming taxonomy). We start by mapping the possible data flows, including devices and different physical and virtual locations, and at which phases in its lifecycle data can move between those locations. Then, for each phase of the lifecycle in a location, we determine which functions, people/systems, and more-granular locations for working with the data are possible. We then figure out which we want to restrict, and what controls we need to enforce those restrictions.

This looks complex, but keep in mind that you aren’t likely to do it for all data within an entire organization. For given data in a given application/implementation you’ll be working with a much more restrictive subset of possibilities. This clearly becomes more involved with bigger applications, but practically speaking you need to know where data flows, what’s possible, and what should be allowed, to design your security.

In a future post we’ll show you an example, and down the road we also plan to produce a controls matrix which will show you where the different data security controls fit in.


Tuesday, August 09, 2011

Data Security Lifecycle 2.0 and the Cloud: Locations and Access

By Rich

In our last post we reviewed the Data Security Lifecycle, but other than some minor wording changes (and a prettier graphic thanks to PowerPoint SmartArt) it was the same as our four-year-old original version.

But as we mentioned, quite a bit has changed since then, exemplified by the emergence and adoption of cloud computing and increased mobility. Although the Lifecycle itself still applies to basic, traditional infrastructure, we will focus on these more complex use cases, which better reflect what most of you are dealing with on a day to day basis.


One gap in the original Lifecycle was that it failed to adequately address movement of data between repositories, environments, and organizations. A large amount of enterprise data now transitions between a variety of storage locations, applications, and operating environments. Even data created in a locked-down application may find itself backed up someplace else, replicated to alternative standby environments, or exported for processing by other applications. And all of this can happen at any phase of the Lifecycle.

We can illustrate this by thinking of the Lifecycle not as a single, linear operation, but as a series of smaller lifecycles running in different operating environments. At nearly any phase data can move into, out of, and between these environments – the key for data security is identifying these movements and applying the right controls at the right security boundaries.

As with cloud deployment models, these locations may be internal, external, public, private, hybrid, and so on. Some may be cloud providers, other traditional outsourcers, or perhaps multiple locations within a single data center.

For data security, at this point there are four things to understand:

  1. Where are the potential locations for my data?
  2. What are the lifecycles and controls in each of those locations?
  3. Where in each lifecycle can data move between locations?
  4. How does data move between locations (via what channel)?


Now that we know where our data lives and how it moves, we need to know who is accessing it and how. There are two factors here:

  1. Who accesses the data?
  2. How can they access it (device & channel)?

Data today is accessed from all sorts of different devices. The days of employees only accessing data through restrictive applications on locked-down desktops are quickly coming to an end (with a few exceptions). These devices have different security characteristics and may use different applications, especially with applications we’ve moved to SaaS providers – who often build custom applications for mobile devices, which offer different functionality than PCs.

Later in the model we will deal with who, but the diagram below shows how complex this can be – with a variety of data locations (and application environments), each with its own data lifecycle, all accessed by a variety of devices in different locations. Some data lives entirely within a single location, while other data moves in and out of various locations… and sometimes directly between external providers.


This completes our “topographic map” of the Lifecycle. In our next post we will dig into mapping data flow and controls. In the next few posts we will finish covering background material, and then show you how to use this to pragmatically evaluate and design security controls.


Introducing the Data Security Lifecycle 2.0

By Rich

Four years ago I wrote the initial Data Security Lifecycle and a series of posts covering the constituent technologies. In 2009 I updated it to better fit cloud computing, and it was incorporated into the Cloud Security Alliance Guidance, but I have never been happy with that work. It was rushed and didn’t address cloud specifics nearly sufficiently.

Adrian and I just spent a bunch of time updating the cycle and it is now a much better representation of the real world. Keep in mind that this is a high-level model to help guide your decisions, but we think this time around we were able to identify places where it can more specifically guide your data security endeavors.

(As a side note, you might notice I use “data security” and “information-centric security” interchangeably. I think infocentric is more accurate, but data security is more recognized, so that’s what I tend to use.)

If you are familiar with the previous model you will immediately notice that this one is much more complex. We hope it’s also much more useful. The old model really only listed controls for data in different phases of the lifecycle – and didn’t account for location, ownership, access methods, and other factors. This update should better reflect the more complex environments and use cases we tend to see these days.

Due to its complexity, we need to break the new Lifecycle into a series of posts. In this first post we will revisit the basic lifecycle, and in the next post we will add locations and access.

Data Security Lifecycle

The lifecycle includes six phases from creation to destruction. Although we show it as a linear progression, once created, data can bounce between phases without restriction, and may not pass through all stages (for example, not all data is eventually destroyed).

  1. Create: This is probably better named Create/Update because it applies to creating or changing a data/content element, not just a document or database. Creation is the generation of new digital content, or the alteration/updating of existing content.
  2. Store: Storing is the act committing the digital data to some sort of storage repository, and typically occurs nearly simultaneously with creation.
  3. Use: Data is viewed, processed, or otherwise used in some sort of activity.
  4. Share: Data is exchanged between users, customers, and partners.
  5. Archive: Data leaves active use and enters long-term storage.
  6. Destroy: Data is permanently destroyed using physical or digital means (e.g., cryptoshredding).

These high-level activities describe the major phases of a datum’s life, and in a future post we will cover security controls for each phase. But before we discuss controls we need to incorporate two additional aspects: locations and access devices.


Tuesday, September 22, 2009

Cloud Data Security: Archive and Delete (Rough Cut)

By Rich

In our last post in this series, we covered the cloud implications of the Share phase of Data Security Cycle. In this post we will move on to the Archive and Destroy phases.



Archiving is the process of transferring data from active use into long-term storage. This can include archived storage at your cloud provider, or migration back to internal archives.

From a security perspective we are concerned with two controls: encrypting the data, and tracking the assets when data moves to removable storage (tapes, or external drives for shipping transfers). Since many cloud providers are constantly backing up data, archiving often occurs outside customer control, and it’s important to understand your provider’s policies and procedures.

Steps and Controls

EncryptionDatabase EncryptionTape Encryption
Storage Encryption
Asset ManagementAsset Management


In the Store phase we covered a variety of encryption options, and if content is kept encrypted as it moves into archived storage, no additional steps are needed. Make sure your archiving system takes the encryption keys into account, since restored data is useless if the corresponding decryption keys are unavailable. In cloud environments data is often kept live due to the elasticity of cloud storage, and might just be marked with some sort of archive tag or metadata.

  1. Database Encryption: We reviewed the major database encryption options in the Store phase. The only archive-specific issue is ensuring the database replication/archiving method supports maintenance of the existing encryption. Another option is to use file encryption to secure the database archives. For larger databases, tape or storage encryption is often used.
  2. Tape Encryption: Encryption of the backup tapes using either hardware or software. There are a number of tools on the market and this is a common practice. Hardware provides the best performance, and inline appliances can work with most existing tape systems, but we are increasingly seeing encryption integrated into backup software and even tape drives. If your cloud provider manages tape backups (which many do), it’s important to understand how those tapes are protected – is any existing encryption maintained, and if not, how are the tapes encrypted and keys managed?
  3. Storage Encryption: Encryption of data archived to disk, using a variety of techniques. Although some hardware tools such as inline appliances and encrypted drivesxist, this is most commonly performed in software. We are using Storage Encryption as a generic term to cover any file or media encryption for data moved to long-term disk storage.

Asset Management

One common problem in both traditional and cloud environments is the difficulty of tracking the storage media containing archived data. Merely losing the location of unencrypted media may require a breach disclosure, even if the tape or drive is likely still located in a secure area – if you can’t prove it’s there, it is effectively lost. From a security perspective, we aren’t as concerned with asset management for encrypted content – it’s more of an issue for unencrypted sensitive data. Check with your cloud provider to understand their asset tracking for media, or implement an asset management system and procedures if you manage your own archives of cloud data.

Cloud SPI Tier Implications

Software as a Service (SaaS)

Archive security options in a SaaS deployment are completely dependent on your provider. Determine their backup procedures (especially backup rotations), any encryption, and asset management (especially for unencrypted data). Also determine if there are any differences between backups of live data and any long-term archiving for data moved off primary systems.

Platform as a Service (PaaS)

Archive security in PaaS deployments is similar to SaaS when you transition data to, or manage data with, the PaaS provider. You will need to understand the provider’s archive mechanisms and security controls. If the data resides in your systems, archive security is no different than managing secure archives for your traditional data stores.

Infrastructure as a Service (IaaS)

For completely private cloud deployments, IaaS Archive security is no different than managing traditional archived storage. You’ll use some form of media encryption and asset management for sensitive data. For cloud storage and databases, as with PaaS and SaaS you need to understand the archival controls used by your provider, although any data encrypted before moving to the cloud is clearly still secure.



Destroy is the permanent destruction of data that’s no longer needed, and the use of content discovery to validate that it is not lingering in active storage or archives.

Organizations commonly destroy unneeded data, especially sensitive data that may be under regulatory compliance requirements. The cloud may complicate this if your provider’s data management infrastructure isn’t compatible with your destruction requirements (e.g., the provider is unable to delete data from archived storage). Crypto-shredding may be the best option for many cloud deployments, since it relies less on complete access to all physical media, which may be difficult or impossible even in completely private/internal cloud deployments.

Steps and Controls

Crypto-ShreddingEnterprise Key Management
Secure DeletionDisk/Free Space Wiping
Physical DestructionPhysical Destruction
Content DiscoveryDatabase DiscoveryDLP/CMP Discovery
Storage/Data Classification Tools
Electronic Discovery


Crypto-shredding is the deliberate destruction of all encryption keys for the data; effectively destroying the data until the encryption protocol used is (theoretically, some day) broken or capable of being brute-forced. This is sufficient for nearly every use case in a private enterprise, but shouldn’t be considered acceptable for highly sensitive government data. Encryption tools must have this as a specific feature to absolutely ensure that the keys are unrecoverable. Crypto-shredding is an effective technique for the cloud since it ensures that any data in archival storage that’s outside your physical control is also destroyed once you make the keys unavailable. If all data is encrypted with a single key, to crypto-shred you’ll need to rotate the key for active storage, then shred the “old” key, which will render archived data inaccessible.

We don’t mean to oversimplify this option – if your cloud provider can’t rotate your keys or ensure key deletion, crypto-shredding isn’t realistic. If you manage your own keys, it should be an important part of your strategy.

Disk/Free Space Wiping and Physical Destruction

These options is only available when you have low-level administrative access to the physical storage. It includes software or hardware designed to destroy data on hard drives and other media, or physical destruction of the drives. At a minimum the tool should overwrite all writable space on the media 1-3 times, and 7 times is recommended for sensitive data. Merely formatting over data is not sufficient. Secure wiping is highly recommended for any systems with sensitive data that are sold or reused, especially laptops and desktops. File-level secure deletion tools exist for when it’s necessary to destroy just a portion of data in active storage, but are not as reliable as a full media wipe.

For physical destruction (again, assuming you have access to the drives), there are two options:

  1. Degaussing: Use of strong magnets to scramble magnetic media like hard drives and backup tapes. Dedicated solutions should be used to ensure data is unrecoverable, and it’s highly recommended you confirm the efficiency of a degaussing tool by randomly performing forensic analysis on wiped media.
  2. Physical Destruction: Complete physical destruction of storage devices, focusing on shredding the actual magnetic media (platters or tape).

Due to the abstraction involved in cloud computing, these will often not be available, although your provider may include them as part of their procedures for management of their drives. When managing a private/internal cloud, you can include physical media wiping or destruction as part of your procedures for managing drives removed from active service. In IaaS deployments, you may retain the low level access to overwrite data in individual virtual machines or storage.

Content Discovery

When truly sensitive data reaches end-of-life, you need to make sure that the destroyed data is really destroyed. Use of content discovery tools helps ensure that no copies or versions of the data remain accessible in the enterprise. Considering how complex our storage, archive, and backup strategies in the cloud are today, it is impossible to absolutely guarantee the data is unrecoverable, but content discovery does reduce the risk of retrieval.

As with content discovery in the Store phase, these tools are only effective if they have access to the storage infrastructure; they cannot work through an application interface unless they are built into the application.

For details on Database Discovery and DLP/CMP please see the Store phase. There are two additional technology categories we also see used for this purpose:

  1. Storage/Data Classification and Search: These are tools typically used and managed by enterprise storage teams. Their content analysis is generally less detailed than DLP/CMP tools, but can be helpful for broad searches for stored data. Storage/Data classification tools are third-party tools which crawl a storage environment and use rule sets (usually keywords and regular expressions) to apply metadata tags to files. If your cloud storage offers standard file access, they may be helpful. Search is either built into the application or a third-party tool that indexes stored data. While these are not ideal tools for content discovery to ensure data destruction, search may be your only option in some SaaS deployments.
  2. Electronic Discovery: Tools dedicated to the electronic discovery of data for legal proceedings. Likely the same tools that will be used to search for destroyed data if there’s ever reason to attempt recovery in the future. As with most of the tools in this section, they are not cloud specific and may not be an option.

Cloud SPI Tier Implications

Software as a Service (SaaS)

As with Archive, your data destruction options are completely dependent on your provider. Typically you will be limited to some level of deletion, although in some applications crypto-shredding may be an option. What’s most important is to understand how your provider handles data destruction, and to obtain any documentation and service level agreements that are available. Search will usually be your best content discovery option.

Platform as a Service (PaaS)

For data stored with your PaaS provider, unless you have file system access of some sort you will face the same limitations as with SaaS providers. If you encrypt data on your side before sending it to the platform, crypto-shredding is a good option. Any data stored in your environment is obviously easier to destroy, since you have greater control of the infrastructure and physical media. Content discovery may be an option, but this depends completely on how your PaaS-based application is designed.

Infrastructure as a Service (IaaS)

For cloud data storage (database and file based), crypto-shredding is likely your best option. For other infrastructure deployments, particularly those with virtual machines and disks, you may be able to overwrite stored data. Content discovery using DLP/CMP will probably work, again depending on the details of your deployment.


Monday, September 21, 2009

Cloud Data Security: Share (Rough Cut)

By Rich

In our last post in this series, we covered the cloud implications of the Use phase of our Data Security Cycle. In this post we will move on to the Share phase. Please remember that we are only covering technologies at a high level in this series on the cycle; we will run a second series on detailed technical implementations of data security in the cloud a little later.


Share includes controls we use when exchanging data between users, customers, and partners. Where Use focuses on controls when a user interacts with the data as an individual, Share includes the controls once they start to exchange that data (or back-end data exchange). In cloud computing we see a major emphasis on application and logical controls, with encryption for secure data exchange, DLP/CMP to monitor communications and block policy violations, and activity monitoring to track back-end data exchanges.

Cloud computing introduces two new complexities in managing data sharing:

  • Many data exchanges occur within the cloud, and are invisible to external security controls. Traditional network and endpoint monitoring probably won’t be effective. For example, when you share a Google Docs document to another user, the only local interactions are through a secure browser connection. Email filtering, a traditional way of tracking electronic document exchanges, won’t really help.
  • For leading edge enterprises that build dynamic data security policies using tools like DLP/CMP, those tools may not work off a cloud-based data store. If you are building a filtering policy that matches account numbers from a customer database, and that database is hosted in the cloud as an application or platform, you may need to perform some kind of mass data extract and conversion to feed the data security tool.

Although the cloud adds some complexity, it can also improve data sharing security in a well-designed deployment. Especially in SaaS deployments, we gain new opportunities to employ logical controls that are often difficult or impossible to manage in our current environments.

Although our focus is on cloud-specific tools and technologies, we still review some of the major user-side options that should be part of any data security strategy.

Steps and Controls

Activity Monitoring and EnforcementDatabase Activity Monitoring
Cloud Activity Monitoring/Logs
Application Activity Monitoring
Network DLP/CMP
Endpoint DLP/CMP
EncryptionNetwork/Transport Encryption
Application-Level Encryption
Email Encryption
File Encryption/EDRM
Network/Transport Encryption
Logical ControlsApplication Logic
Row Level Security
Application Securitysee Application Security Domain section

Activity Monitoring and Enforcement

We initially covered Activity Monitoring and Enforcement in the Use phase, and many of these controls are also used in the Share phase. Our focus now switches from watching how users interact with the data, to when and where they exchange it with others. We include technologies that track data exchanges at four levels:

  • Individual users exchanging data with other internal users within the cloud or a managed environment.
  • Individual users exchanging data with outside users, either via connections made from the cloud directly, or data transferred locally and then sent out.
  • Back-end systems exchanging data to/from the cloud, or within multiple cloud-based systems.
  • Back-end systems exchanging data to external systems/servers; for example, a cloud-based employee human resources system that exchanges healthcare insurance data with a third-party provider.
  1. Database Activity Monitoring (DAM): We initially covered DAM in the Use phase. In the Share phase we use DAM to track data exchanges to other back-end systems within or outside the cloud. Rather than focusing on tracking all activity in the database, the tool is tuned to focus on these exchanges and generate alerts on policy violations (such as a new query being run outside of expected behavior), or track the activity for auditing and forensics purposes. The challenge is to deploy a DAM tool in a cloud environment, but an advantage is greater visibility into data leaving the DBMS than might otherwise be possible.
  2. Application Activity Monitoring: Similar to DAM, we initially covered this in the Use phase. We again focus our efforts on tracking data sharing, both by users and back-end systems. While it’s tougher to monitor individual pieces of data, it’s not difficult to build in auditing and alerting for larger data exchanges, such as outputting from a cloud-based database to a spreadsheet.
  3. Cloud Activity Monitoring and Logs: Depending on your cloud service, you may have access to some level of activity monitoring and logging in the control plane (as opposed to building it into your specific application). To be considered a Share control, this monitoring needs to specify both the user/system involved and the data being exchanged.
  4. Network Data Loss Prevention/Content Monitoring and Protection: DLP/CMP uses advanced content analysis and deep packet inspection to monitor network communications traffic, alerting on (and sometimes enforcing) policy violations. DLP/CMP can play multiple roles in protecting cloud-based data. In managed environments, network DLP/CMP policies can track (and block) sensitive data exchanges to untrusted clouds. For example, policies might prevent users from attaching files with credit card numbers to a cloud email message, or block publishing of sensitive engineering plans to a cloud-based word processor. DLP can also work in the other direction: monitoring data pulled from a cloud deployment to the desktop or other non-cloud infrastructure. DLP/CMP tools aren’t limited to user activities, and can monitor, alert, and enforce policies on other types of TCP data exchange, such as FTP, which might be used to transfer data from the traditional infrastructure to the cloud. DLP/CMP also has the potential to be deployed within the cloud itself, but this is only possible in a subset of IaaS deployments, considering the deployment models of current tools. (Note that some email SaaS providers may also offer DLP/CMP as a service).
  5. Endpoint DLP/CMP: We initially covered Endpoint DLP/CMP in the Use phase, where we discussed monitoring and blocking local activity. Many endpoint DLP/CMP tools also track network activity – this is useful as a supplement when the endpoint is outside the corporate network’s DLP/CMP coverage.


In the Store phase we covered encryption for protecting data at rest. Here we expand to cover data in motion. Keep in mind that additional encryption is only needed if the data would otherwise be exchanged as plain text – there’s no reason or need to redundantly re-encrypt already encrypted network traffic.

  1. Network/Transport Encryption: As data moves between applications, databases, the cloud, and other locations, the network connections should be encrypted using a standard network-layer protocol. For larger systems where this could affect performance, hardware acceleration is recommended. Virtual Private Networks are useful for encrypting data moving in and out of clouds in certain deployment models.
  2. Application Level Encryption: As we discussed in the Store phase, data encrypted by an application on collection is ideally protected as it moves throughout the rest of the application stack. Don’t forget that at some point the data is probably decrypted to be used, so it’s important to map the data flow and determine potential weak points.
  3. Email Encryption: Email encryption isn’t cloud-specific, but since email is one of the most common ways of exchanging data, including reports and data dumps from cloud services, encryption is often relevant for cloud deployments – especially when built into the cloud application/service.
  4. File Encryption and Enterprise Digital Rights Management: These technologies were discussed in detail in the Store phase. They also apply in the Share phase since encrypted files or DRM protected documents are still protected as they are moved, not just in storage. For cloud security purposes, encryption or EDRM may be built into various data exchange mechanisms – with EDRM for user files, and encryption as a more general option.

Logical Controls

We discussed Logical Controls in the Use phase, and they can also be used to manage data exchange, not just transaction activity.

Application Security

As with logical controls, we discussed Application Security in the Use phase. Again, a full discussion of cloud application security issues is beyond the scope of this post, and we recommend you read the Cloud Security Alliance Guidance for more details.

Cloud SPI Tier Implications

Software as a Service (SaaS)

Data sharing in SaaS deployments is encapsulated within the application, is connected to back-end external applications, or involves generating data dumps to transfer the content to a local system. Application and logical controls are your best defense, combined with encryption to cover any data transfers. Once data leaves the SaaS application, DLP/CMP may be useful for tracking the content, or to protect it from leaving your managed environment. DLP/CMP is also useful to determine if the data should go to the cloud at all, and ensure that any data is transferred conforms to policy requirements. Since most SaaS solutions rely principally on HTTP for communications/access, most off-the-shelf DLP tools will work.

Platform as a Service (PaaS)

Depending on your PaaS deployment, it’s again likely that application logic will be your best security option, followed by proper use of encryption to secure communications. You may also be able to deploy monitoring in your application that connects to the PaaS provider if they don’t offer a desired level of monitoring/logging, but that will only track connections from your managed environment (someone trying to compromise the PaaS directly, without going through your application, won’t appear in your application logs).

Infrastructure as a Service (IaaS)

VPNs are commonly used to protect communications to IaaS infrastructure, both internal and external. When VPNs aren’t an option, such as with many types of cloud-based storage, SSL/TLS network encryption is usually available. Any additional Share controls rely completely on what you can deploy in the infrastructure. Any monitoring/auditing such as DLP require some sort of network traffic to analyze, or an alternative hook, such as a local agent.


Friday, September 18, 2009

Cloud Data Security: Use (Rough Cut)

By Rich

In our last post in this series, we covered the cloud implications of the Store phase of Data Security Cycle (our first post was on the Create phase). In this post we’ll move on to the Use phase. Please remember we are only covering technologies at a high level in this series – we will run a second series on detailed technical implementations of data security in the cloud a little later.


Use includes the controls that apply when the user is interacting with the data – either via a cloud-based application, or the endpoint accessing the cloud service (e.g., a client/cloud application, direct storage interaction, and so on). Although we primarily focus on cloud-specific controls, we also cover local data security controls that protect cloud data once it moves back into the enterprise. These are controls for the point of use – we will cover additional network based controls in the next phase.

Users interact with cloud data in three ways:

  • Web-based applications, such as most SaaS applications.
  • Client applications, such as local backup tools that store data in the cloud.
  • Direct/abstracted access, such as a local folder synchronized with cloud storage (e.g., Dropbox), or VPN access to a cloud-based server.

Cloud data may also be accessed by other back-end servers and applications, but the usage model is essentially the same (web, dedicated application, direct access, or an abstracted service).

Steps and Controls

Activity Monitoring and EnforcementDatabase Activity Monitoring
Application Activity Monitoring
Endpoint Activity Monitoring
File Activity Monitoring
Portable Device Control
Endpoint DLP/CMP
Cloud-Client Logs
Rights ManagementLabel Security
Enterprise DRM
Logical ControlsApplication Logic
Row Level Security
Application Securitysee Application Security Domain section

Activity Monitoring and Enforcement

Activity Monitoring and Enforcement includes advanced techniques for capturing all data access and usage activity in real or near-real time, often with preventative capabilities to stop policy violations. Although activity monitoring controls may use log files, they typically include their own collection methods or agents for deeper activity details and more rapid monitoring. Activity monitoring tools also include policy-based alerting and blocking/enforcement that log management tools lack.

None of the controls in this category are cloud specific, but we have attempted to show how they can be adapted to the cloud. These first controls integrate directly with the cloud infrastructure:

  1. Database Activity Monitoring (DAM): Monitoring all database activity, including all SQL activity. Can be performed through network sniffing of database traffic, agents installed on the server, or external monitoring, typically of transaction logs. Many tools combine monitoring techniques, and network-only monitoring is generally not recommended. DAM tools are managed externally to the database to provide separation of duties from database administrators (DBAs). All DBA activity can be monitored without interfering with their ability to perform job functions. Tools can alert on policy violations, and some tools can block certain activity. Current DAM tools are not cloud specific, and thus are only compatible with environments where the tool can either sniff all network database access (possible in some IaaS deployments, or if provided by the cloud service), or where a compatible monitoring agent can be installed in the database instance.
  2. Application Activity Monitoring: Similar to Database Activity Monitoring, but at the application level. As with DAM, tools can use network monitoring or local agents, and can alert and sometimes block on policy violations. Web Application Firewalls are commonly used for monitoring web application activity, but cloud deployment options are limited. Some SaaS or PaaS providers may offer real time activity monitoring, but log files or dashboards are more common. If you have direct access to your cloud-based logs, you can use a near real-time log analysis tool and build your own alerting policies.
  3. File Activity Monitoring: Monitoring access and use of files in enterprise storage. Although there are no cloud specific tools available, these tools may be deployable for cloud storage that uses (or presents an abstracted version of) standard file access protocols. Gives an enterprise the ability to audit all file access and generate reports (which may sometimes aid compliance reporting). Capable of independently monitoring even administrator access and can alert on policy violations.

The next three tools are endpoint data security tools that are not cloud specific, but may still be useful in organizations that manage endpoints:

  1. Endpoint Activity Monitoring: Primarily a traditional data security tool, although it can be used to track user interactions with cloud services. Watching all user activity on a workstation or server. Includes monitoring of application activity; network activity; storage/file system activity; and system interactions such as cut and paste, mouse clicks, application launches, etc. Provides deeper monitoring than endpoint DLP/CMF tools that focus only on content that matches policies. Capable of blocking activities such as pasting content from a cloud storage repository into an instant message. Extremely useful for auditing administrator activity on servers, assuming you can install the agent. An example of cloud usage would be deploying activity monitoring agents on all endpoints in a customer call center that accesses a SaaS for user support.
  2. Portable Device Control: Another traditional data security tool with limited cloud applicability, used to restrict access of, or file transfers to, portable storage such as USB drives and DVD burners. For cloud security purposes, we only include tools that either track and enforce policies based on data originating from a cloud application or storage, or are capable of enforcing policies based on data labels provided by that cloud storage or application. Portable device control is also capable of allowing access but auditing file transfers and sending that information to a central management server. Some tools integrate with encryption to provide dynamic encryption of content passed to portable storage. Will eventually be integrated into endpoint DLP/CMF tools that can make more granular decisions based on the content, rather than blanket policies that apply to all data. Some DLP/CMF tools already include this capability.
  3. Endpoint DLP: Endpoint Data Loss Prevention/Content Monitoring and Filtering tools that monitor and restrict usage of data through content analysis and centrally administered policies. While current capabilities vary highly among products, tools should be able to monitor what content is being accessed by an endpoint, any file storage or network transmission of that content, and any transfer of that content between applications (cut/paste). For performance reasons endpoint DLP is currently limited to a subset of enforcement policies (compared to gateway products) and endpoint-only products should be used in conjunction with network protection in most cases (which we will discuss in the next phase of the lifecycle).

At this time, most activity monitoring and enforcement needs to be built into the cloud infrastructure to provide value. We often see some degree of application activity monitoring built into SaaS offerings, with some logging available for cloud databases and file storage. The exception is IaaS, where you may have full control to deploy any security tool you like, but will need to account for the additional complexities of deploying in virtual environments which impact the ability to route and monitor network traffic.

Rights Management

We covered the rights management options in the Create and Store sections. They are also a factor in the this phase (Use), since this is another point where they can be actively enforced during user interaction

In the Store phase rights are applied as data enters storage, and access limitations are enforced. In the Use phase, additional rights are controlled, such as data modification, export, or more-complex usage patterns (like printing or copying).

Logical Controls

Logical controls expand the brute-force restrictions of access controls or EDRM that are based completely on who you are and what you are accessing. Logical controls are implemented in applications and databases and add business logic and context to data usage and protection. Most data-security logic controls for cloud deployments are implemented in application logic (there are plenty of other logical controls available for other aspects of cloud computing, but we are focusing on data security).

  1. Application Logic: Enforcing security logic in the application through design, programming, or external enforcement. Logical controls are one of the best options for protecting data in any kind of cloud-based application.
  2. Object (Row) Level Security: Creating a ruleset restricting use of a database object based on multiple criteria. For example, limiting a sales executive to only updating account information for accounts assigned to his territory. Essentially, these are logical controls implemented at the database layer, as opposed to the application layer. Object level security is a feature of the Database Management System and may or may not be available in cloud deployments (it’s available in some standard DBMSs, but is not currently a feature of any cloud-specific database system).
  3. Structural Controls: Using database design features to enforce security. For example, using the database schema to limit integrity attacks or restricting connection pooling to improve auditability. You can implement some level of structural controls in any database with a management system, but more advanced structural options may only be available in robust relational databases. Tools like SimpleDB are quite limited compared to a full hosted DBMS. Structural controls are more widely available than object level security, and since they don’t rely on IP addresses or external monitoring they are a good option for most cloud deployments. They are particularly effective when designed in conjunction with application logic controls.

Application Security

Aside from raw storage or plain hosted database access, most cloud deployments involve enterprise applications. Effective application security is thus absolutely critical to protect data, and often far more important than any access controls or other protections. A full discussion of cloud application security issues is beyond the scope of this post, and we recommend you read the Cloud Security Alliance Guidance for more details.

Cloud SPI Tier Implications

Software as a Service (SaaS)

Most usage controls in SaaS deployments are enforced in the application layer, and depend on what’s available from your cloud provider. The provider may also enforce additional usage controls on their internal users, and we recommend you ask for documentation if it’s available. In particular, determine what kinds of activity monitoring they perform for internal users vs. cloud-based users, and if those logs are ever available (such as during the investigation of security incidents). We also often see label security in SaaS deployments.

Platform as a Service (PaaS)

Depending on your PaaS deployment, it’s likely that application logic will be your best security option, followed by activity monitoring. If your PaaS provider doesn’t provide the level of auditing you would like, you may be able to capture activity within your application before it makes a call to the platform, although this won’t capture any potential direct calls to the PaaS that are outside your application.

Infrastructure as a Service (IaaS)

Although IaaS technically offers the most flexibility for deploying your own security controls, the design of the IaaS may inhibit deployment of many security controls. For example, monitoring tools that rely on network access or sniffing may not be deployable. On the other hand, your IaaS provider may include security controls as part of the service, especially some degree of logging and/or monitoring.

Database control availability will depend more on the nature of the infrastructure – as we’ve mentioned, full hosted databases in the cloud can enforce many, if not all, of the traditional database security controls.

Endpoint-based usage controls are enforceable in managed environments, but are only useful in private cloud deployments where access to the cloud can be restricted to only managed endpoints.


Thursday, September 17, 2009

Cloud Data Security: Store (Rough Cut)

By Rich

In our last post in this series, we covered the cloud implications of the Create phase of the Data Security Cycle. In this post we’re going to move on to the Store phase. Please remember that we are only covering technologies at a high level in this series on the cycle; we will run a second series on detailed technical implementations of data security in the cloud a little later.


Store is defined as the act of committing digital data to structured or unstructured storage (database vs. files). Here we map the classification and rights to security controls, including access controls, encryption and rights management. I include certain database and application controls, such as labeling, in rights management – not just DRM. Controls at this stage also apply to managing content in storage repositories (cloud or traditional), such as using content discovery to ensure that data is in approved/appropriate repositories.

Steps and Controls

Access ControlsDBMS Access Controls
Administrator Separation of Duties
File System Access Controls
Application/Document Management System Access Controls
EncryptionField Level Encryption
Application Level Encryption
Transparent Database Encryption
Media Encryption
File/Folder Encryption
Virtual Private Storage
Distributed Encryption
Rights ManagementApplication Logic
Enterprise DRM
Content DiscoveryCloud-Provided Database Discovery Tool
Database Discovery/DAM
DLP/CMP Discovery
Cloud-Provided Content Discovery DLP/CMP Content Discovery

Access Controls

One of the most fundamental data security technologies, built into every file and management system, and one of the most poorly used. In cloud computing environments there are two layers of access controls to manage – those presented by the cloud service, and the underlying access controls used by the cloud provider for their infrastructure. It’s important to understand the relationship between the two when evaluating overall security – in some cases the underlying infrastructure may be more secure (no direct back-end access) whereas in others the controls may be weaker (a database with multiple-tenant connection pooling).

  1. DBMS Access Controls: Access controls within a database management system (cloud or traditional), including proper use of views vs. direct table access. Use of these controls is often complicated by connection pooling, which tends to anonymize the user between the application and the database. A database/DBMS hosted in the cloud will likely use the normal access controls of the DBMS (e.g., hosted Oracle or MySQL). A cloud-based database such as Amazon’s SimpleDB or Google’s BigTable comes with its own access controls. Depending on your security requirements, it may be important to understand how the cloud-based DB stores information, so you can evaluate potential back-end security issues.
  2. Administrator Separation of Duties: Newer technologies implemented in databases to limit database administrator access. On Oracle this is called Database Vault, and on IBM DB2 I believe you use the Security Administrator role and Label Based Access Controls. When evaluating the security of a cloud offering, understand the capabilities to limit both front and back-end administrator access. Many cloud services support various administrator roles for clients, allowing you to define various administrative roles for your own staff. Some providers also implement technology controls to restrict their own back-end administrators, such as isolating their database access. You should ask your cloud provider for documentation on what controls they place on their own administrators (and super-admins), and what data they can potentially access.
  3. File System Access Controls: Normal file access controls, applied at the file or repository level. Again, it’s important to understand the differences between the file access controls presented to you by the cloud service, vs. their access control implementation on the back end. There is an incredible variety of options across cloud providers, even within a single SPI tier – many of them completely proprietary to a specific provider. For the purposes of this model, we only include access controls for cloud based file storage (IaaS), and the back-end access controls used by the cloud provider. Due to the increased abstraction, everything else falls into the Application and Document Management System category.
  4. Application and Document Management System Access Controls: This category includes any access control restrictions implemented above the file or DBMS storage layers. In non-cloud environments this includes access controls in tools like SharePoint or Documentum. In the cloud, this category includes any content restrictions managed through the cloud application or service abstracted from the back-end content storage. These are the access controls for any services that allow you to manage files, documents, and other ‘unstructured’ content. The back-end storage can consist of anything from a relational database to flat files to traditional storage, and should be evaluated separately.

When designing or evaluating access controls you are concerned first with what’s available to you to control your own user/staff access, and then with the back end to understand who at your cloud provider can see what information. Don’t assume that the back end is necessarily less secure – some providers use techniques like bit splitting (combined with encryption) to ensure no single administrator can see your content at the file level, with strong separation of duties to protect data at the application layer.


The most overhyped technology for protecting data, but still one of the most important. Encryption is far from a panacea for all your cloud data security issues, but when used properly and in combination with other controls, it provides effective security. In cloud implementations, encryption may help compensate for issues related to multi-tenancy, public clouds, and remote/external hosting.

  1. Application-Level Encryption: Collected data is encrypted by the application, before being sent into a database or file system for storage. For cloud-based applications (e.g., public or private SaaS) this is usually the recommended option because it protects the data from the user all the way down to storage. For added security, the encryption functions and keys can be separated from the application itself, which also limits the access of application administrators to sensitive data.
  2. Field-Level Encryption: The database management system encrypts fields within a database, normally at the column level. In cloud implementations you will generally want to encrypt data at the application layer, rather than within the database itself, due to the complexity.
  3. Transparent Encryption: Encryption of the database structures, files, or the media where the database is stored. For database structures this is managed by the DBMS, while for files it can be the DBMS or third-party file encryption. Media encryption is managed at the storage layer; never by the DBMS. Transparent encryption protects the database data from unauthorized direct access, but does not provide any internal security. For example, you can encrypt a remotely hosted database to prevent local administrators from accessing it, but it doesn’t protect data from authorized database users.
  4. Media Encryption: Encryption of the physical storage media, such as hard drives or backup tapes. In a cloud environment, encryption of a complete virtual machine on IaaS could be considered media encryption. Media encryption is designed primarily to protect data in the event of physical loss/theft, such as a drive being removed from a SAN. It is often of limited usefulness in cloud deployments, although may be used by hosting providers on the back end in case of physical loss of media.
  5. File/Folder Encryption: Traditional encryption of specific files and folders in storage by the host platform.
  6. Virtual Private Storage: Encryption of files/folders in a shared storage environment, where the encryption/decryption is managed and performed outside the storage environment. This separates the keys and encryption from the storage platform itself, and allows them to be managed locally even when the storage is remote. Virtual Private Storage is an effective technique to protect remote data when you don’t have complete control of the storage environment. Data is encrypted locally before being sent to the shared storage repository, providing complete control of user access and key management. You can read more about Virtual Private Storage in our post.
  7. Distributed Encryption: With distributed encryption we use a central key management solution, but distribute the encryption engines to any end-nodes that require access to the data. It is typically used for unstructured (file/folder) content. When a node needs access to an encrypted file it requests a key from the central server, which provides it if the access is authorized. Keys are usually user or group based, not specific to individual files. Distributed encryption helps with the main problem of file/folder encryption, which is ensuring that everyone who needs it gets access to the keys. Rather than trying to synchronize keys continually in the background, they are provide at need.

Rights Management

The actual enforcement of rights assigned during the Create phase.

For descriptions of the technologies, please see the post on the Create phase. In future posts we will discuss cloud implementations of each of these technologies in greater detail.

Content Discovery

Content Discovery is the process of using content or context-based tools to find sensitive data in content repositories. Content aware tools use advanced content analysis techniques, such as pattern matching, database fingerprinting, and partial document matching to identify sensitive data inside files and databases. Contextual tools rely more on location or specific metadata, such as tags, and are thus better suited to rigid environments with higher assurance that content is labeled appropriately.

Discovery allows you to scan storage repositories and identify the location of sensitive data based on central policies. It’s extremely useful for ensuring that sensitive content is only located where the desired security controls are in place. Discovery is also very useful for supporting compliance initiatives, such as PCI, which restrict the usage and handling of specific types of data.

  1. Cloud-Provided Database Discovery Tool: Your cloud service provides features to locate sensitive data within your cloud database, such as locating credit card numbers. This is specific to the cloud provider, and we have no examples of current offerings.
  2. Database Discovery/DAM: Tools to crawl through database fields looking for data that matches content analysis policies. We most often see this as a feature of a Database Activity Monitoring (DAM) product. These tools are not cloud specific, and depending on your cloud deployment may not be deployable. IaaS environments running standard DBMS platforms (e.g., Oracle or MS SQL Server) may be supported, but we are unaware of any cloud-specific offerings at this time.
  3. Data Loss Prevention (DLP)/Content Monitoring and Protection (CMP) Database Discovery: Some DLP/CMP tools support content discovery within databases; either directly or through analysis of a replicated database or flat file dump. With full access to a database, such as through an ODBC connection, they can perform ongoing scanning for sensitive information.
  4. Cloud-Provided Content Discovery: A cloud-based feature to perform content discovery on files stored with the cloud provider.
  5. DLP/CMP Content Discovery: All DLP/CMP tools with content discovery features can scan accessible file shares, even if they are hosted remotely. This is effective for cloud implementations where the tool has access to stored files using common file sharing protocols, such as CIFS and WebDAV.

Cloud SPI Tier Implications

Software as a Service (SaaS)

As with most security aspects of SaaS, the security controls available depend completely on what’s provided by your cloud service. Front-end access controls are common among SaaS offerings, and many allow you to define your own groups and roles. These may not map to back-end storage, especially for services that allow you to upload files, so you should ask your SaaS provider how they manage access controls for their internal users.

Many SaaS offerings state they encrypt your data, but it’s important to understand just where and how it’s encrypted. For some services, it’s little more than basic file/folder or media encryption of their hosting platforms, with no restrictions on internal access. In other cases, data is encrypted using a unique key for every customer, which is managed externally to the application using a dedicated encryption/key management system. This segregates data between co-tenants on the service, and is also useful to restrict back-end administrative access. Application-level encryption is most common in SaaS offerings, and many provide some level of storage encryption on the back end.

Most rights management in SaaS uses some form of labeling or tagging, since we are generally dealing with applications, rather than raw data. This is the same reason we don’t tend to see content discovery for SaaS offerings.

Platform as a Service (PaaS)

Implementation in a PaaS environment depends completely on the available APIs and development environment.

When designing your PaaS-based application, determine what access controls are available and how they map to the provider’s storage infrastructure. In some cases application-level encryption will be an option, but make sure you understand the key management and where the data is encrypted. In some cases, you may be able to encrypt data on your side before sending it off to the cloud (for example, encrypting data within your application before making a call to store it in the PaaS).

As with SaaS, rights management and content discovery tend to be somewhat restricted in PaaS, unless the provider offers those features as part of the service.

Infrastructure as a Service (IaaS)

Your top priority for managing access controls in IaaS environments is to understand the mappings between the access controls you manage, and those enforced in the back-end infrastructure. For example, if you deploy a virtual machine into a public cloud, how are the access controls managed both for those accessing the machine from the Internet, and for the administrators that maintain the infrastructure? If another customer in the cloud is compromised, what prevents them from escalating privileges and accessing your content?

Virtual Private Storage is an excellent option to protect data that’s remotely hosted, even in a multi-tenant environment. It requires a bit more management effort, but the end result is often more secure than traditional in-house storage.

Content discovery is possible in IaaS deployments where common network file access protocols/methods are available, and may be useful for preventing unapproved use of sensitive data (especially due to inadvertent disclosure in public clouds).


Tuesday, September 08, 2009

Cloud Data Security Cycle: Create (Rough Cut)

By Rich

Last week I started talking about data security in the cloud, and I referred back to our Data Security Lifecycle from back in 2007. Over the next couple of weeks I’m going to walk through the cycle and adapt the controls for cloud computing. After that, I will dig in deep on implementation options for each of the potential controls. I’m hoping this will give you a combination of practical advice you can implement today, along with a taste of potential options that may develop down the road.

We do face a bit of the chicken and egg problem with this series, since some of the technical details of controls implementation won’t make sense without the cycle, but the cycle won’t make sense without the details of the controls. I decided to start with the cycle, and will pepper in specific examples where I can to help it make sense. Hopefully it will all come together at the end.

In this post we’re going to cover the Create phase:


Create is defined as generation of new digital content, either structured or unstructured, or significant modification of existing content. In this phase we classify the information and determine appropriate rights. This phase consists of two steps – Classify and Assign Rights.

Steps and Controls


div class=”bodyTable”>

ClassifyApplication Logic
Assign RightsLabel SecurityEnterprise DRM


Classification at the time of creation is currently either a manual process (most unstructured data), or handled through application logic. Although the potential exists for automated tools to assist with classification, most cloud and non-cloud environments today classify manually for unstructured or directly-entered database data, while application data is automatically classified by business logic. Bear in mind that these are controls applied at the time of creation; additional controls such as access control and encryption are managed in the Store phase. There are two potential controls:

  1. Application Logic: Data is classified based on business logic in the application. For example, credit card numbers are classified as such based on on field definitions and program logic. Generally this logic is based on where data is entered, or via automated analysis (keyword or content analysis)
  2. Tagging/Labeling: The user manually applies tags or labels at the time of creation e.g., manually tagging via drop-down lists or open fields, manual keyword entry, suggestion-assisted tagging, and so on.

Assign Rights

This is the process of converting the classification into rights applied to the data. Not all data necessarily has rights applied, in which cases security is provided through additional controls during later phases of the cycle. (Technically rights are always applied, but in many cases they are so broad as to be effectively non-existent). These are rights that follow the data, as opposed to access controls or encryption which, although they protect the data, are decoupled from its creation. There are two potential technical controls here:

  1. Label Security: A feature of some database management systems and applications that adds a label to a data element, such as a database row, column, or table, or file metadata, classifying the content in that object. The DBMS or application can then implement access and logical controls based on the data label. Labels may be applied at the application layer, but only count as assigning rights if they also follow the data into storage.
  2. Enterprise Digital Rights Management (EDRM): Content is encrypted, and access and use rights are controlled by metadata embedded with the content. The EDRM market has been somewhat self-limiting due to the complexity of enterprise integration and assigning and managing rights.

Cloud SPI Tier Implications

Software as a Service (SaaS)

Classification and rights assignment are completely controlled by the application logic implemented by your SaaS provider. Typically we see Application Logic, since that’s a fundamental feature of any application – SaaS or otherwise. When evaluating your SaaS provider you should ask how they classify sensitive information and then later apply security controls, or if all data is lumped together into a single monolithic database (or flat files) without additional labels or security controls to prevent leakage to administrators, attackers, or other SaaS customers.

In some cases, various labeling technologies may be available. You will, again, need to work with your potential SaaS provider to determine if these labels are used only for searching/sorting data, or if they also assist in the application of security controls.

Platform as a Service (PaaS)

Implementation in a PaaS environment depends completely on the available APIs and development environment. As with internal applications, you will maintain responsibility for how classification and rights assignment are managed.

When designing your PaaS-based application, identify potential labeling/classification APIs you can integrate into program logic. You will need to work with your PaaS provider to understand how they can implement security controls at both the application and storage layers – for example, it’s important to know if and how data is labeled in storage, and if this can be used to restrict access or usage (business logic).

Infrastructure as a Service (IaaS)

Classification and rights assignments depend completely on what is available from your IaaS provider. Here are some specific examples:

  • Cloud-based database: Work with your provider to determine if data labels are available, and with what granularity. If they aren’t provided, you can still implement them as a manual addition (e.g., a row field or segregated tables), but understand that the DBMS will not be enforcing the rights automatically, and you will need to program management into your application.
  • Cloud-based storage: Determine what metadata is available. Many cloud storage providers don’t modify files, so anything you define in an internal storage environment should work in the cloud. The limitation is that the cloud provider won’t be able to tie access or other security controls to the label, which is sometimes an option with document management systems. Enterprise DRM, for example, should work fine with any cloud storage provider.

This should give you a good idea of how to manage classification and rights assignment in various cloud environments. One exciting aspect is that use of tags, including automatically generated tags, is a common concept in the Web 2.0 world, and we can potentially tie this into our security controls. Users are better “trained” to tag content during creation with web-based applications (e.g., photo sharing sites & blogs), and we can take advantage of these habits to improve security.


Tuesday, September 01, 2009

Musings on Data Security in the Cloud

By Rich

So I’ve written about data security, and I’ve written about cloud security, thus it’s probably about time I wrote something about data security in the cloud.

To get started, I’m going to skip over defining the cloud. I recommend you take a look at the work of the Cloud Security Alliance, or skip on over to Hoff’s cloud architecture post, which was the foundation of the architectural section of the CSA work. Today’s post is going to be a bit scattershot, as I throw out some of the ideas rolling around my head from I thinking about building a data security cycle/framework for the cloud.

We’ve previously published two different data/information-centric security cycles. The first, the Data Security Lifecycle (second on the Research Library page) is designed to be a comprehensive forward-looking model. The second, The Pragmatic Data Security Cycle, is designed to be more useful in limited-scope data security projects. Together they are designed to give you the big picture, as well as a pragmatic approach for securing data in today’s resource-constrained environments. These are different than your typical Information Lifecycle Management cycles to reflect the different needs of the security audience.

When evaluating data security in the context of the cloud, the issues aren’t that we’ve suddenly blasted these cycles into oblivion, but that when and where you can implement controls is shifted, sometimes dramatically. Keep in mind that moving to the cloud is every bit as much an opportunity as a risk. I’m serious – when’s the last time you had the chance to completely re-architect your data security from the ground up?

For example, one of the most common risks cited when considering cloud deployment is lack of control over your data; any remote admin can potentially see all your sensitive secrets. Then again, so can any local admin (with access to the system). What’s the difference? In one case you have an employment agreement and their name, in the other you have a Service Level Agreement and contracts… which should include a way to get the admin’s name.

The problems are far more similar than they are different. I’m not one of those people saying the cloud isn’t anything new – it is, and some of these subtle differences can have a big impact – but we can definitely scope and manage the data security issues. And when we can’t achieve our desired level of security… well, that’s time to figure out what our risk tolerance is.

Let’s take two specific examples:

Protecting Data on Amazon S3 – Amazon S3 is one of the leading IaaS services for stored data, but it includes only minimal security controls compared to an internal storage repository. Access controls (which may not integrate with your internal access controls) and transit encryption (SSL) are available, but data is not encrypted in storage and may be accessible to Amazon staff or anyone who compromises your Amazon credentials. One option, which we’ve talked about here before, is Virtual Private Storage. You encrypt your data before sending it off to Amazon S3, giving you absolute control over keys and ACLs. You maintain complete control while still retaining the benefits of cloud-based storage. Many cloud backup solutions use this method.

Protecting Data at a SaaS Provider – I’d be more specific and list a SaaS provider, but I can’t remember which ones follow this architecture. With SaaS we have less control and are basically limited to the security controls built into the SaaS offering. That isn’t necessarily bad – the SaaS provider might be far more secure than you are – but not all SaaS offerings are created equal. To secure SaaS data you need to rely more on your contracts and an understanding of how your provider manages your data.

One architectural option for your SaaS provider is to protect your data with individual client keys managed outside the application (this is actually a useful internal data security architectural choice). It’s application-level encryption with external key management. All sensitive client data is encrypted in the SaaS provider’s database. Keys are managed in a dedicated appliance/service, and provided temporally to the application based on user credentials. Ideally the SaaS prover’s admins are properly segregated – where no single admin has database, key management, and application credentials. Since this potentially complicates support, it might be restricted to only the most sensitive data. (All your information might still be encrypted, but for support purposes could be accessible to the approved administrators/support staff). The SaaS provider then also logs all access by internal and external users.

This is only one option, but your SaaS provider should be able to document their internal data security, and even provide you with external audit reports.

As you can see, just because you are in the cloud doesn’t mean you completely give up any chance of data security. It’s all about understanding security boundaries, control options, technology, and process controls.

In future posts we’ll start walking through the Data Security Lifecycle and matching specific issues and control options in each phase against the SPI (SaaS, PaaS, IaaS) cloud models.


Friday, February 06, 2009

The Business Justification for Data Security- Version 1.0

By Rich

We’ve been teasing you with previews, but rather than handing out more bits and pieces, we are excited to release the complete version of the Business Justification for Data Security.

This is version 1.0 of the report, and we expect it to continue to evolve as we get more public feedback. Based on some of that initial feedback, we’d like to emphasize something before you dig in. Keep in mind that this is a business justification tool, designed to help you align potential data security investments with business needs, and to document the justification to make a case with those holding the purse strings. It’s not meant to be a complete risk assessment model, although it does share many traits with risk management tools.

We’ve also designed this to be both pragmatic and flexible- you shouldn’t need to spend months with consultants to build your business justification. For some projects, you might complete it in an hour. For others, maybe a few days or weeks as you wrangle business unit heads together to force them to help value different types of information.

For those of you that don’t want to read a 38 page paper we’re going to continue to post the guts of the model as blog posts, and we also plan on blogging additional content, such as more examples and use cases.

We’d like to especially thank our exclusive sponsor, McAfee, who also set up a landing page here with some of their own additional whitepapers and content. As usual, we developed the content completely independently, and it’s only thanks to our sponsors that we can release it for free (and still feed our families). This paper is also released in cooperation with the SANS Institute, will be available in the SANS Reading Room, and we will be delivering a SANS webcast on the topic on March 17th.

This was one of our toughest projects, and we’re excited to finally get it out there. Please post your feedback in the comments, and we will be crediting reviewers that advance the model when we release the next version.

And once again, thanks to McAfee, SANS, and (as usual) Chris Pepper, our fearless editor.


Wednesday, January 28, 2009

The Business Justification For Data Security: Data Valuation

By Rich

Man, nothing feels better than finishing off a few major projects. Yesterday we finalized the first draft of the Business Justification paper this series is based on, and I also squeezed out my presentation for IT Security World (in March) where I’m talking about major enterprise software security. Ah, the thrills and spills of SAP R/3 vs. Netweaver security!

In our first post we provided an overview of the model. Today we’re going to dig into the first step- data valuation. For the record, we’re skipping huge chunks of the paper in these posts to focus on the meat of the model- and our invitation for reviewers is still open (official release date should be within 2 weeks).

We know our data has value, but we can”t assign a definitive or fixed monetary value to it. We want to use the value to justify spending on security, but trying to tie it to purely quantitative models for investment justification is impossible. We can use educated guesses but they”re still guesses, and if we pretend they are solid metrics we”re likely to make bad risk decisions. Rather than focusing on difficult (or impossible) to measure quantitative value, let”s start our business justification framework with qualitative assessments. Keep in mind that just because we aren”t quantifying the value of the data doesn’t mean we won”t use other quantifiable metrics later in the model. Just because you cannot completely quantify the value of data, that doesn’t mean you should throw all metrics out the window.

To keep things practical, let”s select a data type and assign an arbitrary value to it. To keep things simple you might use a range of numbers from 1 to 3, or “Low”, “Medium”, and “High” to represent the value of the data. For our system we will use a range of 1-5 to give us more granularity, with 1 being a low value and 5 being a high value.

Another two metrics help account for business context in our valuation: frequency of use and audiences. The more often the data is used, the higher its value (generally). The audience may be a handful of people at the company, or may be partners & customers as well as internal staff. More use by more people often indicates higher value, as well as higher exposure to risk. These factors are important not only for understanding the value of information, but also the threats and risks associated with it – and so our justification for expenditures. These two items will not be used as primary indicators of value, but will modify an “intrinsic” value we will discuss more thoroughly below. As before, we will assign each metric a number from 1 to 5 , and we suggest you at least loosely define the scope of those ranges. Finally, we will examine three audiences that use the data: employees, customers, and partners; and derive a 1-5 score.

The value of some data changes based on time or context, and for those cases we suggest you define and rate it differently for the different contexts. For example, product information before product release is more sensitive than the same information after release.

As an example, consider student records at a university. The value of these records is considered high, and so we would assign a value of five. While the value of this data is considered “High” as it affects students financially, the frequency of use may be moderate because these records are accessed and updated mostly during a predictable window – at the beginning and end of each semester. The number of audiences for this data is two, as the records are used by various university staff (financial services and the registrar”s office), and the student (customer). Our tabular representation looks like this:


p style=”font: 12.0px Helvetica; min-height: 14.0px”>





Student Record




In our next post (later today) we’ll give you more examples of how this works.


Friday, November 14, 2008

Everything Old Is New Again In The Fog Of The Cloud

By Rich

Look I understand too little too late I realize there are things you say and do You can never take back But what would you be if you didn’t even try You have to try So after a lot of thought I’d like to reconsider Please If it’s not too late Make it a… cheeseburger

-Here I Am by Lyle Lovett

Sometimes I get a little too smart for my own good and think I can pull a fast one over you fair readers. Sure enough, Hoff called me out over my Cloud Computing Macro Layers post from yesterday. And I quote:

I’d say that’s a reasonable assertion and a valid set of additional “layers.” There also not especially unique and as such, I think Rich is himself a little disoriented by the fog of the cloud because as you’ll read, the same could be said of any networked technology. The reason we start with the network and usually find ourselves back where we started in this discussion is because the other stuff Rich mentions is just too damned hard, costs too much, is difficult to sell, isn’t standardized, is generally platform dependent and is really intrusive. See this post (Security Will Not End Up In the Network) as an example. Need proof of how good ideas like this get mangled? How about Web 2.0 or SOA which is for lack of a better description, exactly what RIch described in his model above; loosely coupled functional components of a modular architecture.

My response is… well… duh. I mean we’re just talking distributed applications, which we, of course, have yet to really get right (although multiplayer games and gambling are close).

But I take a little umbrage with Chris’s assumptions. I’m not proposing some kind of new, modular structure a la SOA, I’m talking more about application logic and basic security than anything else, not bolt-on tools. Because the truth is, it will be impossible to add these things on after the fact; it hasn’t worked well for network security, and sure as heck won’t work well for application security. These aren’t add-on products, they are design principles. They aren’t all new, but as everyone jumps off the cliff into the cloud they are worth repeating and putting into context for the fog/cloud environment.

Thus some new descriptions for the layers. Since it’s Friday and all I can think about is the Stone Brewery Epic Vertical Ale sitting in my fridge, this won’t be in any great depth:

  • Network: Traditional network security, and the stuff Hoff and others have been talking about. We’ll have some new advances to deal with the network aspects of remote and distributed applications, such as Chris’ dream of IF-MAP, but we’re still just talking about securing the tubes.
  • Service: Locking down the Internet exposed APIs- we have a fair bit of experience with this and have learned a lot of lessons over the past few years with work in SOA- SOAP, CORBA, DCOM, RPC and so on. We face three main issues here- first, not everyone has learned those lessons and we see tons of flaws in implementations and even fundamental design. Second, many of the designers/programmers building these cloud services don’t have a clue or a sense of history, and thus don’t know those lessons. And finally, most of these cloud services build their own kinds of APIs from scratch anyway, and thus everything is custom, and full of custom vulnerabilities from simple parsing errors, to bad parameterization, to logic flaws. Oh, and lest we not forget, plenty of these services are just web applications with AJAX and such that don’t even realize they are exposing APIs. Fun stuff I refer to as, “job security”.
  • User: This is an area I intend to talk about in much greater depth later on. Basically, right now we rely on static authentication (a single set of credentials to provide access) and I think we need to move more towards adaptive authentication (where we provide an authentication rating based on how strongly we trust that user at that time in that situation, and can thus then adjust the kinds of allowed transactions). This actually exists today- for example, my bank uses a username/password to let me in, but then requires an additional credential for transactions vs. basic access.
  • Transaction: As with user, this is an area we’ve underexplored in traditional applications, but I think will be incredibly valuable in cloud services. We build something called adaptive authorization into our applications and enforce more controls around approving transactions. For example, if a user with a low authentication rating tries to transfer a large sum out of their bank account, a text message with a code will be send to their cell phone with a code. If they have a higher authentication rating, the value amount before that back channel is required goes up. We build policies on a transaction basis, linking in environmental, user, and situational measurements to approve or deny transactions. This is program logic, not something you can add on.
  • Data: All the information-centric stuff we expend endless words on in this blog.

Thus this is nearly all about design, with a smidge of framework and shared security services support we can build into common environments (e.g. an adaptive authentication engine or encryption services in J2EE). No loosely coupled functional components, just a simple focus on proper application design with awareness of the distributed environment.

But as Chris also says,

It should be noted, however that there is a ton of work, solutions and initiatives that exist and continue to be worked on these topics, it’s just not a priority as is evidenced by how people exercise their wallets.

Exactly. Most of what we write about cloud computing security will be ignored…

… but what would we be if we didn’t even try. You have to try.


Wednesday, November 12, 2008

Cloud Security Macro Layers

By Rich

There’s been a lot of discussion on cloud computing in the blogosphere and general press lately, and although I’ll probably hate myself for it, it’s time to jump in beyond some sophomoric (albeit really funny) humor.

Chris Hoff inspired this with his post on TCG IF-MAP; a framework/standard for exchanging network security objects and events. It’s roots are in NAC, although as Alan Shimel informs us there’s been very little adoption.

Since cloud computing is a crappy marketing term that can mean pretty much whatever you want, I won’t dig into the various permutations in this post. For the purposes of this post I’ll be focusing on distributed services (e.g. grid computing), online services, and SaaS. I won’t be referring to in the cloud filtering and other network-only services.

Chris’s posting, and most of the ones I’ve seen out there, are heavily focused on network security concepts as they relate to the cloud. but if we look at cloud computing from a macro level, there are additional layers that are just as critical (in no particular order):


  • Network: The usual network security controls.
  • Service: Security around the exposed APIs and services.
  • User: Authentication- which in the cloud word will need to move to more adaptive authentication, rather than our current username/password static model.
  • Transaction: Security controls around individual transactions- via transaction authentication, adaptive authorization, or other approaches.
  • Data: Information-centric security controls for cloud based data. How’s that for buzzword bingo? Okay, this actually includes security controls over the back end data, distributed data, and any content exchanged with the user.

Down the road we’ll dig into these in more detail, but anytime we start distributing services and functionality over an open public network with no inherent security controls, we need to focus on the design issues and reduce design flaws as early as possible. We can’t just look at this as a network problem- our authentication, authorization, information, and service (layer 7) controls are likely even more important.

This gets me thinking it’s time to write a new framework… not that anyone will adopt it.


Wednesday, July 23, 2008

Best Practices For Endpoint DLP: Use Cases

By Rich

We’ve covered a lot of ground over the past few posts on endpoint DLP. Our last post finished our discussion of best practices and I’d like to close with a few short fictional use cases based on real deployments.

Endpoint Discovery and File Monitoring for PCI Compliance Support

BuyMore is a large regional home goods and grocery retailer in the southwest United States. In a previous PCI audit, credit card information was discovered on some employee laptops mixed in with loyalty program data and customer demographics. An expensive, manual audit and cleansing was performed within business units handling this content. To avoid similar issues in the future, BuyMore purchased an endpoint DLP solution with discovery and real time file monitoring support.

BuyMore has a highly distributed infrastructure due to multiple acquisitions and independently managed retail outlets (approximately 150 locations). During initial testing it was determined that database fingerprinting would be the best content analysis technique for the corporate headquarters, regional offices, and retail outlet servers, while rules-based analysis is the best fit for the systems used by store managers. The eventual goal is to transition all locations to database fingerprinting, once a database consolidation and cleansing program is complete.

During Phase 1, endpoint agents were deployed to corporate headquarters laptops for the customer relations and marketing team. An initial content discovery scan was performed, with policy violations reported to managers and the affected employees. For violations, a second scan was performed 30 days later to ensure that the data was removed. In Phase 2, the endpoint agents were switched into real time monitoring mode when the central management server was available (to support the database fingerprinting policy). Systems that leave the corporate network are then scanned monthly when the connect back in, with the tool tuned to only scan files modified since the last scan. All systems are scanned on a rotating quarterly basis, and reports generated and provided to the auditors.

For Phase 3, agents were expanded to the rest of the corporate headquarters team over the course of 6 months, on a business unit by business unit basis.

For the final phase, agents were deployed to retail outlets on a store by store basis. Due to the lower quality of database data in these locations, a rules-based policy for credit cards was used. Policy violations automatically generate an email to the store manager, and are reported to the central policy server for followup by a compliance manager.

At the end of 18 months, corporate headquarters and 78% or retail outlets were covered. BuyMore is planning on adding USB blocking in their next year of deployment, and already completed deployment of network filtering and content discovery for storage repositories.

Endpoint Enforcement for Intellectual Property Protection

EngineeringCo is a small contract engineering firm with 500 employees in the high tech manufacturing industry. They specialize in designing highly competitive mobile phones for major manufacturers. In 2006 they suffered a major theft of their intellectual property when a contractor transferred product description documents and CAD diagrams for a new design onto a USB device and sold them to a competitor in Asia, which beat their client to market by 3 months.

EngineeringCo purchased a full DLP suite in 2007 and completed deployment of partial document matching policies on the network, followed by network-scanning-based content discovery policies for corporate desktops. After 6 months they added network blocking for email, http, and ftp, and violations are at an acceptable level. In the first half of 2008 they began deployment of endpoint agents for engineering laptops (approximately 150 systems).

Because the information involved is so valuable, EngineeringCo decided to deploy full partial document matching policies on their endpoints. Testing determined performance is acceptable on current systems if the analysis signatures are limited to 500 MB in total size. To accommodate this limit, a special directory was established for each major project where managers drop key documents, rather than all project documents (which are still scanned and protected at the network). Engineers can work with documents, but the endpoint agent blocks network transmission except for internal email and file sharing, and any portable storage. The network gateway prevents engineers from emailing documents externally using their corporate email, but since it’s a gateway solution internal emails aren’t scanned.

Engineering teams are typically 5-25 individuals, and agents were deployed on a team by team basis, taking approximately 6 months total.

These are, of course, fictional best practices examples, but they’re drawn from discussions with dozens of DLP clients. The key takeaways are:

  1. Start small, with a few simple policies and a limited footprint.
  2. Grow deployments as you reduce incidents/violations to keep your incident queue under control and educate employees.
  3. Start with monitoring/alerting and employee education, then move on to enforcement.
  4. This is risk reduction, not risk elimination. Use the tool to identify and reduce exposure but don’t expect it to magically solve all your data security problems.
  5. When you add new policies, test first with a limited audience before rolling them out to the entire scope, even if you are already covering the entire enterprise with other policies.


Thursday, July 17, 2008

Best Practices for Endpoint DLP: Part 5, Deployment

By Rich

In our last post we talked about prepping for deployment- setting expectations, prioritizing, integrating with the infrastructure, and defining workflow. Now it’s time to get out of the lab and get our hands dirty.

Today we’re going to move beyond planning into deployment.

  1. Integrate with your infrastructure: Endpoint DLP tools require integration with a few different infrastructure elements. First, if you are using a full DLP suite, figure out if you need to perform any extra integration before moving to endpoint deployments. Some suites OEM the endpoint agent and you may need some additional components to get up and running. In other cases, you’ll need to plan capacity and possibly deploy additional servers to handle the endpoint load. Next, integrate with your directory infrastructure if you haven’t already. Determine if you need any additional information to tie users to devices (in most cases, this is built into the tool and its directory integration components).
  2. Integrate on the endpoint: In your preparatory steps you should have performed testing to be comfortable that the agent is compatible with your standard images and other workstation configurations. Now you need to add the agent to the production images and prepare deployment packages. Don’t forget to configure the agent before deployment, especially the home server location and how much space and resources to use on the endpoint. Depending on your tool, this may be managed after initial deployment by your management server.
  3. Deploy agents to initial workgroups: You’ll want to start with a limited deployment before rolling out to the larger enterprise. Pick a workgroup where you can test your initial policies.
  4. Build initial policies: For your first deployment, you should start with a small subset of policies, or even a single policy, in alert or content classification/discovery mode (where the tool reports on sensitive data, but doesn’t generate policy violations).
  5. Baseline, then expand deployment: Deploy your initial policies to the starting workgroup. Try to roll the policies out one monitoring/enforcement mode at a time, e.g., start with endpoint discovery, then move to USB blocking, then add network alerting, then blocking, and so on. Once you have a good feel for the effectiveness of the policies, performance, and enterprise integration, you can expand into a wider deployment, covering more of the enterprise. After the first few you’ll have a good understanding of how quickly, and how widely, you can roll out new policies.
  6. Tune policies: Even stable policies may require tuning over time. In some cases it’s to improve effectiveness, in others to reduce false positives, and in still other cases to adapt to evolving business needs. You’ll want to initially tune policies during baselining, but continue to tune them as the deployment expands. Most DLP clients report that they don’t spend much time tuning policies after baselining, but it’s always a good idea to keep your policies current with enterprise needs.
  7. Add enforcement/protection: By this point you should understand the effectiveness of your policies, and have educated users where you’ve found policy violations. You can now start switching to enforcement or protective actions, such as blocking, network filtering, or encryption of files. It’s important to notify users of enforcement actions as they occur, otherwise you might frustrate them u ecessarily. If you’re making a major change to established business process, consider scaling out enforcement options on a business unit by business unit basis (e.g., restricting access to a common content type to meet a new compliance need).

Deploying endpoint DLP isn’t really very difficult; the most common mistake enterprises make is deploying agents and policies too widely, too quickly. When you combine a new endpoint agent with intrusive enforcement actions that interfere (positively or negatively) with people’s work habits, you risk grumpy employees and political backlash. Most organizations find that a staged rollout of agents, followed by first deploying monitoring policies before moving into enforcement, then a staged rollout of policies, is the most effective approach.