Cloud Data Security: Store (Rough Cut)

In our last post in this series, we covered the cloud implications of the Create phase of the Data Security Cycle. In this post we’re going to move on to the Store phase. Please remember that we are only covering technologies at a high level in this series on the cycle; we will run a second series on detailed technical implementations of data security in the cloud a little later.

Definition

Store is defined as the act of committing digital data to structured or unstructured storage (database vs. files). Here we map the classification and rights to security controls, including access controls, encryption and rights management. I include certain database and application controls, such as labeling, in rights management – not just DRM. Controls at this stage also apply to managing content in storage repositories (cloud or traditional), such as using content discovery to ensure that data is in approved/appropriate repositories.

Steps and Controls

Control
| Structured/Application
| Unstructured

—|—|—

Access Controls
| DBMS Access Controls
Administrator Separation of Duties
| File System Access Controls
Application/Document Management System Access Controls

Encryption
| Field Level Encryption
Application Level Encryption
Transparent Database Encryption
| Media Encryption
File/Folder Encryption
Virtual Private Storage
Distributed Encryption

Rights Management
| Application Logic
Tagging/Labeling
| Tagging/Labeling
Enterprise DRM

Content Discovery
| Cloud-Provided Database Discovery Tool
Database Discovery/DAM
DLP/CMP Discovery
| Cloud-Provided Content Discovery DLP/CMP Content Discovery

Access Controls

One of the most fundamental data security technologies, built into every file and management system, and one of the most poorly used. In cloud computing environments there are two layers of access controls to manage – those presented by the cloud service, and the underlying access controls used by the cloud provider for their infrastructure. It’s important to understand the relationship between the two when evaluating overall security – in some cases the underlying infrastructure may be more secure (no direct back-end access) whereas in others the controls may be weaker (a database with multiple-tenant connection pooling).

DBMS Access Controls: Access controls within a database management system (cloud or traditional), including proper use of views vs. direct table access. Use of these controls is often complicated by connection pooling, which tends to anonymize the user between the application and the database. A database/DBMS hosted in the cloud will likely use the normal access controls of the DBMS (e.g., hosted Oracle or MySQL). A cloud-based database such as Amazon’s SimpleDB or Google’s BigTable comes with its own access controls. Depending on your security requirements, it may be important to understand how the cloud-based DB stores information, so you can evaluate potential back-end security issues.
Administrator Separation of Duties: Newer technologies implemented in databases to limit database administrator access. On Oracle this is called Database Vault, and on IBM DB2 I believe you use the Security Administrator role and Label Based Access Controls. When evaluating the security of a cloud offering, understand the capabilities to limit both front and back-end administrator access. Many cloud services support various administrator roles for clients, allowing you to define various administrative roles for your own staff. Some providers also implement technology controls to restrict their own back-end administrators, such as isolating their database access. You should ask your cloud provider for documentation on what controls they place on their own administrators (and super-admins), and what data they can potentially access.
File System Access Controls: Normal file access controls, applied at the file or repository level. Again, it’s important to understand the differences between the file access controls presented to you by the cloud service, vs. their access control implementation on the back end. There is an incredible variety of options across cloud providers, even within a single SPI tier – many of them completely proprietary to a specific provider. For the purposes of this model, we only include access controls for cloud based file storage (IaaS), and the back-end access controls used by the cloud provider. Due to the increased abstraction, everything else falls into the Application and Document Management System category.
Application and Document Management System Access Controls: This category includes any access control restrictions implemented above the file or DBMS storage layers. In non-cloud environments this includes access controls in tools like SharePoint or Documentum. In the cloud, this category includes any content restrictions managed through the cloud application or service abstracted from the back-end content storage. These are the access controls for any services that allow you to manage files, documents, and other ‘unstructured’ content. The back-end storage can consist of anything from a relational database to flat files to traditional storage, and should be evaluated separately.

When designing or evaluating access controls you are concerned first with what’s available to you to control your own user/staff access, and then with the back end to understand who at your cloud provider can see what information. Don’t assume that the back end is necessarily less secure – some providers use techniques like bit splitting (combined with encryption) to ensure no single administrator can see your content at the file level, with strong separation of duties to protect data at the application layer.

Encryption

The most overhyped technology for protecting data, but still one of the most important. Encryption is far from a panacea for all your cloud data security issues, but when used properly and in combination with other controls, it provides effective security. In cloud implementations, encryption may help compensate for issues related to multi-tenancy, public clouds, and remote/external hosting.

Application-Level Encryption: Collected data is encrypted by the application, before being sent into a database or file system for storage. For cloud-based applications (e.g., public or private SaaS) this is usually the recommended option because it protects the data from the user all the way down to storage. For added security, the encryption functions and keys can be separated from the application itself, which also limits the access of application administrators to sensitive data.
Field-Level Encryption: The database management system encrypts fields within a database, normally at the column level. In cloud implementations you will generally want to encrypt data at the application layer, rather than within the database itself, due to the complexity.
Transparent Encryption: Encryption of the database structures, files, or the media where the database is stored. For database structures this is managed by the DBMS, while for files it can be the DBMS or third-party file encryption. Media encryption is managed at the storage layer; never by the DBMS. Transparent encryption protects the database data from unauthorized direct access, but does not provide any internal security. For example, you can encrypt a remotely hosted database to prevent local administrators from accessing it, but it doesn’t protect data from authorized database users.
Media Encryption: Encryption of the physical storage media, such as hard drives or backup tapes. In a cloud environment, encryption of a complete virtual machine on IaaS could be considered media encryption. Media encryption is designed primarily to protect data in the event of physical loss/theft, such as a drive being removed from a SAN. It is often of limited usefulness in cloud deployments, although may be used by hosting providers on the back end in case of physical loss of media.
File/Folder Encryption: Traditional encryption of specific files and folders in storage by the host platform.
Virtual Private Storage: Encryption of files/folders in a shared storage environment, where the encryption/decryption is managed and performed outside the storage environment. This separates the keys and encryption from the storage platform itself, and allows them to be managed locally even when the storage is remote. Virtual Private Storage is an effective technique to protect remote data when you don’t have complete control of the storage environment. Data is encrypted locally before being sent to the shared storage repository, providing complete control of user access and key management. You can read more about Virtual Private Storage in our post.
Distributed Encryption: With distributed encryption we use a central key management solution, but distribute the encryption engines to any end-nodes that require access to the data. It is typically used for unstructured (file/folder) content. When a node needs access to an encrypted file it requests a key from the central server, which provides it if the access is authorized. Keys are usually user or group based, not specific to individual files. Distributed encryption helps with the main problem of file/folder encryption, which is ensuring that everyone who needs it gets access to the keys. Rather than trying to synchronize keys continually in the background, they are provide at need.

Rights Management

The actual enforcement of rights assigned during the Create phase.

For descriptions of the technologies, please see the post on the Create phase. In future posts we will discuss cloud implementations of each of these technologies in greater detail.

Content Discovery

Content Discovery is the process of using content or context-based tools to find sensitive data in content repositories. Content aware tools use advanced content analysis techniques, such as pattern matching, database fingerprinting, and partial document matching to identify sensitive data inside files and databases. Contextual tools rely more on location or specific metadata, such as tags, and are thus better suited to rigid environments with higher assurance that content is labeled appropriately.

Discovery allows you to scan storage repositories and identify the location of sensitive data based on central policies. It’s extremely useful for ensuring that sensitive content is only located where the desired security controls are in place. Discovery is also very useful for supporting compliance initiatives, such as PCI, which restrict the usage and handling of specific types of data.

Cloud-Provided Database Discovery Tool: Your cloud service provides features to locate sensitive data within your cloud database, such as locating credit card numbers. This is specific to the cloud provider, and we have no examples of current offerings.
Database Discovery/DAM: Tools to crawl through database fields looking for data that matches content analysis policies. We most often see this as a feature of a Database Activity Monitoring (DAM) product. These tools are not cloud specific, and depending on your cloud deployment may not be deployable. IaaS environments running standard DBMS platforms (e.g., Oracle or MS SQL Server) may be supported, but we are unaware of any cloud-specific offerings at this time.
Data Loss Prevention (DLP)/Content Monitoring and Protection (CMP) Database Discovery: Some DLP/CMP tools support content discovery within databases; either directly or through analysis of a replicated database or flat file dump. With full access to a database, such as through an ODBC connection, they can perform ongoing scanning for sensitive information.
Cloud-Provided Content Discovery: A cloud-based feature to perform content discovery on files stored with the cloud provider.
DLP/CMP Content Discovery: All DLP/CMP tools with content discovery features can scan accessible file shares, even if they are hosted remotely. This is effective for cloud implementations where the tool has access to stored files using common file sharing protocols, such as CIFS and WebDAV.

Cloud SPI Tier Implications

Software as a Service (SaaS)

As with most security aspects of SaaS, the security controls available depend completely on what’s provided by your cloud service. Front-end access controls are common among SaaS offerings, and many allow you to define your own groups and roles. These may not map to back-end storage, especially for services that allow you to upload files, so you should ask your SaaS provider how they manage access controls for their internal users.

Many SaaS offerings state they encrypt your data, but it’s important to understand just where and how it’s encrypted. For some services, it’s little more than basic file/folder or media encryption of their hosting platforms, with no restrictions on internal access. In other cases, data is encrypted using a unique key for every customer, which is managed externally to the application using a dedicated encryption/key management system. This segregates data between co-tenants on the service, and is also useful to restrict back-end administrative access. Application-level encryption is most common in SaaS offerings, and many provide some level of storage encryption on the back end.

Most rights management in SaaS uses some form of labeling or tagging, since we are generally dealing with applications, rather than raw data. This is the same reason we don’t tend to see content discovery for SaaS offerings.

Platform as a Service (PaaS)

Implementation in a PaaS environment depends completely on the available APIs and development environment.

When designing your PaaS-based application, determine what access controls are available and how they map to the provider’s storage infrastructure. In some cases application-level encryption will be an option, but make sure you understand the key management and where the data is encrypted. In some cases, you may be able to encrypt data on your side before sending it off to the cloud (for example, encrypting data within your application before making a call to store it in the PaaS).

As with SaaS, rights management and content discovery tend to be somewhat restricted in PaaS, unless the provider offers those features as part of the service.

Infrastructure as a Service (IaaS)

Your top priority for managing access controls in IaaS environments is to understand the mappings between the access controls you manage, and those enforced in the back-end infrastructure. For example, if you deploy a virtual machine into a public cloud, how are the access controls managed both for those accessing the machine from the Internet, and for the administrators that maintain the infrastructure? If another customer in the cloud is compromised, what prevents them from escalating privileges and accessing your content?

Virtual Private Storage is an excellent option to protect data that’s remotely hosted, even in a multi-tenant environment. It requires a bit more management effort, but the end result is often more secure than traditional in-house storage.

Content discovery is possible in IaaS deployments where common network file access protocols/methods are available, and may be useful for preventing unapproved use of sensitive data (especially due to inadvertent disclosure in public clouds).