Infrastructure as a Service storage can be insanely complex when you include operational and performance requirements. First you need to create a resource pool, which might itself be a pool of virtualized and abstracted storage, and then you need to tie it all together with orchestration to support the dynamic requirements of the cloud – such as moving running virtual machines between servers, instantly snapshotting multi-terabyte virtual drives, and other insanity.
For security we don’t need to know all the ins and outs of cloud storage, but we do need to understand the high-level architecture and how it affects our security controls. And keep in mind that the implementations of our generic architecture vary widely between different public and private cloud platforms.
Public clouds are roughly equivalent to provider-class private clouds, except that they are designed to support multiple external tenants. We will focus on private cloud storage, with the understanding that public clouds are about the same except that customers have less control.
IaaS Storage Overview
Here’s a diagram to work through:
- At the lowest level is physical storage. This can be nearly anything that satisfies the cloud’s performance and storage requirements. It might be commodity hard drives in commodity rack servers. It could be high-performance SSD drives in high-end specialized datacenter servers. But really it could be nearly any storage appliance/system you can think of.
- Some physical storage is generally pooled by a virtual storage controller, like a SAN. This is extremely common in production clouds but isn’t limited to traditional SAN. Basically, as long as you can connect it to the cloud storage manager, you can use it. You could even dedicate certain LUNs from a larger shared SAN to cloud, while using other LUNs for non-cloud applications. If you aren’t a storage person just remember there might be some sort of controller/server above the hard drives, outside your cloud servers, that needs to be secured.
That’s the base storage. On top of that we then build out:
- Object storage controllers (also called managers) connect to assigned physical or virtual storage and manage orchestration and connectivity. Above this level they communicate using APIs. Some deployments include object storage connectivity software running on distributed commodity servers to tie the servers’ hard drives into the storage pool.
- Object storage controllers create virtual containers (also called buckets) which are assigned to cloud users. A container is a pool of storage in which you can place objects (files). Every container stores each bit in multiple locations. This is called data dispersion, and we will talk more about it in a moment.
Object storage is something of a cross between a database and a file share. You move files into and out of it; but instead of being managed by a file system you manage it with APIs, at an abstracted layer above whatever file systems actually store the data. Object storage is accessed via APIs (almost always RESTful HTTP APIs) rather than classic network file protocols, which offers tremendous flexibility for integration into different applications and services. Object storage includes logic below the user-accessible layer for features such as quotas, access control, and redundancy management.
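To make the container model a bit more concrete, here is a minimal in-memory sketch of an object controller. Everything in it (the `ObjectStore` and `Container` classes, the owner-only access rule, the byte quota) is a hypothetical illustration rather than any platform's real API; it only shows how access control and quotas can sit below the put/get calls a user sees.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A hypothetical container (bucket) with a simple byte quota."""
    name: str
    owner: str
    quota_bytes: int
    objects: dict = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(len(data) for data in self.objects.values())


class ObjectStore:
    """Toy object controller: access control and quotas live below the API."""

    def __init__(self):
        self.containers = {}

    def create_container(self, name, owner, quota_bytes):
        self.containers[name] = Container(name, owner, quota_bytes)

    def put(self, user, container, key, data: bytes):
        c = self._authorize(user, container)
        # Replacing an existing object frees its old bytes first.
        projected = c.used_bytes() - len(c.objects.get(key, b"")) + len(data)
        if projected > c.quota_bytes:
            raise ValueError("quota exceeded")
        c.objects[key] = data

    def get(self, user, container, key) -> bytes:
        return self._authorize(user, container).objects[key]

    def _authorize(self, user, container) -> Container:
        # Assumption for this sketch: only the owner may touch a container.
        c = self.containers[container]
        if c.owner != user:
            raise PermissionError(f"{user} may not access {container}")
        return c


store = ObjectStore()
store.create_container("backups", owner="alice", quota_bytes=16)
store.put("alice", "backups", "notes.txt", b"hello")
print(store.get("alice", "backups", "notes.txt"))  # b'hello'
```

A real object store would expose these operations over HTTP and spread the stored bytes across many drives; the sketch keeps everything in one dictionary so the control logic stays visible.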
- Volume storage controllers (also called managers) connect to assigned physical (or virtual) storage and manage orchestration and connectivity. Above this level they communicate using APIs. The volume controller creates volumes on request and assigns them to specific cloud instances. To use traditional virtualization language, it creates a virtual hard drive and connects it to a virtual machine. Data dispersion is often used to provide redundancy and robustness.
- A volume is essentially a persistent virtual hard drive. It can be of any size supported by the cloud platform and underlying resources, and a volume assigned to a virtual machine exists until it is destroyed (note that tearing down an instance often automatically also returns the associated volume storage back to the free storage pool).
- Physical servers run hypervisors and cloud connectivity software to tie them into the compute resource pool. This is where instances (virtual machines) run. These servers typically have local hard drives which can be assigned to the volume controller to expand the storage pool, or used locally for non-persistent storage. We call this ‘ephemeral’ storage, and it’s great for swap files and other higher-performance operations that don’t require the resiliency of a full storage volume. If your cloud uses this model, the cloud management software places swap on these local drives. When you move or shut down your instance you always lose access to this data, although it might remain recoverable from the physical drive until overwritten.
We like to discuss volumes as if they were virtual hard drives, but they are a bit more complex. Volumes may be distributed, with data dispersed across multiple physical drives. There are also implications, which we will consider later, for how volumes fit into the context of your cloud and how they interact with object storage and things like snapshots and live migrations.
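The volume lifecycle described above can be sketched in a few lines. The `VolumeController` class, its method names, and the `delete_volumes` flag are assumptions made up for illustration; real platforms differ in how and when storage is returned to the pool.

```python
class VolumeController:
    """Toy volume controller handing out capacity from a shared pool."""

    def __init__(self, pool_gb: int):
        self.free_gb = pool_gb
        self.volumes = {}  # volume id -> {"size_gb": int, "instance": str or None}
        self._next_id = 1

    def create_volume(self, size_gb: int) -> str:
        if size_gb > self.free_gb:
            raise RuntimeError("storage pool exhausted")
        vol_id = f"vol-{self._next_id}"
        self._next_id += 1
        self.free_gb -= size_gb
        self.volumes[vol_id] = {"size_gb": size_gb, "instance": None}
        return vol_id

    def attach(self, vol_id: str, instance_id: str):
        self.volumes[vol_id]["instance"] = instance_id

    def terminate_instance(self, instance_id: str, delete_volumes: bool = True):
        # Hypothetical policy switch: destroy attached volumes or just detach them.
        for vol_id, vol in list(self.volumes.items()):
            if vol["instance"] != instance_id:
                continue
            if delete_volumes:
                self.free_gb += vol["size_gb"]  # capacity returns to the pool
                del self.volumes[vol_id]
            else:
                vol["instance"] = None  # volume persists, unattached


controller = VolumeController(pool_gb=100)
vol = controller.create_volume(40)
controller.attach(vol, "instance-a")
controller.terminate_instance("instance-a")
print(controller.free_gb)  # 100
```

Note that the sketch only tracks capacity. Returning a volume's space to the pool says nothing about whether the underlying blocks were wiped, which is exactly the kind of detail worth checking on your own platform.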
How object and volume storage interact
Most clouds include both object and volume storage, even if object storage isn’t available directly to users. Here are the key ways they interact:
- A snapshot is a near-instant backup of a volume that is moved into object storage. The underlying technology varies widely and is too complex for my feeble analyst brain, but a snapshot effectively copies a complete set of the storage blocks in your volume into a file stored in an object container which has been assigned to snapshots. Since every block in your volume is likely stored in multiple physical locations, typically 3 or more times, taking a snapshot tells the volume controller to copy a complete set of blocks over to object storage. The operation can take a while but it looks instantaneous because the snapshot accurately reflects the state of the volume at that point in time, while the volume is still fully usable, running on another set of blocks while the snapshot is moved over (this is a major oversimplification of something that makes my head hurt).
- Images are pre-defined storage volumes in object storage, which contain operating systems or other virtual hard drives used to launch instances. An image might be a base version of Windows, or a completely configured server in an n-tier application stack. When you launch an instance the volume controller creates a volume of the required size, then pulls the requested image from the object controller and loads it up into the virtual machine.
- Because snapshots and images are no different than any other objects or files in object storage, they are very portable and (in public clouds) can be made available to the Internet with a single API call or mouse click.
- You can quickly create images from running instances. These images contain everything stored “on disk” unless you deliberately exclude particular locations such as swap files.
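Here is a deliberately simplified sketch of why a snapshot can look instantaneous even though copying it into object storage takes time. It assumes a redirect-on-write style design in which the snapshot just freezes the current block map; the `Volume` class and `export_snapshot` function are hypothetical, and real platforms implement this in many different ways.

```python
class Volume:
    """Toy volume: a list of references to immutable blocks."""

    def __init__(self, blocks):
        self.block_map = list(blocks)

    def write(self, index: int, data: bytes):
        # New data lands in a new block; a frozen snapshot keeps the old reference.
        self.block_map[index] = data

    def snapshot(self):
        # Freezing the block map is quick; no block contents are copied here.
        return tuple(self.block_map)


def export_snapshot(frozen_map, object_container: dict, name: str):
    # The slow part: copy every block into a single object in the container.
    object_container[name] = b"".join(frozen_map)


volume = Volume([b"AAAA", b"BBBB"])
frozen = volume.snapshot()          # point-in-time view
volume.write(0, b"ZZZZ")            # the volume stays usable during the export

snapshot_container = {}
export_snapshot(frozen, snapshot_container, "snap-1")
print(snapshot_container["snap-1"])  # b'AAAABBBB'
print(volume.block_map[0])           # b'ZZZZ'
```

The exported object reflects the volume as it was when the snapshot was taken, even though a write happened before the export finished.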
Understanding these components is essential for securing cloud resources. A snapshot is a near-instant backup of a (virtual) hard drive that is incredibly portable, and easily made public. A few years ago I co-wrote a script that, if run on a cloud administrator’s computer, would snapshot every single volume that administrator could access and make the snapshots public, with a nice metadata tag to make them easy to find. A few API calls from an unprotected developer or administrator system could expose all the data in your cloud.
Also, if you allow instances to store data in local ephemeral storage, sensitive data such as encryption keys may be left behind when you move or terminate an instance.
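One defensive counterpart to the snapshot exposure risk is to routinely review who can see each snapshot. The sketch below assumes you have already pulled a snapshot inventory from your platform into plain dictionaries; the field names (`visibility`, `shared_with`) and account identifiers are invented for illustration and will not match any specific provider.

```python
def find_exposed_snapshots(snapshots, trusted_accounts):
    """Return (snapshot id, reason) pairs for snapshots visible to outsiders."""
    exposed = []
    for snap in snapshots:
        if snap.get("visibility") == "public":
            exposed.append((snap["id"], "public"))
            continue
        # Anything shared with an account we do not recognize is also a finding.
        outsiders = sorted(set(snap.get("shared_with", [])) - set(trusted_accounts))
        if outsiders:
            exposed.append((snap["id"], "shared with " + ", ".join(outsiders)))
    return exposed


# Hypothetical inventory, as if exported from a cloud management API.
inventory = [
    {"id": "snap-001", "visibility": "private", "shared_with": []},
    {"id": "snap-002", "visibility": "public", "shared_with": []},
    {"id": "snap-003", "visibility": "private",
     "shared_with": ["acct-partner", "acct-unknown"]},
]

findings = find_exposed_snapshots(inventory, trusted_accounts={"acct-partner"})
for snap_id, reason in findings:
    print(snap_id, reason)
# snap-002 public
# snap-003 shared with acct-unknown
```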
- Data dispersion is equivalent to RAID protection, but implemented differently. Any storage block is replicated in multiple physical locations across your cloud. In private clouds you configure this yourself, but in public clouds it is likely an opaque feature. Dispersion is great for resiliency and valuable for security – any given file might be broken up and stored on multiple hard drives. So losing one drive might not matter much, but you can rarely figure out exactly what data is stored on which drives.
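A rough sketch of dispersion, assuming three replicas per block and a hash-based ranking to choose drives (in the style of rendezvous hashing). This placement scheme is our own stand-in for illustration; we are not claiming any particular cloud platform places blocks this way.

```python
import hashlib


def place_block(block_id: str, drives: list, replicas: int = 3) -> list:
    """Pick `replicas` distinct drives for a block by ranking hash scores."""
    if replicas > len(drives):
        raise ValueError("not enough drives for the requested replica count")
    ranked = sorted(
        drives,
        key=lambda drive: hashlib.sha256(f"{block_id}:{drive}".encode()).hexdigest(),
    )
    return ranked[:replicas]


drives = [f"drive-{n}" for n in range(6)]
placement = {f"block-{n}": place_block(f"block-{n}", drives) for n in range(100)}

# Simulate losing one drive: every block still has at least two copies left.
failed = "drive-2"
survivors = {
    block: [d for d in copies if d != failed] for block, copies in placement.items()
}
print(min(len(copies) for copies in survivors.values()))
```

The same property that makes this resilient also explains the visibility problem: which drives hold a given block depends on a hash, not on anything a human would find easy to track.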
Cloud storage networks
All this runs on multiple networks (at least, if you built your cloud for performance and reliability). Some of them might be:
- If you use virtual storage (e.g., SAN) this likely runs over its own storage network.
- A management network ties together the cloud controller components, particularly object and volume managers and agents.
- A data/storage network for connecting volumes to instances, to improve performance. This may also connect object and volume storage.
- The external public network for managing cloud controllers via API.
- A service network for communicating between outside clients and instances, as well as between instances – typically the Internet.
You will likely have at least one network to the outside world, one for storage (between volumes and instances), and another for management.
Some or all of these might be the same physical network, segregated with VLANs, but consider how much you trust VLANs with unknown parties running their own operating systems adjacent to your equipment. Lastly, these networks might violate your expectations for networks, due to new physical platforms for cloud hosting, which may run storage and communications traffic over the same physical connections.
This isn’t an attempt to scare you – the ins and outs of designing and securing these networks are fodder for another day – but you need to be aware of what is under the surface.
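As a small illustration of the "same physical network, segregated with VLANs" concern, the sketch below takes a hand-written description of logical networks and reports which ones share a physical link. The network names, the `physical_link` and `vlan` fields, and the switch labels are all hypothetical.

```python
from collections import defaultdict


def shared_physical_links(networks: dict) -> dict:
    """Map each physical link to the logical networks riding on it, if more than one."""
    by_link = defaultdict(list)
    for name, cfg in networks.items():
        by_link[cfg["physical_link"]].append(name)
    return {link: sorted(names) for link, names in by_link.items() if len(names) > 1}


# Hypothetical layout: storage and service traffic separated only by VLAN.
networks = {
    "management": {"physical_link": "switch-a", "vlan": 10},
    "storage": {"physical_link": "switch-b", "vlan": 20},
    "service": {"physical_link": "switch-b", "vlan": 30},
}

print(shared_physical_links(networks))  # {'switch-b': ['service', 'storage']}
```

Any entry in the result is a place where segregation depends entirely on VLAN configuration, which is worth knowing before deciding how much to trust it.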
The architecture and resiliency of cloud storage models create new and interesting risks:
- Cloud managers, either in your environment or your cloud provider’s, can access any data stored in the cloud over the network. This is very different than traditional infrastructure where storage access typically requires physical connectivity.
- Snapshots become ubiquitous because they are effectively instantaneous, highly portable, and accessible over the network. They pose a significantly increased risk of exposure compared to traditional infrastructure, where snapshots are less common, less portable, and less exposed.
- Images of instances may contain and expose sensitive data.
- All this is managed with networks and APIs, which remove some of our traditional security controls and conceptions. Someone accessing a cloud administrator’s or developer’s system could, depending on how things are set up, access literally an entire datacenter.
- Cloud data can be incredibly resilient, with any given bit stored in multiple places across the cloud.
- You may have 3 or more storage-related networks to secure and segregate. Don’t trust VLANs.
- You have far less visibility into where things are actually stored, although some cloud platforms are beginning to offer more transparency – this is an evolving area.
- You still have physical and virtual storage to keep secure, underneath everything else.
Due to all this complexity and portability, encryption is the best tool available for most cloud data security. Encryption, implemented properly, protects data as it moves through your environment. It doesn’t matter if there are 3 versions of a particular block exposed on multiple hard drives, because without the key the data is meaningless. It doesn’t matter if someone makes a snapshot of an encrypted volume public. Only exposure of both the keys and the data would be problematic.
Of course encryption cannot wipe all security issues away. As we will discuss, you cannot use it for certain applications such as boot volumes, and data on unencrypted volumes is still exposed. But in combination with our other recommendations, encryption enables you to store and process even sensitive data in the cloud.
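The claim that replicated or exposed blocks are meaningless without the key can be illustrated with a toy example. To keep it runnable with only the Python standard library, this sketch builds a keystream from HMAC-SHA256 in counter mode; that construction is an assumption chosen for illustration, it provides no integrity protection, and it is not something to deploy. Real volume encryption should rely on a vetted cipher provided by your platform or a maintained cryptography library.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR data with an HMAC-SHA256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


key = secrets.token_bytes(32)      # kept outside the storage layer
nonce = secrets.token_bytes(16)    # stored alongside the block
plaintext = b"customer records stored on volume block 7"

ciphertext = keystream_xor(key, nonce, plaintext)
replicas = [ciphertext] * 3        # the same encrypted block on three drives

# With the key, any replica decrypts; XOR with the same keystream reverses it.
print(keystream_xor(key, nonce, replicas[0]) == plaintext)  # True

# Without the key, a leaked replica (or public snapshot) is just noise.
wrong_key = secrets.token_bytes(32)
print(keystream_xor(wrong_key, nonce, replicas[1]) == plaintext)  # False
```

The design point is where the key lives: as long as it is managed outside the storage layer, the number of copies of the ciphertext and who can read them matters far less.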