I started writing up a post on IaaS encryption options and quickly realized I should probably precede it with a post outlining the IaaS storage options first. One slightly confusing bit is that IaaS storage really falls into two categories: storage as a service where the storage itself is the product, and storage for IaaS compute instances, where the storage is tied to running virtual machines.
IaaS storage options include:
- Raw storage: As far as I can tell, this is only available for private clouds, and not on every platform. For certain high-speed operations it allows you to map a virtual volume to dedicated raw media. This skips abstraction layers for increased performance, but you lose many of the advantages of cloud storage. It’s rarely used, and may only be available on VMware.
- Volume storage: The easiest way to think of volume storage is as a virtual hard drive for your instances. There are a few different architectures, but volumes are typically a clump of assigned blocks (often stored redundantly in the back end). When you create a volume, the volume controller assigns the blocks, distributes them onto the physical storage infrastructure, and presents them as a raw volume. You then need to attach the volume to an instance, create partitions and file systems on it, and manage it like a drive. Although it presents as a single drive to your instance, volume storage is more like RAID – each block is replicated in multiple locations on different physical drives. Amazon EBS and Rackspace RAID volumes are examples.
- Object storage: Object storage is sometimes referred to as file storage. Rather than a virtual hard drive, object storage is more like a file share. It is slower than volume storage, but cheaper and more efficient. The back end can be structured in different ways – most often a database / file system hybrid, with a bunch of processes to keep track of where everything is stored, replication, cleanup, and other housekeeping functions. Amazon S3, Rackspace Cloud Files, and OpenStack Swift are examples.
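To make the “more like RAID” point concrete, here is a toy model of a volume controller. It is a minimal sketch under stated assumptions: the `Drive` and `VolumeController` classes, the 4 KB block size, and the two-replica default are all hypothetical and not how EBS or any specific platform is built. Each logical block the instance sees is stored on two distinct physical drives, so a read still succeeds after one drive fails.

```python
import random
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # illustrative block size, not any provider's actual value


@dataclass
class Drive:
    """One physical drive in the back end: slot id -> block contents."""
    name: str
    blocks: dict = field(default_factory=dict)
    failed: bool = False


class VolumeController:
    """Hypothetical controller: assigns blocks to a volume and stores each
    logical block on `replicas` distinct physical drives (RAID-1-like)."""

    def __init__(self, drives, replicas=2, seed=0):
        if replicas > len(drives):
            raise ValueError("not enough drives for the requested replica count")
        self.drives = drives
        self.replicas = replicas
        self.rng = random.Random(seed)
        self.volumes = {}  # volume id -> [(slot id, [drives holding it]), ...]
        self._next_slot = 0

    def create_volume(self, volume_id, num_blocks):
        placement = []
        for _ in range(num_blocks):
            # Pick distinct drives so losing one drive never loses a block.
            holders = self.rng.sample(self.drives, self.replicas)
            slot = self._next_slot
            self._next_slot += 1
            for drive in holders:
                drive.blocks[slot] = bytes(BLOCK_SIZE)  # raw, zero-filled block
            placement.append((slot, holders))
        self.volumes[volume_id] = placement

    def write_block(self, volume_id, index, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        slot, holders = self.volumes[volume_id][index]
        for drive in holders:
            if not drive.failed:
                drive.blocks[slot] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, volume_id, index):
        slot, holders = self.volumes[volume_id][index]
        # Any healthy replica can serve the read.
        for drive in holders:
            if not drive.failed:
                return drive.blocks[slot]
        raise IOError("all replicas of this block are unavailable")
```

The instance only ever sees `read_block`/`write_block` by block index; which physical drives hold the replicas stays hidden behind the controller, which is why the volume still looks like a single drive.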
For our purposes, we will consider cloud databases part of PaaS.
So when we talk about IaaS storage, we are mostly talking volumes and objects. Volumes are like hard drives, and object storage is effectively a file share with a nifty API.
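The difference between the two is easiest to see in the interfaces. The following is a minimal sketch with hypothetical in-memory classes (not any provider’s SDK): a volume is a fixed-size range of bytes you can update in place at any offset, while an object store keeps whole objects under keys and replaces them whole on each write, as S3-style stores do.

```python
class BlockDevice:
    """Volume-style interface: a fixed-size array of bytes addressed by offset."""

    def __init__(self, size):
        self._data = bytearray(size)

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > len(self._data):
            raise ValueError("write past end of volume")
        # In-place partial update; the rest of the volume is untouched.
        self._data[offset:offset + len(data)] = data

    def read(self, offset, length):
        return bytes(self._data[offset:offset + length])


class ObjectStore:
    """Object-style interface: whole objects stored and fetched by key."""

    def __init__(self):
        self._objects = {}  # (bucket, key) -> (body, metadata)

    def put(self, bucket, key, body, metadata=None):
        # Replaces the entire object; there is no partial write.
        self._objects[(bucket, key)] = (bytes(body), dict(metadata or {}))

    def get(self, bucket, key):
        return self._objects[(bucket, key)][0]

    def list_keys(self, bucket, prefix=""):
        return sorted(k for (b, k) in self._objects if b == bucket and k.startswith(prefix))
```

A file system can sit on top of `BlockDevice` because it needs to rewrite small pieces in place; the object store trades that away for a simple key/value API that is easy to replicate and scale.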
An additional piece is important for running IaaS instances: image management. Images (such as Virtual Machine Images and Amazon Machine Images) can be stored in a variety of ways, but most often in object storage because it’s cheaper and more efficient. Layered on top is an image manager such as OpenStack Glance, which tracks the images and ties them into the compute management plane.
When you create an IaaS instance you pick an image, which the image manager then pulls from object storage and streams to the hypervisor/system that will host the instance.
But the image manager doesn’t need to use object storage. Glance, for example, can use pretty much anything – including local file storage, which is particularly handy in test environments.
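Here is a rough sketch of an image manager with a pluggable back end. It only illustrates the idea that the manager tracks images while the storage behind it is swappable; the class names, the 64 KB chunk size, and the `hypervisor_sink` callable are hypothetical and are not Glance’s actual API. The same manager code streams an image in chunks whether it lives in an (in-memory stand-in for) object store or in local files.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator

CHUNK = 64 * 1024  # illustrative streaming chunk size


class ImageBackend(ABC):
    @abstractmethod
    def save(self, image_id: str, data: bytes) -> None: ...

    @abstractmethod
    def stream(self, image_id: str) -> Iterator[bytes]: ...


class ObjectStoreBackend(ImageBackend):
    """Stand-in for an object store such as Swift or S3 (in memory here)."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def save(self, image_id, data):
        self._objects[image_id] = data

    def stream(self, image_id):
        data = self._objects[image_id]
        for i in range(0, len(data), CHUNK):
            yield data[i:i + CHUNK]


class LocalFileBackend(ImageBackend):
    """Keeps images as plain files, e.g. in a test environment."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, image_id, data):
        (self.root / image_id).write_bytes(data)

    def stream(self, image_id):
        with open(self.root / image_id, "rb") as f:
            while chunk := f.read(CHUNK):
                yield chunk


class ImageManager:
    """Tracks image metadata and streams image bytes toward a hypervisor."""

    def __init__(self, backend: ImageBackend):
        self.backend = backend
        self.catalog: dict[str, dict] = {}

    def register(self, image_id, name, data):
        self.backend.save(image_id, data)
        self.catalog[image_id] = {"name": name, "size": len(data)}

    def deliver(self, image_id, hypervisor_sink):
        # hypervisor_sink: hypothetical callable that receives each chunk.
        for chunk in self.backend.stream(image_id):
            hypervisor_sink(chunk)
```

Swapping storage is then a one-line change, e.g. `ImageManager(LocalFileBackend("/tmp/images"))` for a test rig instead of `ImageManager(ObjectStoreBackend())`.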
Lastly, we can’t forget about snapshots. Snapshotting an instance essentially makes a block-level copy of the volume it’s running on or attached to. Creating a snapshot is nearly instantaneous, but the snapshot doesn’t need to be kept as a volume; it can be sent off to more efficient object storage instead. If you want to turn a snapshot back into a volume, you send a request, storage is assigned, and the image streams back from object storage into volume storage; you can then attach the new volume to instances.
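A minimal sketch of that round trip, assuming a plain `dict` stands in for object storage and a `bytearray` for the volume. The key layout (`snap-id/manifest`, `snap-id/00000007`) is hypothetical, and skipping all-zero blocks is an illustrative assumption rather than a documented behavior of any provider. The volume is copied block by block into objects, then restored into freshly assigned, zero-filled storage.

```python
BLOCK_SIZE = 4096  # illustrative block size


def create_snapshot(volume: bytes, store: dict, snap_id: str) -> None:
    """Block-level copy of a volume into object storage (a dict here)."""
    store[f"{snap_id}/manifest"] = str(len(volume)).encode()
    zero = bytes(BLOCK_SIZE)
    for offset in range(0, len(volume), BLOCK_SIZE):
        block = bytes(volume[offset:offset + BLOCK_SIZE])
        # Assumption for illustration: all-zero blocks are not stored,
        # so unused parts of the volume cost nothing in object storage.
        if block != zero[:len(block)]:
            store[f"{snap_id}/{offset // BLOCK_SIZE:08d}"] = block


def restore_snapshot(store: dict, snap_id: str) -> bytearray:
    """Assign new zero-filled storage and stream the blocks back into it."""
    manifest_key = f"{snap_id}/manifest"
    volume = bytearray(int(store[manifest_key]))
    prefix = f"{snap_id}/"
    for key, block in store.items():
        if key.startswith(prefix) and key != manifest_key:
            start = int(key[len(prefix):]) * BLOCK_SIZE
            volume[start:start + len(block)] = block
    return volume
```

The restored `bytearray` is what you would then attach to an instance as a new volume.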
You’ll notice some nice interplays between object and volume storage to keep things as efficient as possible. It’s one of the cool things about cloud computing.
Hopefully this gives you a better idea of how the back end works. In a future post I will talk about volume encryption and the relationship between volume and object storage.
4 Replies to “IaaS Storage 101”
Adrian-
Volume/block doesn’t have a file system either. If we use that as the definition, how is it different from raw? It’s the mapping to the raw physical infrastructure that defines it as raw.
Rich – Your definition sounds like raw==physical, and I don’t agree with that. It can be raw (no file system) and still be virtual. VMDKs do indeed _map_ to (one or more) physical platters, but that mapping layer is hidden from the consumer of the storage. I think they’re raw because they can hold any file system type and be used like a disk or like a file share. I concede they could be considered block storage because they act like mountable volumes.
I am hoping for more discussion on exactly this point.
-Adrian
Can I disagree?
VMDK files aren’t raw; they are definitely block, since they don’t map to raw, physical platters.
Rich – We should discuss where _Cloud NAS_ falls in this picture and why. Firms like Permabit and Nasuni offer cloud NAS storage that looks like object storage in IaaS but behaves like PaaS (as they wrap and resell Amazon, Rackspace, and Azure).
Also, the examples I cited for ‘raw’ were VMware VMDK files and Microsoft VHD files. I am interested if anyone disagrees with that characterization.
-Adrian