Every now and again I run into an article that totally baffles me. It’s as if the author had a bunch of somewhat related quotes sitting around, and then stitched a Frankenstein article together. In this case the article was in the October 5th edition of eWeek, and the topic was “Databases: The next big virtualization thing”. The intention seems to be sketching out some hazy future projections about virtualized databases, and what wonderful things virtualization can do for you. But if you closely examine the assertions, not only are they are based on bad assumptions, they are flat-out misleading. I am not sure there is a single point in the article I wholly agree with. Rather than wallow in this mess, I will offer you what I consider to be 7 myths surrounding databases in virtual environments:
Myth #1 – Virtualization makes database administration easier. No. Any time you place a database into an environment, virtual or not, the database needs to be tuned to operate efficiently within that environment. Virtualization abstracts the resources underneath the database; it does not relieve you from the administrative tasks of tuning and provisioning. While it is theoretically possible to reduce administrative tasks by standardizing an environment, history has shown we need to optimize database configuration to accommodate organic changes that occur over time.
Myth #2 – Virtualization improves database performance. Possibly, but not always. Improvements to database performance are more likely to result from tuning SQL and database structures. Generally speaking, improvements in database logic offer an order of magnitude greater improvement than any ‘external’ changes. Virtualization does provide an easier way to allocate more resources to a database, and is highly beneficial when a database is memory or CPU constrained. I/O constrained databases are as likely to suffer from distributed storage latency as realize gains in performance, and more likely require some redesign to take advantage of virtual resources. Sure, you can throw twice as many resources at a database, but that does not mean it will automatically perform better!
Myth #3 – Virtualization lets you consolidate databases. Not really. Virtualization offers the ability to use a single central database installation, but you still normally use multiple database instances to support multiple applications. Effective consolidation of databases to take advantage of virtual environments requires some database re-engineering and does not magically (automatically) occur in a virtual environment.
Myth #4 – Virtualization will reduce your database licensing costs. This is not typically the case. Check with your vendor on this, because adding a virtual CPU is likely to cost you additional fees just as if you added a real CPU. Per database pricing may mean higher licensing costs, not lower. It will depend upon your vendor’s pricing model, so do not take it for granted.
Myth #5 – Virtualization provides better database security. I have never understood this claim. How exactly could virtualization make a database more secure? Through obscurity? Some giant VMotion shell game that hides the location of the data? The access to your data is still gated by access controls and governed by permissions. Security is largely dependent upon solid configuration of the database and current patches being applied, which may nor may not be easier depending upon how you have your virtual environment set up. Virtualization provides no inherent advantage to security, and opens up additional vulnerabilities. I have never been a big fan of the concept of ‘threat surface’, but if data gets copied to multiple locations there are simply more chances to gain access to the raw data files, which is why we recommend transparent database encryption for databases in virtual environments.
Myth #6 – Virtualization enables all clustered databases to be active simultaneously. Nonsense! This is possible today without virtualization. SQL Server is a good example. It offers two basic models for database clustering: an active-passive setup designed for failover, and an active-active mode for distributed processing. Both require the data sets to be synchronized, often via shared disks. The former requires no special database design work – only the appropriate configuration. In the later case you really need a data allocation strategy to minimize performance and data contention issues. Virtualization does provide the means to make physically separate disks appear as one, but it does not make synchronization issues go away.
Myth #7 – Virtualization helps abstract the database from applications: No, it doesn’t. Abstraction technologies like Hibernate can mask the underlying database usage from an application. Generalization of data types stored within a database or even use of XML allow data to be moved between heterogenous databases and applications. There is nothing inherent to virtualization technology that abstracts database usage. The benefit virtualization provides, in cases of disaster recovery, is being able to easily spawn a new copy of the database should the existing copy no longer be available.