I have always felt that database technology evolves through a very slow punctuated equilibrium, with long periods between the rise of simple relational ‘desktop’ databases (Access, Paradox, dBASE III+, etc.) and ‘enterprise’ platforms (DB2, Oracle, SQL Server, etc.). But for the first time in my career, I am beginning to believe we are seeing a genuine movement away from relational database technology altogether. I don’t follow relational database platform trends as closely as I did a decade or so ago, so perhaps I have missed some of the progression, but I am somewhat surprised by how quickly programmers and product developers are abandoning relational DB platforms for simple indexed flat files. Application developers need data storage and persistence as much as ever, but it seems simpler is better. Yes, they still use tables, and they may use indices, but complex relational schemata, foreign keys, stored procedures, normalization, and triggers seem to be unwanted and irrelevant.
Advanced relational technologies are being ignored, especially by web application developers, both because they want to manage these functions within the application they know (as opposed to the database they don’t) and because it makes for a cleaner design and implementation of the application. What has surprised me is the adoption of indexed flat files for data storage in lieu of any relational engine at all. Flat files offer a lot of flexibility: they can handle bulk data insertions very quickly and, depending on how they are implemented, may offer extraordinary query response. Not that ISAM and its variants ever went away; they remain popular in everything from mainframes to control systems. We moved from basic flat files to relational platforms because they offered more efficient storage, but that requirement is long dead. We have stuck with relational platforms because they offered the data integrity and transactional consistency lacking in simple data storage platforms, excellent lookup speed on reasonably static data sets, and the big advantage of pre-compiled, execution-ready stored procedure code. However, when the primary requirement is quick collection and scanning of bulk data, you don’t really care about those features so much. This is one reason many security product vendors moved to indexed flat files for data storage: they offer faster uploads, dynamic structure, and correlation capabilities. But that is a discussion for another post.
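To make "indexed flat file" concrete, here is a minimal sketch of the idea, assuming one JSON record per line and a hypothetical `key` field on every record. It is an illustration of the general technique (append-only data file plus an in-memory key-to-byte-offset index), not how any particular ISAM engine or security product implements it; real engines persist their indices and handle concurrency, deletes, and crash recovery, all of which this omits.

```python
import json
import os
import tempfile


class IndexedFlatFile:
    """Append-only flat file of JSON lines plus an in-memory key -> byte offset index.

    Hypothetical sketch: every record is assumed to carry a "key" field.
    No schema is enforced, so records may have any other fields.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the most recent record with that key
        # Rebuild the index with one sequential pass if the file already exists.
        if os.path.exists(path):
            with open(path, "rb") as f:
                offset = 0
                for line in f:
                    self.index[json.loads(line)["key"]] = offset
                    offset += len(line)

    def bulk_insert(self, records):
        """Append many records in one sequential write; later keys shadow earlier ones."""
        offset = os.path.getsize(self.path) if os.path.exists(self.path) else 0
        with open(self.path, "ab") as f:
            for record in records:
                data = (json.dumps(record) + "\n").encode("utf-8")
                f.write(data)
                self.index[record["key"]] = offset
                offset += len(data)

    def get(self, key):
        """Look up a record by key with a single seek instead of a full scan."""
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())

    def scan(self, predicate):
        """Sequentially scan every record in file order, e.g. for bulk correlation."""
        with open(self.path, "rb") as f:
            for line in f:
                record = json.loads(line)
                if predicate(record):
                    yield record


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    store = IndexedFlatFile(path)
    store.bulk_insert([
        {"key": "evt-1", "src": "10.0.0.5", "action": "login"},
        {"key": "evt-2", "src": "10.0.0.9", "action": "deny", "port": 22},
    ])
    print(store.get("evt-2")["port"])  # 22
    print([r["key"] for r in store.scan(lambda r: r["src"] == "10.0.0.5")])  # ['evt-1']
```

Inserts are just sequential appends, which is why bulk loading is fast, while the index turns point lookups into one seek; anything the index does not cover falls back to a full scan.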
I have been doing some research into ‘cloud’ service and security technologies of late, and a few months ago I was reminded of Amazon Web Services’ offering, Amazon SimpleDB. It’s a database, but in the classic sense: what databases were like before the relational model we have been using for the last 25 years. Basically it is a flat file in which each entry has attached name/value attribute pairs. It sounds simple because it is. It’s a bucket to dump data in, and you have the flexibility to introduce as much or as little virtual structure as you care to. It has a query interface that supports many of the constructs most SQL dialects offer. It appears to have been quietly launched in 2007, and I am guessing Amazon built it to solve their own internal data storage needs. In May of this year they augmented the query engine to support comparison operators such as ‘contains’, along with several features for managing result sets. At this point the product seems to have reached a state where it offers enough functionality to support most web application developers. You will be giving up a lot of (undesired?) functionality, but if you just want a simple bucket to dump data into with complete flexibility, this is a logical option.
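The "bucket of items with name/value attribute pairs" model can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not the Amazon SimpleDB API: the `Domain` class, its `put` and `select` methods, and the operator names are all hypothetical. It does show the two properties described above: items need not share any structure, and an attribute can hold several values. Values are kept as strings here, so comparisons are lexicographic.

```python
from collections import defaultdict


class Domain:
    """Toy in-memory 'bucket' of items, each holding name/value attribute pairs.

    Hypothetical sketch only; NOT the Amazon SimpleDB API. Every value is
    stored as a string, and an attribute may carry several values.
    """

    def __init__(self):
        # item name -> attribute name -> set of string values
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, **attributes):
        """Add attribute values to an item; no schema, so any attribute is allowed."""
        for name, value in attributes.items():
            values = value if isinstance(value, (list, tuple, set)) else [value]
            self.items[item_name][name].update(str(v) for v in values)

    def select(self, attribute, op, operand):
        """Return sorted names of items where any value of `attribute` satisfies `op`."""
        tests = {
            "=": lambda v: v == operand,
            ">": lambda v: v > operand,  # lexicographic string comparison
            "<": lambda v: v < operand,
            "contains": lambda v: operand in v,  # substring match
        }
        test = tests[op]
        return sorted(
            name
            for name, attrs in self.items.items()
            if any(test(v) for v in attrs.get(attribute, ()))
        )


if __name__ == "__main__":
    books = Domain()
    books.put("item1", title="The Stand", author="Stephen King", year="1978")
    books.put("item2", title="Cujo", author="Stephen King", year="1981",
              format=["hardcover", "paperback"])
    books.put("item3", title="Dune", author="Frank Herbert")  # no year: structure is optional
    print(books.select("year", ">", "1980"))  # ['item2']
    print(books.select("author", "contains", "King"))  # ['item1', 'item2']
    print(books.select("format", "=", "paperback"))  # ['item2']
```

Because everything is a string, numeric range queries only behave when numbers share a fixed width (for example, zero-padded), which is the kind of trade-off you accept in exchange for a schema-free bucket.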
I am a believer that ‘cheaper, faster, easier’ always wins. Amazon’s SimpleDB fits that model. It’s feasible that this technology could snatch away the low end of the database market that is not interested in relational functions.
2 Replies to “Amazon’s SimpleDB”
The key turning point in data management came at least ten years ago, when it became clear that data organization optimized for transaction processing (e.g., third normal form) was completely wrong for decision support and analytics. Thus companies began maintaining two databases for each application system: one for Online Transaction Processing (OLTP) and one for Online Analytical Processing (OLAP).
Traditional relational databases remain king for OLTP. Let’s not forget the math behind the relational database model. However, there has been much experimentation with ways to store data for OLAP. In fact, log management is just an example of an OLAP application.
Finally, there does seem to be value in the SQL language (or a subset of it) as a standard way to access data regardless of how it’s stored, because there are so many third-party business intelligence tools out there. In that sense, SQL is a kind of open, industry-standard API to data.
You could make the case that this is just the further decomposition of application architecture. The RDBMS was really a compute layer, with stored procedures and the like, because fat clients were inefficient at this in client/server architecture. Now, with app servers running wild, SOA, and more flexible compute architectures, this is a logical move.
But it leaves a number of DBA types with skills potentially as leverageable as COBOL and CICS.
Mike.
http://blog.securityincite.com
http://blog.eiqnetworks.com