NoSQL Security 2.0 [New Series] *updated*
By Adrian Lane
NoSQL, both the technology and the industry, has taken off. We are past the point where we can call big data a fad, and we recognize that we are staring straight into the face of the next generation of data storage platforms. About two years ago we started the first Securosis research project on big data security, and a lot has changed since then. At that point many people had heard of Hadoop, but could not describe what made big data different from relational databases, other than storing a lot of data. Now there is no question that NoSQL, as a data management platform, is here to stay; enterprises have jumped into large-scale analysis projects with both feet, and people understand the advantages of leveraging analytics for business, operations, and security use cases. But as with all types of databases (and make no mistake, big data systems are databases), high-quality data produces better analysis results, which is why, in the majority of cases we have witnessed, a key ingredient is sensitive data. It may be customer data, transactional data, intellectual property, or financial information, but it is a critical ingredient. The question is not really whether sensitive data is stored within the cluster, but which sensitive data it contains. Given broad adoption, rapidly advancing platforms, and sensitive data, it is time to re-examine how to secure these systems and the data they store.
This paper will be different from the last one. We will offer much more on big data security strategies in addition to tools and technologies. We will spend less time defining big data and more looking at trends. We will offer more explanation of security building blocks including data encryption, logging, network encryption, and access controls/identity management in big data ecosystems. We will discuss the types of threats to big data and look at some of the use cases driving security discussions. And just like last time, we will offer a frank discussion of limitations in platforms and vendor offerings, which leave holes in security or fail to mesh with the inherent performance and scalability of big data.
One request keeps coming up from enterprise customers and security vendors: a short discussion of data-centric security, so this paper provides one. Over the last year I have gotten far fewer questions on how to protect a NoSQL cluster, and far more on how to protect data before it is stored in the cluster. This was a surprise, and it is not clear from my conversations what is driving it. Users may simply not trust the big data technology, they may worry about data propagation, they may feel they cannot meet compliance obligations, or they may be worried about the double whammy of big data atop cloud services. All these explanations are plausible, and they have all come up. But regardless of driver, companies are looking for advice around encryption and wondering if tokenization and masking are viable alternatives for their use cases. The nature of the questions tells me that is where the market is looking for guidance, so I will cover both cluster security and data-centric security approaches.
Here is our current outline:
- Big Data Overview and Trends: This post will provide a refresher on what big data is, how it differs from relational databases, and how companies are leveraging its intrinsic advantages. We will also provide references on how the market has changed and matured over the last 24 months, as this bears on how to approach security.
- Big Data Security Challenges: We will discuss why big data security is different architecturally and operationally, and how the platform bundles and approaches differ from traditional relational databases. We will discuss which traditional tools, technologies, and security controls are present, and how usage of these tools differs in big data environments.
- Big Data Security Approaches: We will outline the approaches companies take when implementing big data security programs, as reference architectures. We will outline walled-garden models, cluster security approaches, data-centric security, and cloud strategies.
- Cluster Security: An examination of how to secure a big data cluster. This will be a threat-centric examination of how to secure a cluster from attackers, rogue admins, and application programmers.
- Data (Centric) Security: We will look at tools and technologies that protect data regardless of where it is stored or moved, for use when you don’t trust the database or its repository.
- Application Security: An executive summary of application security controls and approaches.
- Big Data in Cloud Environments: Several cloud providers offer big data as part of Platform or Infrastructure as a Service offerings. Intrinsic to these environments are security controls provided by the cloud vendor, which give you optional approaches to securing the cluster and meeting compliance requirements.
- Operational Considerations: Day-to-day management of the cluster is different from management of relational databases, so the focus of security efforts changes too. This post will examine how daily security tasks change and how to adjust operational controls and processes to compensate. We will also offer advice on integration with existing security systems such as SIEM and IAM.
As with all our papers, you have a voice in what we cover. So I would like feedback from readers, particularly on whether you want a short section on application-layer security as well. It is (tentatively) included in the current outline. Obviously this would be a brief overview, since application security itself is a very large topic. That said, I would like input on that and any other areas you feel need addressing.
Ajit - in three of the security architecture posts to come. IAM and Kerberos are essential to a couple of the models.
I am not seeing SIEM right now - I am seeing devs leverage the cluster to capture logs: Logstash, Log4J, logback, or whatever built-in feature the cluster has. Mining and analytics are very basic at this point. And monitoring usage is unusual. At least it is right now.
By Adrian Lane
Adrian, where would you cover the IAM sections (under AppSec?) These are some of the most critical pieces around security, e.g. user entitlements, authentication (leveraging Kerberos, cert-based, using keytab files, etc.), and authorization (different models: RBAC, HBase, etc.).
Under operational considerations, it would be interesting to see some thoughts around data metering (e.g. using SIEM/analytics to detect anomalous usage).
Yes. You’re right. I wholly agree with your first sentence: Let’s stop calling it big data. I’ll remove it from this series. In that spirit I’ve renamed the series NoSQL Security. Is NoSQL a great choice? No. It’s a badge of rebellion more than a true name for an emerging industry, especially in light of the recent trend of people actually bolting a SQL query language parser onto Hadoop. But the name is certainly better than big data. If the industry ever finally decides it’s time to call these platforms modular databases (https://securosis.com/blog/random-thought-meet-your-new-database) or analytics databases or just databases, I’ll change the title.
I do not believe I fully understand your second and third sentences, but it sounds like you do _not_ want me to cover application security within the context of this series. The database is somewhat of an application stack in and of itself, but application security does seem like a separate domain of skills and effort. I am on the fence about this but I’ll seriously weigh your comment.
By Adrian Lane
Please just stop using the phrase big data. Use data product, data lake, or anything else.
Data productization, for security, merely adds/aggregates activity-based intelligence.
Appsec is a piece to a security architecture puzzle—a part of the landscape. Data products are not landscape, they are theory (the analytic model or models they are based on). Don’t cross the streams, buddy.
By Andre Gironda