February 10, 2016 - Securosis

Securing Hadoop: Operational Security Issues

Adrian Lane February 10, 2016 No Comments

Beyond the architectural security issues endemic to Hadoop and NoSQL platforms discussed in the last post, IT teams expect some common security processes and supporting tools familiar from other data management platforms. That includes “turning the dials” on configuration management, vulnerability assessment, and maintaining patch levels across a complex assembly of supporting modules. The day-to-day processes IT managers follow to ensure typical application platforms are properly configured have evolved over years – core platform capabilities, community contributions, and commercial third-party support to fill in gaps. Best practices, checklists, and validation tools to verify things like admin rights are sufficiently tight, and that nodes are patched against known and perhaps even unknown vulnerabilities. Hadoop security has come a long way in just a few years, but it still lacks the maturity in day to day operational security offerings, and it is here that we find most firms continue to struggle. The following is an overview of the most common threats to data management systems, where operational controls offer preventative security measures to close off most common attacks. Again we will discuss the challenges, then map them to mitigation options. Authentication and authorization: Identity and authentication are central to any security effort – without them we cannot determine who should get access to data. Fortunately the greatest gains in NoSQL security have been in identity and access management. This is largely thanks to providers of enterprise Hadoop distributions, who have performed much of the integration and setup work. We have evolved from simple in-database authentication and crude platform identity management to much better integrated LDAP, Active Directory, Kerberos, and X.509 based authentication options. Leveraging those capabilities we can use established roles for authorization mapping, and sometimes extend to fine-grained authorization services with Apache Sentry, or custom authorization mapping controlled from within the calling application the database. Administrative data access: Most organizations have platform administrators and NoSQL database administrators, both with access to the cluster’s files. To provide separation of duties – to ensure administrators cannot view content – a facility is needed to segregate administrative roles and keep unwanted access to a minimum. Direct access to files or data is commonly addressed through a combination of role based-authorization, access control lists, file permissions, and segregation of administrative roles – such as with separate administrative accounts, bearing different roles and credentials. This provides basic protection, but cannot protect archived or snapshotted content. Stronger security requires a combination of data encryption and key management services, with unique keys for each application or cluster. This prevents different tenants (applications) in a shared cluster from viewing each other’s data. Configuration and Patch Management: With a cluster of servers, which may have hundreds of nodes, it is common to run different configurations and patch levels at one time. As nodes are added we see configuration skew. Keeping track of revisions is difficult. Existing configuration management tools can cover the underlying platforms, and HDFS Federation will help with cluster management, but they both leave a lot to be desired – including issuing encryption keys, avoiding ad hoc configuration changes, ensuring file permissions are set correctly, and ensuring TLS is correctly configured. NoSQL systems do not yet have counterparts for the configuration management tools available for relational platforms, and even commercial Hadoop distributions offer scant advice on recommended configurations and pre-deployment checklists. But administrators still need to ensure configuration scripts, patches, and open source code revisions are consistent. So we see NoSQL databases deployed on virtual servers and cloud instances, with home-grown pre-deployment scripts. Alternatively a “golden master” node may embody extensive configuration and validation, propagated automatically to new nodes before they can be added into the cluster. Software Bundles: The application and Hadoop stacks are assembled from many different components. Underlying platforms and file systems also vary – with their own configuration settings, ownership rights, and patch levels. We see organizations increasingly using source code control systems to handle open source version management and application stack management. Container technologies also help developers bundle up consistent application deployments. Authentication of applications and nodes: If an attacker can add a new node they control to the cluster, they can exfiltrate data from the cluster. To authenticate nodes (rather than users) before they can join a cluster, most firms we spoke with either employ X.509 certificates or Kerberos. Both can authenticate users as well, but we draw this distinction to underscore the threat of rogue applications or nodes being added to the cluster. Deployment of these services brings risks as well. For example if a Kerberos keytab file can be accessed or duplicated – perhaps using credentials extracted from virtual image files or snapshots – a node’s identity can be forged. Certificate-based identity options implicitly complicate setup and deployment, but properly deployed they can provide strong authentication and stronger security. Audit and Logging: If you suspect someone has breached your cluster, can you detect it, or trace back to the root cause? You need an activity record, which is usually provided by event logging. A variety of add-on logging capabilities are available, both open source and commercial. Scribe and LogStash are open source tools which integrate into most big data environments, as do a number of commercial products. You can leverage the existing cluster to store logs, build an independent cluster, or even leverage other dedicated platforms like a SIEM or Splunk. That said, some logging options do not provide an auditor sufficient information to determine exactly what actions occurred. You will need to verify that your logs are capturing both the correct event types and user actions. A user ID and IP address are insufficient – you also need to know what queries were issued. Monitoring, filtering, and blocking: There are no built-in monitoring tools to detect misuse or block malicious queries. There isn’t even yet a consensus on what a malicious big data query looks like – aside from crappy MapReduce scripts written by bad programmers. We are just seeing the first viable releases of Hadoop activity monitoring tools. No longer the “after-market speed regulators” they once were, current tools typically embedded into a

Read Post

Research

Securing Hadoop: Operational Security Issues

Research

Firestarter: Multicloud Deployment Structures and Blast Radius

Firestarter: So you want to multicloud?

Firestarter: 2019: Insert Winter is Coming Meme Here

Firestarter: re:Invent Security Review

Firestarter: Hardware Hacks and Lift and Pray

Sign Up for Our Newsletter

Contact

About

Quick Links