Security Analytics with Big Data [New Series]

By Adrian Lane

Big Data is being touted as a ‘transformative’ technology for security event analysis – promised to detect threats in the ever-increasing volume of event data generated from in-house, mobile, and cloud-based services. But a combination of PR hype, vendor positioning, and customer questions has pushed it to the top of my research agenda. Many customers are asking “Wait, don’t I already have SIEM for event analysis?” Yes, you do. And SIEM was designed and built to solve the same problems – but 7-8 years ago – and it is failing to keep up with current problems. It’s not just that we’re trying to scale up to a much larger set of data; we also need to react to events an order of magnitude faster than before. Still more troubling is that we are collecting multiple types of data, each requiring new and different analysis techniques to detect advanced attacks. Oh, and while all that slows down SIEM and log management systems, you are under the gun to identify attacks faster than before.

This trifecta of issues limits the usefulness of SIEM and Log Management – and makes customers cranky. Many SIEM platforms can’t scale to the quantity of data they need to manage. Some are incapable of even storing basic data as fast as it comes in – forget about storing and analyzing non-standard data types. ‘Real-time’ analysis is commonly cited as a SIEM feature, but after collection, storage, normalization, correlation, and enrichment, you are lucky to access new events within an hour – much less within a minute. The good news is that big data, correctly deployed, can solve these issues. In this paper we will examine how big data addresses scalability and performance, improves analysis, accommodates multiple data types, and can be leveraged with existing environments. Our goal is to help users differentiate reality from wishful thinking, and to provide enough information to make informed purchasing decisions.

To do this we need to demystify big data and explain how it differs from traditional data management systems. We will offer a clear and unique definition of big data and explain how it helps overcome current technical limitations. We will offer a pragmatic way for customers to leverage big data, enabling them to select a solution strategically. We will highlight the limitations of SIEM and Log Management, key areas of customer dissatisfaction, and areas where big data excels in comparison. We will also discuss some changes required for big data analysis and data management, as well as the change in mindset necessary to take full advantage.

This is not all theory and speculation – big data is currently being employed to detect security threats, address new requirements for IT security, and even help gauge the effectiveness of other security investments. Big data natively addresses ever-increasing event volume and the rate at which we need to examine new events. There is no question that it holds promise for security intelligence, both in the numerous ways it can parse information and through its native capabilities to sift proverbial needles from monstrous haystacks. Cloud and mobile architectures force us to reexamine how we manage security data, and to scale across broader sets of systems and events – neither of which meshes with the structured data repositories on which most organizations rely. But most IT and security practitioners do not yet fully understand big data or how to employ it, so they are unable to weed through all the hype, FUD, and hyperbole. Taking full advantage requires both a deeper understanding of the technology and a subtle shift in mindset, to enable informed decisions on how to incorporate big data into existing IT systems, perhaps by shifting to newer big data platforms.

This research paper will highlight several areas:

  • Use Cases: We will discuss issues customers cite with performance and scalability, particularly for security event analysis. We will discuss in detail how SIEM, Log Management, and event-centric systems struggle under new requirements for data velocity and data management, and why existing technologies aren’t cutting it. We will also discuss the inflexibility of pre-big-data analysis, alerting, and reporting – and how they demand a new approach to security and forensics, as we struggle to keep pace with the evolution of IT.
  • New Events and Approaches: This post will explain why we need to consider additional data types that go beyond events. Existing technologies struggle to meet emerging needs because threat data does not conform to traditional syslog and netflow event types. There is a clear trend toward broader data analysis to detect advanced attacks and better understand risks.
  • What is Big Data and how does it work? This post will offer a basic definition of big data, along with a discussion of the native capabilities that make big data different from traditional analysis tools. We will discuss how features like HDFS, MapReduce, Hive, and Pig work together to address issues of scale, velocity, performance, and multiple data types.
  • The promise of big data: We will explain why big data is viewed as a disruptive technology for security analytics. We will show how big data solutions mitigate problems and change security and event analysis. We will discuss how big data platforms handle collecting and parsing event data, and cover different queries and reports that support new threat analyses.
  • How big data changes security platforms: This post will discuss how to supplement existing systems – through standalone instances, partial integration of big data with existing systems, systems that natively leverage big data infrastructure, or fully integrated systems that run atop NoSQL structures. We will also discuss operational changes to SIEM usage, including the growing importance of data scientists to security.
  • Integration roadmap and planning: In this section we will address the common concerns, limitations, and realities of merging big data into your IT systems. Specifically, we will discuss:
    • Integration and deployment issues
    • Platform selection (diversity of platforms and data)
    • Policy and report development
    • Data privacy and sharing
    • Big data platform security basics

Our next post will cover use cases, the key areas where SIEM needs to improve, and some key areas of customer dissatisfaction.


>many organizations aren’t even capable of performing
>competent Big Data security analytics. 

MANY???  Can we agree on 99.99%? :-)

By Anton Chuvakin

While it’s true that SIEM information fed into Big Data can provide a swift analytical response, along with Data Usage Control, to provide a faster, more complete view of security information and events, the “three V’s” set forth as SIEM issues aren’t necessarily the biggest issue today.

With the shortage of professionals with the ability to utilize Big Data’s deep analytical functions as well as a general shortage of data security personnel, many organizations aren’t even capable of performing competent Big Data security analytics.  Add in the fact that security analytics and monitoring only show where the problems are, and don’t do anything to address them, and the urgency shrinks a bit more.

An organization could spend countless hours and dollars just trying to configure their Big Data systems and security reporting, rather than taking concrete steps to protect their data.  Not to mention that they could potentially leave all of their security information out in the open if the Big Data environment is not properly protected.  There are still too few people with thorough knowledge about Big Data environments and security, and many organizations probably don’t even know they’re doing anything wrong from a security perspective.

The ideal solution would begin with strong, granular protection of data through Vaultless Tokenization or masking, along with SIEM systems.  Then later they can use existing SIEM functions to feed Big Data security analytics, which are also protected.  Over time, these analytics can inform a Data Usage Control to identify abnormalities in usage or access attempts.  In this way, organizations can build a cycle of protection, detection, and remediation for continuous, intelligent protection of their systems.

In the meantime, while more professionals are being trained and the deficit in analytical and security skills is hopefully being reduced, the powerful security provided by a field-level Vaultless Tokenization or masking can provide a “safety net” in case of breach, both in the traditional enterprise systems, as well as inside the Big Data environment itself.  Once the capabilities are there to fully utilize Big Data security analytics, it will only add to the ability to respond faster and to a higher percentage of security events.

By Ulf Mattsson

Thanks for the response. Given the incredible “hype vs reality” balance in this particular niche of the security market, maybe even the vendors who have “no reality” here will support the research…

By Anton Chuvakin


Yes, of course, I will disclose vendors who wish to license the content when commitments are firm. Several have expressed interest so I will post soon.


By Adrian Lane

I am *really* curious which vendors will choose to sponsor it as it might present a worldview that few (any?) of them share…..

By Anton Chuvakin
