If you follow the security press, you know many predict that big data will transform information security. RSA recently released a security brief on security analytics with big data that mirrors the press. Depending on your perspective, security analytics with big data may be the concept that we’ll leverage big data clusters for actionable intel in coming years. Or if you talk to SIEM vendors who run on top of NoSQL repositories, the future has been here for 5 years. You may go with “none of the above”. To me it is simply a good idea that has yet to be fully implemented, which is currently just something we talk about in the security echo chamber.

But that did not stop me from enjoying the paper, and I don’t say that about most vendor-led research. Most of it makes me angry, to the point where I avoid writing about it so I don’t say really nasty things in public that should not be printed. But I want to make a couple of comments on the assumptions here – specifically, “Big data’s new role in security comes at a time when organizations confront unprecedented risk arising from two conditions:”, which implies a connection between both of those security concerns and the need for big data analytics. I think that link is tenuous, and serves their premise poorly.

The dissolving perimeter has little or nothing to do with security analytics with big data. The “dissolving perimeter” became a topic for discussion because third-party cloud services, combined with mobile devices, have destroyed the security value of the corporate IT ‘perimeter’. The ‘edge’ of the network now has so many holes that it no longer forms a discernible boundary between inside and outside. We do, however, get a wealth of constantly generated information, given the number of servers, services, and mobile computing platforms, all programmed to deliver event data. Cheap computing resources, coupled with nearly free analytics tools, make storage and processing of this data newly feasible.

And do you think we have more sophisticated adversaries? APT is one argument for this idea, but I tend to think we have more determined adversaries. Given the increasing complexity of IT systems, there seems to be plentiful “low-hanging fruit” – accessible security vulnerabilities for attackers to take advantage of. We have evidence that some security measures are really working – Jeremiah Grossman discussed how this is shifting attacker tactics. Many attacks are not so sophisticated, but still hard to detect. I think the link between big data and attackers appears when you couple the complexity of IT environments with the staggering volume of data: it becomes very difficult to find the proverbial needle in the haystack. The good news is that this is exactly the type of outlier big data analytics can detect – provided it’s programmed to do so.

But ultimately I agree with their assertions, albeit for slightly different reasons. I have every confidence that big data holds promise for security intelligence, both because I have witnessed attacker behavior captured in event data just waiting to be pulled out, and because I have also seen miraculous ideas sprout from people just playing around with database queries. In the same way hackers stumble on vulnerabilities while playing with protocols, engineers stumble on interesting data just by asking the same question (query) different ways. The data holds promise. But the mining of most of that data, and all of the work required to write M-R scripts that locate actionable intelligence, has not happened yet. It will take years of dedicated work – and it will take script development on different data types for different NoSQL varieties.
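To make the M-R idea a little more concrete, here is a minimal sketch of the shape such a script might take, written in plain Python rather than for any particular NoSQL platform. Everything in it is an assumption for illustration: the event fields (`src`, `action`), the sample records, and the "more than five times the median" threshold are hypothetical, not anything taken from the RSA brief.

```python
from collections import defaultdict
from statistics import median

# Hypothetical event records; field names and values are illustrative only.
EVENTS = (
    [{"src": "10.0.0.5", "action": "login_failed"},
     {"src": "10.0.0.6", "action": "login_failed"},
     {"src": "10.0.0.7", "action": "login_failed"},
     {"src": "10.0.0.7", "action": "login_ok"}]
    + [{"src": "10.0.0.99", "action": "login_failed"}] * 20
)


def map_event(event):
    """Map step: emit (source, 1) for every failed login event."""
    if event.get("action") == "login_failed":
        yield event["src"], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def flag_outliers(totals, factor=5):
    """Return keys whose count exceeds `factor` times the median count.

    The threshold is an arbitrary assumption; a real script would need
    tuning against real data.
    """
    if not totals:
        return []
    baseline = median(totals.values())
    return sorted(key for key, count in totals.items() if count > factor * baseline)


def find_suspects(events):
    """Run map, then reduce, then the outlier check over raw events."""
    pairs = (pair for event in events for pair in map_event(event))
    return flag_outliers(reduce_counts(pairs))


if __name__ == "__main__":
    print(find_suspects(EVENTS))
```

Swapping the body of `map_event` is the script-level version of asking the same question a different way: the reduce and outlier steps stay put while the key or the filter changes.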

Finally, I like the helpful graphic differentiating passive vs. active inputs. I also really like Amit Yoran’s commentary; he is dead on target. The need to aggregate, normalize, and correlate in advance can go away when you move to big data repositories. It’s ironic, but you can get better intelligence faster when you do not pre-process the data.

It may smell a bit like forecasts and New Year’s predictions, but the paper is worth a read.