Security Analytics with Big Data [New Series]

Big Data is being touted as a ‘transformative’ technology for security event analysis – promised to detect threats in the ever-increasing volume of event data generated from in-house, mobile, and cloud-based services. But a combination of PR hype, vendor positioning, and customer questions has pushed it to the top of my research agenda.

Many customers are asking “Wait, don’t I already have SIEM for event analysis?” Yes, you do. And SIEM was designed and built to solve the same problems – but as they stood 7-8 years ago – and it is failing to keep up with current problems. It’s not just that we’re trying to scale up to a much larger set of data; we also need to react to events an order of magnitude faster than before. Still more troubling is that we are collecting multiple types of data, each requiring new and different analysis techniques to detect advanced attacks. Oh, and while all that slows down SIEM and log management systems, you are under the gun to identify attacks faster than before.

This trifecta of issues limits the usefulness of SIEM and Log Management – and makes customers cranky. Many SIEM platforms can’t scale to the quantity of data they need to manage. Some are incapable of even storing basic data as fast as it comes in – forget about storing and analyzing non-standard data types. ‘Real-time’ analysis is a commonly cited SIEM feature, but after collection, storage, normalization, correlation, and enrichment, you are lucky to access new events within an hour – much less within a minute.

The good news is that big data, correctly deployed, can solve these issues. In this paper we will examine how big data addresses scalability and performance, improves analysis, accommodates multiple data types, and can be leveraged within existing environments. Our goal is to help users differentiate reality from wishful thinking, and to provide enough information to make informed purchasing decisions.

To do this we need to demystify big data and show how it differs from traditional data management systems. We will offer a clear and unique definition of big data and explain how it helps overcome current technical limitations. We will offer a pragmatic way for customers to leverage big data, enabling them to select a solution strategically. We will highlight the limitations of SIEM and Log Management, key areas of customer dissatisfaction, and areas where big data excels in comparison. We will also discuss some changes required for big data analysis and data management, as well as the change in mindset necessary to take full advantage.

This is not all theory and speculation – big data is currently being employed to detect security threats, address new requirements for IT security, and even help gauge the effectiveness of other security investments. Big data natively addresses ever-increasing event volume and the rate at which we need to examine new events. There is no question that it holds promise for security intelligence, both in the numerous ways it can parse information and through its native capabilities to sift proverbial needles from monstrous haystacks. Cloud and mobile architectures force us to reexamine how we manage security data, and to scale across broader sets of systems and events – neither of which meshes with the structured data repositories on which most organizations rely.

But most IT and security practitioners do not yet fully understand big data or how to employ it, so they are unable to weed through all the hype, FUD, and hyperbole. Taking full advantage requires both a deeper understanding of the technology and a subtle shift in mindset, to enable informed decisions on how to incorporate big data into existing IT systems, perhaps by shifting to newer big data platforms.

This research paper will highlight several areas:

  • Use Cases: We will discuss issues customers cite with performance and scalability, particularly for security event analysis. We will discuss in detail how SIEM, Log Management, and event-centric systems struggle under new requirements for data velocity and data management, and why existing technologies aren’t cutting it. We will also discuss the inflexibility of pre-big-data analysis, alerting, and reporting – and how they demand a new approach to security and forensics, as we struggle to keep pace with the evolution of IT.
  • New Events and Approaches: This post will explain why we need to consider additional data types that go beyond events. Existing technologies struggle to meet emerging needs because threat data does not conform to traditional syslog and netflow event types. There is a clear trend toward broader data analysis to detect advanced attacks and better understand risks.
  • What is Big Data and how does it work? This post will offer a basic definition of big data, along with a discussion of the native capabilities that make big data different from traditional analysis tools. We will discuss how features like HDFS, MapReduce, Hive, and Pig work together to address issues of scale, velocity, performance, and multiple data types.
  • The promise of big data: We will explain why big data is viewed as a disruptive technology for security analytics. We will show how big data solutions mitigate problems and change security and event analysis. We will discuss how big data platforms handle collecting and parsing event data, and cover different queries and reports that support new threat analyses.
  • How big data changes security platforms: This post will discuss how to supplement existing systems – through standalone instances, partial integration of big data with existing systems, systems that natively leverage big data infrastructure, or fully integrated systems that run atop NoSQL structures. We will also discuss operational changes to SIEM usage, including the growing importance of data scientists to security.
  • Integration roadmap and planning: In this section we will address the common concerns, limitations, and realities of merging big data into your IT systems. Specifically, we will discuss:
      • Integration and deployment issues
      • Platform selection (diversity of platforms and data)
      • Policy and report development
      • Data privacy and sharing
      • Big data platform security basics

Our next post will cover use cases, the key areas where SIEM needs to improve,
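
As a rough illustration of the MapReduce model named in the outline above, here is a minimal single-process sketch in Python. It is not Hadoop code and does not touch HDFS, Hive, or Pig; it only mimics the map, shuffle, and reduce phases to count events per source and action. The log format, the sample events, and the function names (`map_event`, `shuffle`, `reduce_counts`, `run_job`) are all assumptions invented for this example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines in a made-up "timestamp source_ip action" format.
EVENTS = [
    "2013-03-01T10:00:01 10.0.0.5 login_failure",
    "2013-03-01T10:00:02 10.0.0.5 login_failure",
    "2013-03-01T10:00:03 10.0.0.9 login_success",
    "2013-03-01T10:00:04 10.0.0.5 login_failure",
]


def map_event(line):
    """Map phase: emit ((source_ip, action), 1) for one raw event line."""
    _timestamp, source_ip, action = line.split()
    yield (source_ip, action), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_event(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for (ip, action), count in sorted(run_job(EVENTS).items()):
        print(ip, action, count)
```

On a real cluster the map and reduce functions would run in parallel across many nodes, with the framework handling the shuffle; the point of the sketch is only that each phase works on independent keys, which is what lets the approach scale out as event volume grows.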

The CISO’s Guide to Advanced Attackers: Mining for Indicators

The key to dealing with advanced attackers is not closing off every window of vulnerability. As we have discussed throughout this series, advanced attackers will figure out a way to gain a foothold in your environment. Actually they will find multiple ways into your environment. So if you hope for any semblance of success, your goal cannot be to stop them – instead you need to work on shortening the window between compromise and detection. We have called that Reacting Faster and Better for years. Five years to be exact, but who’s counting?

The general concept is that you want to monitor your environment, gathering key security information that can either be used to identify typical attack patterns as they are happening (yes, a SIEM-like capability) or, more likely, be searched for indicators identified via intelligence activities.

Collecting All the Security Data

We say “all the security data” a bit tongue-in-cheek, but not too much. We have been saying Monitor Everything almost as long as we have been talking about Reacting Faster, because if you fail to collect data you won’t have an opportunity to get it later. Unfortunately most organizations don’t realize their security data collection leaves huge gaps until the high-priced forensics folks let them know they can’t truly isolate the attack, or the perpetrator, or the malware, or much of anything, because the data just isn’t there. Most folks only need to learn that lesson once.

So the first order of business is to lay down a collection infrastructure to store all your security data. The good news is that you have likely been collecting security data for quite some time, and your existing investment and infrastructure should be directly useful for dealing with advanced attackers. This means your existing log management system may be useful after all. But perhaps not – you might have tools that aren’t at all suited to helping you find advanced attackers in your midst. One step at a time – now let’s delve into the data you need to collect.

  • Network Security Devices: Your firewalls and IPS devices generate huge logs of what’s blocked, what’s not, and which rules are effective. You will receive intelligence that typically involves port/protocol/destination combinations or application identifiers for next-generation firewalls, which can identify potential attack traffic.
  • Configuration Data: One key area to mine for indicators is the configuration data from your devices. It enables you to look for very specific files and/or configurations that have been identified as indicators of compromise.
  • Identity: Similarly, information about logins, authentication failures, and other identity-related data is useful for matching against attack profiles from third-party threat intelligence providers.
  • NetFlow: This is another data type commonly used in SIEM environments; it provides information on protocols, sources, and destinations for network traffic as it traverses devices. NetFlow records are similar to firewall logs but far smaller, making them more useful for high-speed networks. Flows can identify lateral movement by attackers, as well as large exfiltration file transfers.
  • Network Packet Capture: The next frontier for security data collection is actually to capture all network traffic on key segments. Forensics folks have been doing this for years during investigations, but proactive continuous full packet capture – for the inevitable incident responses which haven’t even started yet – is still an early market. For more detail on how full packet capture impacts security operations check out our Network Security Analytics research.
  • Application/Database Logs: Application and database logs are generally less relevant, unless they come from standard applications or components likely to be specifically targeted by attackers. But you might be able to discover unusual application and/or database transactions – which might represent bulk data removal, injection attempts, or efforts to attack your critical data.
  • Vulnerability Scans: This is another information source with limited value, detailing which devices are vulnerable to specific attacks. Scan results help eliminate devices from your search criteria to streamline search activities.

Of course this isn’t an exhaustive list, and you are likely already capturing much of this data. That’s a good thing, but capturing and analyzing data within the context of a compliance audit is fundamentally different from trying to detect advanced attacker activity.

We are sticking to the CISO view for this series so we won’t dig into the technical nuances of the collection infrastructure. But it must be built on a strong analytical foundation which provides a threat-centric view of the world rather than one focused on compliance reporting. More advanced organizations may already have a Security Operations Center (SOC) leveraging a SIEM platform for more security-oriented correlation and forensics to pinpoint and investigate attacks. That’s a start, but you will likely require some kind of Big Data thing, which should be clear after we discuss what we need this detection platform to do.

Attack Patterns FTW

As much as we have talked about the futility of blocking every advanced attack, that doesn’t mean we shouldn’t learn from both the past and the misfortune of others. We spent time early in this process on sizing up the adversary for some insight into what is likely to be attacked, and perhaps even how. That enables you to look for those attack patterns within your security data – the promise of SIEM technology for years. The ultimate disconnect with SIEM was the hard truth that you needed to know what you were looking for. Far too many vendors forgot to mention that little requirement when selling you a bill of goods. Perhaps they expected attackers to post their plans on Facebook or something?

But once you do the work to model the likely attacks on your key information, and then enumerate those attack patterns in your tool, you can get tremendous value. Just don’t expect it to be fully automated.

The best case is that you receive an alert about a very likely attack because it’s something you were looking for. But the quickest way to get killed is to plan for the best case. So we also need to ensure we are ready for the worst case. That means advanced attackers using attacks you haven’t seen before, in ways you don’t expect. That’s when
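
To make the idea of mining collected data for indicators a little more concrete, here is a minimal Python sketch. It checks flow-style records against a set of destination/port/protocol indicators of the kind described in this post, and separately flags unusually large transfers as possible exfiltration. Everything specific here is an assumption for illustration: the `FlowRecord` fields, the `INDICATORS` set (which uses documentation-only IP ranges), the `EXFIL_BYTES_THRESHOLD` value, and the `find_matches` function are hypothetical and do not describe any particular product or feed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """A simplified, hypothetical flow record (not a real NetFlow schema)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    bytes_sent: int


# Hypothetical threat-intelligence indicators: (destination, port, protocol).
INDICATORS = {
    ("203.0.113.7", 443, "tcp"),
    ("198.51.100.23", 53, "udp"),
}

# Assumed cutoff for a "large" transfer; a real value would need tuning
# against what is normal for the environment.
EXFIL_BYTES_THRESHOLD = 500_000_000


def find_matches(flows, indicators=INDICATORS, threshold=EXFIL_BYTES_THRESHOLD):
    """Return (reason, flow) pairs for flows worth an analyst's attention."""
    findings = []
    for flow in flows:
        # Exact match against known-bad destination/port/protocol combos.
        if (flow.dst_ip, flow.dst_port, flow.protocol) in indicators:
            findings.append(("indicator_match", flow))
        # Simple volume check for possible bulk data removal.
        if flow.bytes_sent >= threshold:
            findings.append(("large_transfer", flow))
    return findings


if __name__ == "__main__":
    sample = [
        FlowRecord("10.0.0.5", "203.0.113.7", 443, "tcp", 1200),
        FlowRecord("10.0.0.8", "192.0.2.10", 22, "tcp", 900_000_000),
    ]
    for reason, flow in find_matches(sample):
        print(reason, flow.src_ip, "->", flow.dst_ip)
```

Note that this only finds what it has been told to look for, which is exactly the limitation discussed above: the indicator set and threshold have to come from your own attack modeling or from intelligence sources, and the output is a lead for an analyst rather than an automated verdict.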

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments and input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.