Securosis

Research

Understanding and Selecting SIEM/LM: Data Management

We covered SIEM and Log Management deployment architectures in depth to underscore how different models are used to deal with scalability and data management issues. In some cases these deployment choices are driven by the underlying data handling mechanism within the product. In other words, each platform stores and manages data differently, and these decisions have a significant impact on product scalability, data management, and reporting and forensics capabilities. Here we discuss the different internal data storage models, with the advantages and disadvantages of each.

Relational Database

In the early days of this technology, most SIEM and log management systems were built on relational database engines to store events and log records. In this model the SIEM platform maps data attributes from each data source into database columns, so each event is stored in a single database row. There are numerous advantages to this model, including:

  • Data Validation: As data is inserted into a column, the database verifies data type and range settings. Records that fail integrity checks, which can indicate corrupted files, are omitted from the import, and administrators are notified.
  • Event Consistency: An event from a Cisco router now looks just like an event from a Juniper router, and vice versa, as events are normalized before being stored in the table.
  • Reporting: Reports are easier to generate from validated data columns, and the database can format data when generating the report. Reports also run far faster because column indices filter and order events efficiently.
  • Analytics: An RDBMS facilitates complex queries across all available attributes and inspected content, as well as correlation.

This model for data storage has fallen out of favor due to the overhead of data insertion: as each row is inserted, the database must perform these checks and periodically rebuild indices.
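To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database engine a product actually uses. The table name, column names, constraints, and sample events are all hypothetical and chosen only for illustration. It shows events from two vendors normalized into the same columns, a range check rejecting a bad record at insert time, and an index supporting a report-style query.

```python
import sqlite3

# Hypothetical schema: table, columns, and constraints are illustrative only.
SCHEMA = """
CREATE TABLE events (
    id        INTEGER PRIMARY KEY,
    ts        TEXT    NOT NULL,
    vendor    TEXT    NOT NULL,
    src_ip    TEXT    NOT NULL,
    dst_port  INTEGER NOT NULL CHECK (dst_port BETWEEN 0 AND 65535),
    action    TEXT    NOT NULL CHECK (action IN ('allow', 'deny'))
);
-- The index speeds up reports, but must be maintained on every insert.
CREATE INDEX idx_events_action_ts ON events (action, ts);
"""


def load_events(conn, events):
    """Insert normalized events; return the ones the database rejected."""
    rejected = []
    for ev in events:
        try:
            conn.execute(
                "INSERT INTO events (ts, vendor, src_ip, dst_port, action) "
                "VALUES (:ts, :vendor, :src_ip, :dst_port, :action)",
                ev,
            )
        except sqlite3.IntegrityError:
            # Validation failure: omit the record and flag it for an administrator.
            rejected.append(ev)
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample events, already normalized into the common column set.
events = [
    {"ts": "2010-06-01T10:00:00", "vendor": "cisco", "src_ip": "10.0.0.5",
     "dst_port": 22, "action": "deny"},
    {"ts": "2010-06-01T10:00:02", "vendor": "juniper", "src_ip": "10.0.0.9",
     "dst_port": 443, "action": "allow"},
    {"ts": "2010-06-01T10:00:03", "vendor": "juniper", "src_ip": "10.0.0.7",
     "dst_port": 70000, "action": "deny"},  # port out of range, fails the CHECK
]

rejected = load_events(conn, events)

# A report-style query over validated, indexed columns.
denied = conn.execute(
    "SELECT vendor, src_ip, dst_port FROM events "
    "WHERE action = 'deny' ORDER BY ts"
).fetchall()
```

Every insert pays for the constraint checks and index maintenance, which is the overhead the text describes. Note also that the rejected event is simply absent from the table, so it is unavailable to later queries unless it is kept somewhere else.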
As daily event volumes scaled from millions to hundreds of millions and then billions, this overhead became problematic and resulted in significant scalability issues for SIEM offerings built on RDBMS.

Further, data that does not fit into the tables defined in the relational model is typically left out. Unless there is some other method to maintain the fidelity and integrity of the original event records, this is problematic for forensics. This “selective memory” can also result in data accuracy issues, as truncated records may not correlate properly and can hamper analysis.

As a result, SIEM/LM architectures based on RDBMS are waning, as products in this space re-architect their backend data stores to address these issues. On the other hand, RDBMS storage is not totally dead. Some vendors have instead chosen to streamline data insertion, basically by turning off some RDBMS checks and integrity verification. Others use an RDBMS to supplement a flat file architecture (described below), leveraging the advantages above for reporting and forensics.

Flat File

Flat files, or just ‘files’, are now the most common way to store events for SIEM and Log Management. Files serve as a blank canvas for the vendor, who can introduce any structure they choose to help define, format, and delineate events. Anything that helps with correlation and speeds up future searches is included, and each vendor has its own secret sauce for building files. Each file typically contains a day’s events, possibly from a single source, with each event clearly delineated. The files (or in some cases each event) can be tagged with additional information; this is called “log enrichment”. These tags offer some of the contextual benefits of a relational database, and help to define attributes. Some even include a control structure similar to VSAM files. The events may be stored in their raw form, or be normalized prior to insertion.

Flat files offer several advantages:
  • Performance: Since normalization (to the degree necessary) happens before data insertion, there is very little work to be performed at insertion time compared to a relational database. Data is stored as quickly as the physical media can handle, and is often available immediately for searching and analysis.
  • Flexibility: Stored events are not limited to specific normalized columns as they are in a relational database, but can take any form. Changes to internal file formats are much easier.
  • Search: Searches can be performed without understanding the underlying structures, using simple keyword search. At least one log management vendor provides a Google-style search capability across data files. Alternatively, search can rely upon tags and keywords established by the vendor.

The flat file tradeoffs are twofold. First, any data management capabilities, such as indexing and data integrity, must be built from scratch by the vendor, since no RDBMS capabilities are provided by the underlying platform. This means the SIEM/LM vendor must provide any needed facilities for data integrity, normalization, filtering, and indexing. Second, there is an efficiency tradeoff. Some vendors tag, index, and normalize prior to insertion; others initially record raw events, later re-read the data in order to normalize it, and then rewrite the reformatted data. The latter method offers faster insertion, at the expense of greater total storage and processing requirements.

The good news is that a few years ago most vendors saw the scalability wall of RDBMS approaching, and began investing in their own back-end data management environments. At this point many platforms feature purpose-built high-performance data stores, and we believe this will be the underlying architecture for these products moving forward. Of course, we don’t live in an either/or world, so many of the platforms combine some RDBMS capabilities with flat file aspects. Yes, the answer can be ‘both’.
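As a rough illustration of the flat file approach described above, the sketch below appends raw events to per-source daily files, attaches optional enrichment tags, and supports both keyword and tag searches. It is not any vendor's actual format: the one-JSON-record-per-line layout, the file naming, the function names, and the sample events are all assumptions made for this example.

```python
import json
import os
import tempfile


def append_event(log_dir, day, source, raw, tags=None):
    """Append one raw event plus optional enrichment tags to the day's file."""
    path = os.path.join(log_dir, f"{source}-{day}.log")
    record = {"raw": raw, "tags": tags or {}}
    # Insertion is a plain append: no type checks, no index maintenance.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def _records(log_dir):
    """Yield (line, parsed record) for every event in every file."""
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                yield line, json.loads(line)


def keyword_search(log_dir, keyword):
    """Simple substring search that needs no knowledge of event structure."""
    return [rec for line, rec in _records(log_dir) if keyword in line]


def tag_search(log_dir, key, value):
    """Search on enrichment tags rather than on the raw text."""
    return [rec for _, rec in _records(log_dir) if rec["tags"].get(key) == value]


# Made-up sample events; the third has no tags and fits no predefined schema.
log_dir = tempfile.mkdtemp()
append_event(log_dir, "2010-06-01", "fw1",
             "Jun  1 10:00:00 fw1 denied tcp 10.0.0.5 -> 10.0.0.1:22",
             {"action": "deny"})
append_event(log_dir, "2010-06-01", "web1",
             "Jun  1 10:00:02 web1 sshd: accepted login for alice",
             {"action": "allow"})
append_event(log_dir, "2010-06-01", "web1",
             "unparseable vendor-specific payload 0xDEADBEEF")

ssh_hits = keyword_search(log_dir, "sshd")
deny_hits = tag_search(log_dir, "action", "deny")
```

Both searches scan every file in full, which hints at the tradeoff noted above: indexing, integrity checking, and normalization do not come for free and would have to be built on top of this by the vendor.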


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments and input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.