We’ve already discussed the basic features of a SIEM/Log Management platform, including collection, aggregation and normalization, correlation and alerting, reporting and forensics, and deployment architectures. But those posts cover only the core functions, which every product in the space brings to the table.
As markets evolve and vendors push to further differentiate themselves, more and more capabilities are integrated into the platforms. In the case of SIEM/LM, this means pumping more data into the analysis engine, and making the engine smarter. The idea is to make 1+1 produce 5, as multiple data types provide more insight than a single source – that’s the concept, anyway. To be clear, having more data does not directly make the product any better. The only way to really leverage additional data is to build correlation rules, alerts, and reports that utilize the extra data.
Let’s take a tour through some of the advanced data types you’ll see integrated into SIEM/LM platforms.
Network flow data consists of the connection records that stream out of a router or switch. These small, simple records typically list source, destination, and packet type. Flow data was the first new data type which, when integrated with event and log data, really made the systems smarter. It allowed the system to establish a baseline and scan for anomalous network traffic as the first indication of a problem.
An entire sub-market of network management – network behavioral analysis – revolves around analyzing and visualizing flow data to understand the traffic dynamics of networks, and pinpointing performance and capacity issues before they impact users. Many of the NBA vendors have been unsuccessfully trying to position their products in the security market; but in combination with events and logs, flow data is very useful.
As an example, consider a typical attack where a web server is compromised and then used as a pivot to further compromise an application server and the backend database server. The data needs to be exfiltrated in some way, so the attackers establish a secure pipe to an external zombie device. But the application server doesn’t typically send data to an external device, so flow data would show an anomalous traffic flow. At that point an administrator could analyze the logs, find correlated activity showing a new account created on the database server, and identify the breach.
Could an accurate correlation rule have caught the reconnaissance and subsequent compromise of the servers? Maybe. But the network doesn’t lie, and at some point the attackers need to move the data. These types of strange network flows can be a great indicator of a successful attack, but remember that strange flows appear only after the attack has occurred. So flow data is really for reacting faster to attacks already underway.
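To make the baseline idea concrete, here is a minimal Python sketch of the scenario above. It assumes a simplified flow record and a 10.0.0.0/8 internal address space; the `Flow` tuple, the addresses, and the function names are illustrative inventions, not any vendor’s format or API.

```python
from collections import namedtuple
from ipaddress import ip_address, ip_network

# Hypothetical, simplified flow record; real flow exports carry more fields.
Flow = namedtuple("Flow", "src dst dst_port bytes")

# Assumed internal address space for this illustration.
INTERNAL_NETS = [ip_network("10.0.0.0/8")]

def is_internal(addr):
    """True if the address falls inside one of the assumed internal networks."""
    ip = ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)

def build_baseline(flows):
    """Record which (source, destination) pairs were seen during a known-good period."""
    return {(f.src, f.dst) for f in flows}

def anomalous_flows(baseline, flows):
    """Return flows from internal hosts to external destinations not in the baseline."""
    return [
        f for f in flows
        if (f.src, f.dst) not in baseline
        and is_internal(f.src)
        and not is_internal(f.dst)
    ]

history = [
    Flow("10.0.1.10", "10.0.2.20", 8080, 12_000),  # web server -> app server
    Flow("10.0.2.20", "10.0.3.30", 1433, 40_000),  # app server -> database
]
today = history + [
    # app server -> external host never seen before
    Flow("10.0.2.20", "203.0.113.50", 443, 750_000_000),
]

baseline = build_baseline(history)
for f in anomalous_flows(baseline, today):
    print(f"unusual outbound flow: {f.src} -> {f.dst}:{f.dst_port} ({f.bytes} bytes)")
```

A production baseline would also account for volume, time of day, and ports, but even this set-membership check surfaces the application server talking to an outside host.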
Even more powerful is the ability to set up compound correlation rules, which factor in specific events and flow scenarios. Of course setting up these rules is complicated and they require a lot of tuning, but once the additional data stream is in place, there are many options for how to leverage it.
Everyone wants to feel like more than just a number, but when talking about SIEM/Log Management, your IP address is pretty much who you are. You can detect many problems by just analyzing all traffic indiscriminately, but this tends to generate plenty of false positives. What about the scenario where the privileged user makes a change on a key server? Maybe they used a different device, which had a different IP address. This would show up as an unusual address for that action, and could trigger an alert.
But if the system were able to leverage identity information to know the same privileged user was making the change, all would be well, right? That’s the idea behind identity integration with SIEM/LM. Basically, the analysis engine pulls in directory information from the major directory stores (Active Directory & LDAP) to understand who is in the environment and what groups they belong to, which indicates what access rights they have. Other identity data – such as provisioning and authentication information – can be pulled in to enable advanced analysis, perhaps pinpointing a departed user accessing a key system.
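As a rough illustration, identity enrichment might look like the Python sketch below. The directory snapshot, group names, server name, and event fields are all hypothetical; a real deployment would pull this data from Active Directory or LDAP rather than a hard-coded dictionary.

```python
# Hypothetical directory snapshot, as might be exported from a directory store.
DIRECTORY = {
    "alice": {"groups": {"db-admins"}, "active": True},
    "bob": {"groups": {"accounting"}, "active": False},  # account deprovisioned
}

# Assumed mapping of which groups may make changes on which servers.
ALLOWED_GROUPS = {"db01": {"db-admins"}}

def evaluate_change(event):
    """Classify a change event using identity data rather than source IP alone."""
    user = DIRECTORY.get(event["user"])
    if user is None:
        return "alert: unknown user"
    if not user["active"]:
        return "alert: departed user"
    if user["groups"] & ALLOWED_GROUPS.get(event["server"], set()):
        # Authorized by group membership, even if the source IP is unfamiliar.
        return "ok: authorized user"
    return "alert: user not authorized for this server"

# A privileged user working from an unfamiliar device does not trigger an alert.
print(evaluate_change({"user": "alice", "server": "db01", "src_ip": "10.0.9.77"}))
# A departed user touching the same server does.
print(evaluate_change({"user": "bob", "server": "db01", "src_ip": "10.0.4.12"}))
```

The decision keys off who made the change and what they are entitled to do, so the unfamiliar IP address alone no longer produces a false positive.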
The holy grail of identity integration is user activity monitoring. Yup, Big Brother lives – and he always knows exactly what you are doing. In this scenario you’d be able to set up a baseline for a group of users (such as Accounting Clerks), including which systems they access, who they communicate with, and what they do. There are actually a handful of other attributes that help identify a single user even when using generic service accounts. Then you can look for anomalies, such as an accounting clerk accessing the HR system, making a change on a sensitive server, or even sending data to his/her Gmail account. This isn’t a smoking gun, per se, but it does give administrators a place to look for issues.
Again, additional data types beyond plain event logs can effectively make the system smarter and streamline problem identification.
Database Activity Monitoring
Recently SIEM/LM platforms have been integrating Database Activity Monitoring (DAM), which collects very detailed information about what is happening to critical data stores. As with the flow data discussed above, DAM can serve up activity and audit data for SIEM. These sources not only provide more data, but also add context for analysis, helping with both correlation and forensic analysis. Securosis has published plenty of information on DAM, which you can check out in our research library.
The purpose of DAM integration is to drive analysis deeper into database transactions, gaining the ability to detect patterns which indicate successful compromise or misuse. As a simple example, suppose a mobile user gets infected at Starbucks (like that ever happens!) and then unwittingly provides access to the corporate network, where the attacker proceeds to compromise the database.
The DAM device monitors the transactions to and from the database, and should see the attack traffic. At that point the admin must go to another system to figure out the issue with the rogue device. But if all the data is available within a single platform, the admin would be able to instantly know about the compromised device and remediate.
Additionally, powerful correlation rules can be set up to look for account changes and other significant events on certain servers, followed up by a data dump from the database (recorded by DAM), and a bulk file transfer to an external location (detectable in flow data). This is certainly getting closer to a smoking gun, if the attack scenarios are modeled and implemented as rules in the SIEM/LM.
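A compound rule of this kind can be sketched in Python as an ordered sequence of events that must occur within a time window. The event type names, the one-hour window, and the tuple layout are assumptions for illustration; real platforms express such rules in their own rule languages.

```python
from datetime import datetime, timedelta

# Ordered stages of the modeled attack; the event type names are hypothetical.
SEQUENCE = ["account_change", "db_bulk_read", "external_bulk_transfer"]
WINDOW = timedelta(hours=1)  # assumed correlation window

def sequence_matched(events, sequence=SEQUENCE, window=WINDOW):
    """Return True if all stages occur in order within the window.

    `events` is an iterable of (timestamp, source, event_type) tuples,
    where source is a label such as "log", "dam", or "flow".
    """
    # starts[i] holds the start time of the most recent partial match
    # that has completed stages 0 through i.
    starts = [None] * len(sequence)
    for ts, _source, etype in sorted(events):
        for i in range(len(sequence) - 1, -1, -1):
            if etype != sequence[i]:
                continue
            if i == 0:
                starts[0] = ts
            elif starts[i - 1] is not None and ts - starts[i - 1] <= window:
                starts[i] = starts[i - 1]
        if starts[-1] is not None:
            return True
    return False

t0 = datetime(2024, 1, 1, 9, 0)
events = [
    (t0, "log", "account_change"),                                # server log
    (t0 + timedelta(minutes=20), "dam", "db_bulk_read"),          # recorded by DAM
    (t0 + timedelta(minutes=35), "flow", "external_bulk_transfer"),  # seen in flow data
]
print(sequence_matched(events))
```

The point of the sketch is that no single source would fire on its own; the rule only matches when the log, DAM, and flow events line up in order.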
Like DAM, some SIEM/LM platforms are climbing up to the application layer by ingesting application logs, as well as performing simple content analysis on specific application types – typically email or web traffic. Again, baseline models can identify how applications should behave; then alerts can be set up for behavior which is not normal.
The problem with application monitoring is the amount of work required to get it right. Each application works differently in each environment, so significant tuning is required to get the rules right and tighten thresholds enough to provide relevant alerts. With the number of applications in a typical organization, getting useful coverage within an environment takes substantial time and resources.
But over time this capability will continue to improve and become much more useful in practice. We expect that to take another 12-18 months.
Another data type useful for SIEM/LM integration is configuration data. This means many different things to different folks, so let’s level set a bit. We are referring to the configuration settings for security, applications, network, and computing devices. Most attacks involve some type of configuration change, whether it’s a service turned on or off, a new user added, a new protocol allowed, or a destination address added. Monitoring for these changes can again provide key hints to an attack in progress.
This integration can happen with the SIEM/LM platform directly collecting and monitoring the configurations (many of the vendors can monitor network and security device configurations) or by taking an alert or log stream from a standalone configuration monitoring product. Either way works, so long as the correlation rules and reports are built to take advantage of the configuration data.
Let’s run through a quick scenario. In the event of an attack on a retailer’s POS system, events would show reconnaissance activity on a store wireless network, which happens frequently and so shouldn’t trigger any alarms. The attacker then breaks the WEP key and gains access to the POS network, then compromises the POS devices, which run on an un-patched version of embedded Windows XP. Yes, I know this organization deserves to be pwned for using WEP and unpatched XP, but indulge me a bit. None of this would necessarily be caught through typical event logs.
But then the attacker enables FTP on the POS terminal, which would change the configuration and be detected by configuration monitoring. So the admin can investigate FTP being active on a POS device, which indicates badness. Combine that with other event and logging activity, and a case for further investigation can be made – which is the point of having the SIEM/LM platform in the first place.
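At its simplest, configuration monitoring is a comparison between an approved configuration and what is observed now. The Python sketch below assumes configurations are available as flat key/value settings; the setting names and the `pos-017` device name are made up for the example.

```python
def diff_config(baseline, current):
    """Return {setting: (old, new)} for every setting added, removed, or changed."""
    changes = {}
    for key in baseline.keys() | current.keys():
        old, new = baseline.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes

# Hypothetical settings for a POS terminal.
approved = {"ftp_enabled": False, "rdp_enabled": False, "local_users": ("posuser",)}
observed = {"ftp_enabled": True, "rdp_enabled": False, "local_users": ("posuser",)}

for setting, (old, new) in sorted(diff_config(approved, observed).items()):
    print(f"config change on pos-017: {setting} {old!r} -> {new!r}")
```

Each reported change would then be fed to the SIEM/LM as an event, where it can be correlated with the wireless reconnaissance and other activity.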
File Integrity Monitoring
The last data type we’ll discuss is file integrity data. This involves monitoring for changes on key system files such as the IP stack. If one of these files changes (and the change isn’t traceable to legitimate patching), it usually indicates some kind of unauthorized activity. So this data is similarly useful for helping to narrow the scope of analysis for a security analyst.
If the analyst sees system files changing on a critical server, in combination with strange network flows, other configuration changes, and IDS alerts, that’s a good indication of a successful attack. Remember, it isn’t necessary to find a smoking gun. SIEM/LM is useful if it can make us a bit smarter and enable a security analyst to react faster to an attack.
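File integrity monitoring usually boils down to comparing cryptographic hashes of files against a known-good snapshot. Here is a minimal Python sketch using SHA-256 from the standard library; the function names and the `driver.sys` file are illustrative, and the demo uses a temporary directory rather than real system files.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(paths):
    """Map each file path to the SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def changed_files(baseline, current):
    """Return paths whose digest differs from the baseline, or that are new or missing."""
    return sorted(
        p for p in baseline.keys() | current.keys()
        if baseline.get(p) != current.get(p)
    )

# Demo: take a snapshot, modify the file, and compare against the baseline.
with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "driver.sys"  # stand-in for a monitored system file
    target.write_bytes(b"original contents")
    before = snapshot([target])
    target.write_bytes(b"tampered contents")
    after = snapshot([target])
    print(changed_files(before, after))
```

A real deployment would also need to reconcile detected changes against patch windows, so that legitimate updates don’t drown the analyst in alerts.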
Direct or Indirect Integration?
One of the ways vendors try to differentiate is by whether their product takes the data in directly and does the analysis within the SIEM/LM platform, or partners with leading vendors of standalone solutions – such as NBA or configuration monitoring. We aren’t religious one way or the other.
There are advantages to direct integration – all the data is in one location, which facilitates forensic investigation; this may also enable more detailed correlation rules and compliance reports. On the other hand, a standalone NBA system is more useful to the network administrator, at the expense of fewer capabilities built into the SIEM. If it’s the network administrator’s budget they will buy NBA, and the security team will get alerts. Either way is fine, since it’s about making the SIEM/LM smarter and focusing investigations.
Additional Data = More Complexity
As we described in the series introduction, making SIEM/LM work is a fairly complicated endeavor, and that’s just dealing with logs and events. When you add two or more additional data types, you multiply the number of rules and reports the system can generate. Couple that with enrichment and activity profiles, and you have seriously increased complexity. That can be positive (by supporting broader analysis) as well as negative (because tuning and performance become bigger issues), so be careful what you wish for.
Ultimately, the use cases need to drive the advanced features needed during the procurement process. If you are just trying to meet a compliance automation requirement, then flow data may not be that useful and shouldn’t weigh heavily in the vendor analysis. But if you are trying to gain operational efficiencies, then something like configuration monitoring should allow your analysts to kill both birds with the same platform, so the data type becomes more important.