We have discussed why continuous security monitoring is important, how we define CSM, and finally how you should be classifying your assets to figure out the most appropriate levels of monitoring. Now let’s dig into the problems you are trying to solve with CSM. At the highest level we generally see three discrete use cases:
- Attacks: This is how you use security monitoring to identify a potential attack and/or compromise of your systems. This is the general concept we have described in our monitoring-centric research for years.
- Change: An operations-centric use case is to monitor for changes, both to detect unplanned (possibly malicious) changes, and to verify that planned changes complete successfully.
- Compliance: Finally, there is the check-the-box use case, where a mandate or guidance requires monitoring and/or scanning technology; less sophisticated organizations have no choice but to do something. But keep in mind that the mandated product of this initiative is documentation that you are doing something, not necessarily an improved security posture, identification of security issues, or confirmation of activity.
In this post and the next we will dig into these use cases, describe the data sources applicable to each, and deal with the nuances of making CSM work to solve each problem. Before we dig in we need to make a general comment about these use cases. Notice that they are listed from broadest and most challenging, to narrowest and most limited. The attack use case is bigger, broader, and more difficult than change management; compliance is the least sophisticated. Obviously you can define more granular use cases, but these three cover most of what people expect from security monitoring. So if we missed something we are confident you will let us know in the comments.
This is a reversal of the order in which most organizations adopt security technologies, and correlates to security program sophistication. Many start with a demand to achieve compliance, then grow an internal control process to deal with changes (typically internal), and finally are ready to address potential attacks, which entail changes to device posture. Of course the path to security varies widely; many organizations jump right to the attack use case, especially those under immediate or perpetual attack.
We made a specific decision to address the broadest use case first — largely because even if you are not yet looking for attacks, you will need to soon enough. So we might as well lay out the entire process, and then show how you can streamline your implementation for the other use cases.
The Attack Use Case
As we start with how you can use CSM to detect attacks, let’s begin with NIST’s official definition of Continuous Security Monitoring:
Information security continuous* monitoring (ISCM) is maintaining ongoing* awareness of information security, vulnerabilities, and threats to support organizational risk management decisions.
*The terms “continuous” and “ongoing” in this context mean that security controls and organizational risks are assessed, analyzed and reported at a frequency sufficient to support risk-based security decisions as needed to adequately protect organization information. Data collection, no matter how frequent, is performed at discrete intervals. NIST 800-137 (PDF)
Wait, what? So to NIST ‘continuous’ doesn’t actually mean continuous, but instead a “frequency … needed to adequately protect organization information.” Basically, your monitoring strategy should be as continuous as it needs to be. A bit like the fact that advanced attackers are only as advanced as they need to be. We like this clarification, which reflects the fact that some assets need to be monitored at all times, and others not so much. But let’s be a bit more specific about what you are trying to identify in this use case:
- Determine vulnerable (and exploitable) devices
- Prioritize remediating those devices based on which have the most risk of compromise
- Identify malware in your environment
- Detect intrusion attempts at all levels of your environment
- Gain awareness and track adversaries in your midst
- Detect exfiltration of sensitive data
- Identify the extent of any active compromise and provide information useful in clean-up
- Verify clean-up and elimination of the threat
To address this laundry list of goals, you need the following data sources:
- Assets: As we discussed in classification, you cannot monitor what you don’t know about; without knowing how critical an asset is you cannot choose the most appropriate way to monitor it. As we described in our Vulnerability Management Evolution research, this requires an ongoing (and dare we say “continuous”) discovery capability to detect new devices appearing on your network, and then a mechanism for profiling and classifying them.
- Network Topology/Telemetry: Next you need to understand the network layout, specifically where critical assets reside. Assets which are accessible to attackers are of course higher priority than inaccessible assets, so it is quite possible to have a device which is technically vulnerable and contains critical data, but is less important than a less-valuable asset which is clearly in harm’s way.
- Events/Logs: Any technological device generates log and event data. This includes security gear, network infrastructure, identity sources, data center servers, and applications, among others. Patterns in the log may indicate attacks if you know how to look; logs also offer substantiation and forensic evidence after an attack.
- Configurations: Configuration details and unauthorized configuration changes may also indicate attacks. Malware generally needs to change device configuration to cause its desired behavior.
- Vulnerabilities: Known vulnerabilities provide another perspective on device exposure, since they can be attacked by exploits in the wild.
- Device Forensics: An advanced data source would be the very detailed information (including memory, disk images, etc.) about what’s happening on each monitored device, used to identify indicators of compromise and facilitate investigation of potential compromise. This kind of information can be invaluable to confirm compromise.
- Network Forensics: Capturing the full packet stream enables replay of traffic into and out of devices. This is very useful for identifying attack patterns, and also for forensics after an attack.
That is a broad list of data, but — depending on the sophistication of your CSM process — you may not need all these sources. More data is better than less data, but everyone needs to strike a balance between capturing everything and only aggregating what’s immediately useful. You do not get a second chance to capture data, but resource realities have a strong influence on the scope of collection efforts.
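As one illustration of the ongoing discovery capability mentioned under Assets, the sketch below compares the latest discovery sweep against the known inventory and returns devices that have not been profiled yet. The `Device` type, its fields, and the choice of MAC address as the identifying key are assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical minimal device record; real inventories carry far more."""
    mac: str  # assumed unique key for a device
    ip: str


def find_new_devices(known_inventory, discovered):
    """Return discovered devices whose MAC is not in the known inventory."""
    known_macs = {device.mac for device in known_inventory}
    return [device for device in discovered if device.mac not in known_macs]


# Example: one known server, and a sweep that also sees an unknown device.
inventory = [Device(mac="aa:aa:aa:aa:aa:01", ip="10.0.0.5")]
sweep = [
    Device(mac="aa:aa:aa:aa:aa:01", ip="10.0.0.5"),
    Device(mac="bb:bb:bb:bb:bb:02", ip="10.0.0.77"),
]
unclassified = find_new_devices(inventory, sweep)
```

Anything returned here would then go through the profiling and classification step before you decide how closely to monitor it.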
Getting the Data
So how do you collect all this data on an ongoing basis and at what frequency? It’s time to get back to those asset classifications you decided on earlier. For critical assets gather as much data as possible. That likely means some kind of agentry on the devices to gather data at all times, and send it back to the CSM aggregation point for pseudo-real-time analysis. We say ‘pseudo-real-time’ because due to the nature of monitoring and the laws of physics, there is always lag between when something happens on the device and when it can be analyzed by the CSM system.
For devices which do not quite require always-on monitoring or forensics, you need to determine the frequency of scanning for vulnerabilities, file integrity monitoring, and/or configuration change monitoring. Depending on criticality you might want to scan daily or weekly. You also need to determine whether you need a credentialed scan to collect far more granular information, or if an uncredentialed scan will suffice. Of course today’s malware spreads almost instantaneously, so if you don’t catch a configuration change or another indicator of attack for a week, who knows how much damage will happen before you notice? This is why classification is so important: an attacker may start (and gain presence) by compromising an infrequently scanned device, but at some point the attacker will need to go after a critical device that you should be monitoring continuously.
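To make the link between classification and collection frequency concrete, here is a minimal sketch of a policy table keyed by asset tier. The tier names, the intervals, and the `scan_due` helper are all hypothetical; your own classification scheme and scan windows will differ.

```python
from datetime import datetime, timedelta

# Hypothetical tiers. "interval" of None means an agent streams data
# continuously, so no scheduled scan is needed for that tier.
SCAN_POLICY = {
    "critical": {"mode": "agent", "interval": None, "credentialed": True},
    "important": {"mode": "scan", "interval": timedelta(days=1), "credentialed": True},
    "standard": {"mode": "scan", "interval": timedelta(weeks=1), "credentialed": False},
}


def scan_due(tier, last_scan, now):
    """Return True if a device in this tier is due for a scheduled scan."""
    interval = SCAN_POLICY[tier]["interval"]
    if interval is None:
        return False  # continuously monitored by an agent
    return now - last_scan >= interval


# Example: a standard-tier device last scanned eight days ago is overdue.
overdue = scan_due("standard", datetime(2024, 1, 1), datetime(2024, 1, 9))
```

Keeping the policy in one table makes it easy to see, and to argue about, how long an indicator of attack could sit unnoticed on each class of device.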
Another important aspect of data collection is automation. The only way to scale any kind of monitoring initiative is to have the data sent to the aggregation platform without human intervention, both from an efficiency and accuracy standpoint. One aspect of the ‘continuous’ monitoring approach espoused by the US government is moving away from an accreditation activity every couple years, instead looking to monitor devices more often. Seems obvious, and it is. In today’s environment, it’s about shortening the window between compromise and detection. To be clear, there is a place for third-party assessment to confirm the controls, but operationally automation for data collection is essential.
We should also point out the blurry line between monitoring and defense, particularly for critical devices which are monitored continuously. Monitoring is a passive alerting function compared to prevention – actively blocking attacks. The nuances of what to block and what to only monitor, as well as how to avoid false positives and negatives, are both beyond the scope of this research. But we need to mention that all things being equal, if you can identify a clear attack on a critical device, you should position yourself to prevent it.
Quantifying CSM Risk
A key aspect of prioritizing which devices require remediation is understanding the risk they pose to your organization. You start the process by classifying the asset, and take it to the next level by assessing the risk of the device being compromised. We have never been fans of risk scoring: it can be far too subjective, and the algorithms tend to capture risk of compromise rather than true organizational or economic risk. Once again, NIST offers useful perspective:
True risk scoring can be difficult to achieve using the NIST SP 800-37 revision 1 definition, and many “risk scoring” methodologies do not demonstrate a correlation to true risk measurement. Instead, they usually measure the state of a collection of security capabilities or controls. NIST IR 7756, p26 (PDF)
Of course many folks (including our friends) spend a significant portion of their careers trying to quantify risk, and we applaud those efforts. We are sure they will love our risk quantification shortcuts (we are braced for flames in the comments). But for our definition of CSM it is not clear we need to truly quantify risk; mostly we need to assess relative risk to support decision-making. Issues with critical assets need higher priority. You can prioritize further based on whether the device can be reached by external attackers or only by internal ones. Inaccessible devices are lower priority. Then you need to define a coarse hierarchy of potential exposures. It might look like this:
- Device exhibiting anomalous behavior and/or indicators of compromise
- Vulnerable device/exploit code in the wild
- Configuration issue which could result in full compromise
- Vulnerable device/non-weaponized attack
- Everything else
Obviously compromised devices need to be addressed ASAP. Next on the list would be a device vulnerable to an exploit which has been seen in the wild. You may not have seen an active attack yet, but you will. Attackers are reliable that way, once weaponized code is available. Next you would address configuration errors which could result in a compromised device. Finally you would deal with standard vulnerabilities as part of the normal patch/maintenance cycle. This list is not meant to be comprehensive — just to illustrate that you don’t need a very complicated algorithm to determine security risk in your environment to drive remediation decisions.
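A rough sketch of how simple that logic can be: the code below sorts findings by the exposure hierarchy above, then uses asset criticality and attacker accessibility as tie-breakers. The `Finding` type, the label strings, and the order of the tie-breakers are assumptions made for illustration; you might reasonably weigh criticality ahead of exposure instead.

```python
from dataclasses import dataclass

# Lower rank means fix sooner. Labels are hypothetical names for the
# five tiers in the coarse hierarchy.
EXPOSURE_RANK = {
    "indicator_of_compromise": 0,
    "vulnerable_exploit_in_wild": 1,
    "config_full_compromise": 2,
    "vulnerable_not_weaponized": 3,
    "other": 4,
}
ACCESS_RANK = {"external": 0, "internal": 1, "inaccessible": 2}


@dataclass
class Finding:
    device: str
    critical_asset: bool
    accessibility: str  # key into ACCESS_RANK
    exposure: str       # key into EXPOSURE_RANK


def prioritize(findings):
    """Order findings by exposure tier, then criticality, then accessibility."""
    return sorted(
        findings,
        key=lambda f: (
            EXPOSURE_RANK[f.exposure],
            not f.critical_asset,  # False sorts first, so critical assets lead
            ACCESS_RANK[f.accessibility],
        ),
    )


findings = [
    Finding("file-server", True, "external", "other"),
    Finding("hr-db", True, "internal", "vulnerable_exploit_in_wild"),
    Finding("kiosk", False, "external", "indicator_of_compromise"),
    Finding("web-01", True, "external", "vulnerable_exploit_in_wild"),
]
queue = [f.device for f in prioritize(findings)]
```

There is no numeric score here, only a relative ordering, which is all a remediation queue needs.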
We aren’t trying to minimize the work required to aggregate all this data and define the rules to determine what is an attack, vs. an exploitable vulnerability, vs. a problematic configuration issue. It is decidedly non-trivial, but keep your eye on the prize. Today these data sources are aggregated and analyzed within separate management systems, which forces you to make decisions from disparate information sources with inconsistent metrics and reporting environments. The compelling value of CSM is in integrating all these disparate data sources into a common platform, for more effective decision support.
Our next post will simplify things a bit (we hope); we will talk about the change and compliance use cases.