We now resume our series on Continuous Security Monitoring. We have dug into the Attack Use Case, so it’s time to cover the next most popular use case for security monitoring: Change Control. We will keep the same format as before: what you are trying to do, what data is required to do it, and how this information can and should guide your prioritization of operational activities.

The Change Control Use Case

We briefly described the change control use case as follows:

An operations-centric use case is to monitor for changes, both to detect unplanned (possibly malicious) changes, and to verify that planned changes complete successfully.

Unplanned changes have two aspects. The first is to determine whether an unplanned change indicates an attack, which is covered in the attack use case. The second is to isolate any unplanned, non-malicious changes and figure out why they occurred outside normal change processes. Finally, you need to verify planned and authorized changes to close the operational process loop.

Before we discuss the data sources you need we should mention monitoring frequency. As with the attack use case, the NIST definition – monitor as frequently as you need to – fits here as well. For highly critical devices you want to look for changes continuously, because if the device is attacked or suffers a bad change, the result could be or enable data loss. As we mentioned under the attack use case, automation is critical to maintaining a consistent and accurate monitoring process. Ensure you minimize human effort, increase efficiency, and minimize human error.

Data Sources

To evaluate a specific change you will want to collect data from the following sources:

  • Assets: As we discussed in the classification post you cannot monitor what you don’t know about; without knowing how critical an asset is, you cannot choose the most appropriate way to monitor it. This requires an ongoing – dare we say, ‘continuous’ – discovery capability to detect new devices appearing on your network, as well as a mechanism for profiling and classifying them.
  • Work Orders: A key aspect of change control is handling unauthorized and authorized changes differently. To do that you need an idea of which changes are part of a patch, update, or maintenance request. That requires a link to your work management system to learn whether a device was scheduled for work.
  • Patching Process: Sometimes installing security patches is outside the purview of the operations group, and instead something the security function takes care of. Not that we think that’s the right way to run things, but the fact is that not all operational processes are managed in the same system. If different systems are used to manage the work involved in changes and patches, you need visibility into both.
  • Configurations: This use case is all about determining differentials in configurations and software loaded on devices. You need the ability to assess the configuration of devices, and to store a change history so you can review deltas to pinpoint exactly what any specific change did and when. This is critical to determining attack intent.
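
To make the idea of a configuration differential concrete, here is a minimal Python sketch that compares two configuration snapshots and records what changed and when. The snapshot format (a flat dictionary of setting names to values) and the names `ChangeRecord` and `diff_config` are assumptions for illustration only; a real CSM platform will have its own assessment and storage model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class ChangeRecord:
    """One observed difference between two configuration snapshots (hypothetical type)."""
    setting: str
    old: Any           # None when the setting was added
    new: Any           # None when the setting was removed
    seen_at: datetime  # when the differential was computed


def diff_config(before: Dict[str, Any], after: Dict[str, Any]) -> List[ChangeRecord]:
    """Return one record per setting that was added, removed, or modified.

    Simplification: a setting whose value is None is treated the same as
    a setting that is absent.
    """
    now = datetime.now(timezone.utc)
    records = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            records.append(ChangeRecord(key, old, new, now))
    return records


# Hypothetical snapshots of one device, taken before and after a change.
baseline = {"ssh.permit_root_login": "no", "ntp.server": "10.0.0.5"}
current = {"ssh.permit_root_login": "yes", "ntp.server": "10.0.0.5", "telnet.enabled": True}

history = diff_config(baseline, current)
for rec in history:
    print(f"{rec.setting}: {rec.old!r} -> {rec.new!r}")
```

Appending each batch of records to a per-device history is what lets you later review the deltas and pinpoint what a specific change did.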

We have always been fans of more data rather than less, so if you can collect device forensics, more detailed events/logs, and/or network full packet captures, as described in the attack use case – do that. But for the change control use case proper, you don’t generally need that data. It is more useful when trying to determine whether the change is part of an attack.

Decision Flow

Unlike the attack use case, which is less predictable in how you evaluate alerts from the monitoring process, the decision flow for change control is straightforward:

  1. Detect change: Through your security monitoring initiative you will be notified that a change happened on a device you are watching.
  2. Is this change authorized? Next you will want to cross-reference the change against the work management system(s) that manage all the operational changes in your environment. It is important that you be able to link your operational tracking systems with the CSM environment; otherwise you will spend a lot of time tracking down authorized changes. We understand these systems tend to be run by different operational groups, but in order to have a fully functional process those walls need to be broken down.
  3. If authorized, was the change completed successfully? If the change was completed then move on. Nothing else to see here. The hope is that this verification can be done in an automated fashion to ensure you aren’t spending time validating stuff that already happened correctly, so your valuable (and expensive) humans can spend their time dealing with exceptions. If the change wasn’t completed successfully you need to send that information back into the work management system (perhaps some fancy DevOps thing, or your trouble ticket system) to have the work done again.
  4. If not authorized, is it an attack? At this point you need to do a quick triage to figure out whether this is an attack warranting further investigation or escalation, or merely an operational failure. The context is important for determining whether it’s an ongoing attack. We will get into that later.
  5. If it’s an attack, investigate: If you determine it’s an attack you need to investigate. We dealt with this process in both Incident Response Fundamentals and React Faster and Better.
  6. If it’s not an attack, figure out who screwed up: If you made it to this point the good news is that your unauthorized change is an operational mishap rather than an attack. So you need to figure out why the mistake happened and take corrective measures within the change process to ensure it doesn’t happen again.
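
The steps above can be sketched as a small Python function. This is only an illustration of the branching, under the assumption that three facts are already known for a detected change: whether a matching work order exists, whether the change verified successfully, and the result of the quick attack triage. The names `Action` and `triage_change` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CLOSE = "authorized change verified, nothing else to see"
    REDO_WORK = "authorized change failed, send back to work management"
    INVESTIGATE = "unauthorized change looks like an attack, investigate"
    FIX_PROCESS = "unauthorized operational mishap, correct the change process"


def triage_change(work_order_id: Optional[str],
                  completed_ok: bool,
                  looks_like_attack: bool) -> Action:
    """Map one detected change (step 1) to the next action in the flow."""
    if work_order_id is not None:          # step 2: is the change authorized?
        if completed_ok:                   # step 3: did it complete successfully?
            return Action.CLOSE
        return Action.REDO_WORK
    if looks_like_attack:                  # step 4: quick attack triage
        return Action.INVESTIGATE          # step 5
    return Action.FIX_PROCESS              # step 6


# A change with a matching (hypothetical) work order that verified cleanly.
print(triage_change("WO-1042", completed_ok=True, looks_like_attack=False).name)
# A change with no work order that fails the attack triage.
print(triage_change(None, completed_ok=False, looks_like_attack=True).name)
```

Only the last three outcomes need a human, which is the point of automating the first branch.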

Let’s further clarify the distinction between the attack and change control use cases. If you have only implemented the change control use case and collected the data appropriate for it, then your visibility into what the malware is doing and how broadly it has spread up to this point will be limited. But that doesn’t mean starting with change control provides no value for detecting attacks. An alert on an unauthorized change can give you a heads-up about an imminent issue.

Taking Action

The entire point of any monitoring initiative is to make better decisions on what needs to be done and how to allocate resources. First let’s take in-process attacks off the table; they were covered in the attack use case, and obviously attacks take priority over pretty much everything else. So how do you determine whether it’s an attack? Since you can’t truly understand the intent of an attacker or an insider making changes, we talk about attack surface. Does the change make the device easier to attack or control? If so it is effectively an attack. Clearly some operational failures result in increased attack surface and should be handled as attacks, even if the actor wasn’t malicious.

This approach takes intent out of the decision to enable a simpler and more objective analysis. An innocent operational failure that increases attack surface isn’t any less of a problem than a malicious action. The device is more exposed than it was before the change and needs to be remediated. That’s why we favor the attack use case as the basis for security monitoring, and then simplify it to deal with change and compliance.
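
As a rough sketch of an intent-free test, the Python below treats a change as an attack whenever it exposes something new. Modeling attack surface as nothing more than the set of listening ports is a deliberate simplification we are assuming for illustration; real attack surface also includes accounts, permissions, installed software, and more. The function names are hypothetical.

```python
from typing import Set


def newly_exposed_ports(before: Set[int], after: Set[int]) -> Set[int]:
    """Ports listening after the change that were not listening before it."""
    return after - before


def handle_as_attack(before: Set[int], after: Set[int]) -> bool:
    """True when the change exposed anything new, regardless of who made it or why."""
    return bool(newly_exposed_ports(before, after))


# An admin enabling telnet (23) by mistake is treated like an attacker doing it.
print(handle_as_attack({22, 443}, {22, 23, 443}))  # True
# Closing a port shrinks the surface, so it stays an operational matter.
print(handle_as_attack({22, 443}, {443}))          # False
```

Note that the function never asks who made the change, which is exactly what makes the analysis objective.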

In case of an operational mishap you have a further decision to make: when to roll it back. That depends on the nature of the change, the criticality of the device, and whether the rollback can be automated. For changes that didn’t increase attack surface there is less urgency to roll back, unless the change broke an application or otherwise impacted availability. So operational mishaps can be put back into the stack of work and processed according to the other operational processes managing workflow in the organization.
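
One way to express that rollback decision is sketched below in Python. The specific rule ordering and the queue names are our assumptions for illustration, not a prescribed policy; your own weighting of device criticality and automation will differ.

```python
def rollback_queue(increased_attack_surface: bool,
                   broke_availability: bool,
                   device_is_critical: bool,
                   rollback_automated: bool) -> str:
    """Pick a (hypothetical) handling queue for an unauthorized operational change."""
    if increased_attack_surface:
        # More exposure than before the change: handle it as an attack.
        return "handle-as-attack"
    if broke_availability or device_is_critical:
        # Urgent: roll back immediately if automation exists, otherwise page a human.
        return "auto-rollback-now" if rollback_automated else "urgent-manual-rollback"
    # No added exposure and nothing broken: process with normal operational work.
    return "normal-work-queue"


print(rollback_queue(False, True, False, True))    # auto-rollback-now
print(rollback_queue(False, False, False, False))  # normal-work-queue
```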

Our next post will document the Compliance use case before we dig into the specifics of the CSM platform and how to get to true Continuous Security Monitoring.