During speaking gigs we ask how many in the audience actually get through their to-do list every day. Usually we get one or two jokers in the crowd who are between jobs, or maybe just trying to troll us a bit. But nobody in a security operational role gets everything done every day. So the critical success factor is to make sure you are getting the right things done, and not burning time on activities that don’t reduce risk or contain attack damage.
Underpinning this series is the fact that prevention inevitably fails at some point. Along with a renewed focus on network-based detection, that means your monitoring systems will detect a bunch of things. But which alerts are important? Which represent active adversary activity? Which are just noise and need to be ignored? Figuring out which is which is where you need the most help.
To use a physical security analogy, a security fence will alert regularly. But you need to figure out whether it’s a confused squirrel, a wayward bird, a kid on a dare, or the offensive maneuver of an adversary. Just looking at the alert won’t tell you much. But if you add other details and additional context into your analysis, you can figure out which is which. The stakes for getting this right are pretty high: postmortems of many recent high-profile breaches indicate that alerts did fire – in some cases multiple times from multiple systems – but the organizations failed to take action… and suffered the consequences.
Our last post listed network telemetry you could look for to indicate potential malicious activity. Let’s say you like the approach laid out in that post and decide to implement it in your own monitoring systems. So you flip the switch and the alerts come streaming in. Now comes the art: separating signal from noise and narrowing your focus to the alerts that matter and demand immediate attention. You do this by adding context to general network telemetry and then using an analytics engine to crunch the numbers.
To add context you can leverage both internal and external information. At this point we’ll focus on internal data, because you already have that and can implement it right away. Our next post will tackle external data, typically accessible via a threat intelligence feed.
Device Behavior
You start by figuring out what’s important – not all devices are created equal. Some store very important data. Some are issued to employees with access to important data, typically executives. But not all devices present a direct risk to your organization, so categorizing them provides the first filter for prioritization. You can use the following hierarchy to kickstart your efforts:
- Critical devices: Devices with access to protected information and/or particularly valuable intellectual property should bubble to the top. Fast. If a device on a protected and segmented network shows indications of compromise, that’s bad and needs to be dealt with immediately. Even if the device is dormant, traffic on a protected network that looks like command and control constitutes smoke, and you need to act quickly to ensure any fire doesn’t spread. Or enjoy your disclosure activities…
- Active malicious devices: If you see device behavior which indicates an active attack (perhaps reconnaissance, moving laterally within the environment, blasting bits at internal resources, or exfiltrating data), that’s your next order of business. Even if the device isn’t considered critical, if you don’t deal with it promptly the compromise might find an exploitable hole to a higher-value device and move laterally within the organization. So investigate and remediate these devices next.
- Dormant devices: These devices at some point showed behavior consistent with command and control traffic (typically staying in communication with a C&C network), but aren’t doing anything malicious at the moment. Given the number of other fires raging in your environment, you may not have time to remediate these dormant devices immediately.
These priorities are fairly coarse but should be sufficient. You don’t want a complicated multi-tier rating system which is too involved to use on a daily basis. Priorities should be clear. If you have a critical device that is showing malicious activity, that’s a red alert. Critical devices that throw alerts need to be investigated next, and then non-critical devices showing malicious activity. Finally, after you have all the other stuff done, you can get around to dealing with the dormant devices you’re pretty sure are compromised. Of course this last bucket might show malicious activity at any time, so you still need to watch it. The question is when you remediate.
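To make that ordering concrete, here is a minimal Python sketch of the four tiers. The `Device` fields and the `triage_tier` and `triage_order` names are hypothetical, invented for illustration; they are not taken from any particular product, and a real system would pull these flags from your asset inventory and detection tooling.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """An alerting device. All field names are illustrative assumptions."""
    name: str
    critical: bool       # has access to protected data or valuable IP
    active_attack: bool  # recon, lateral movement, exfiltration, etc.


def triage_tier(device: Device) -> int:
    """Return 0 (handle first) through 3 (handle last)."""
    if device.critical and device.active_attack:
        return 0  # red alert
    if device.critical:
        return 1  # critical device throwing alerts, even if dormant
    if device.active_attack:
        return 2  # non-critical device showing malicious activity
    return 3      # dormant: likely compromised, but quiet for now


def triage_order(devices):
    """Sort alerting devices so the most urgent come first."""
    return sorted(devices, key=triage_tier)


if __name__ == "__main__":
    queue = [
        Device("kiosk-7", critical=False, active_attack=False),
        Device("dev-laptop-3", critical=False, active_attack=True),
        Device("hr-db-1", critical=True, active_attack=False),
        Device("pay-db-2", critical=True, active_attack=True),
    ]
    for d in triage_order(queue):
        print(triage_tier(d), d.name)
```

Keeping this to two yes/no flags is deliberate: it mirrors the coarse buckets above, and anything finer belongs in the additional context discussed next.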
This categorization helps, but within each bucket you likely have multiple devices. So you still need additional information and context to make decisions.
Who and Where
Not all employees are created equal either. Another source of context is user identity, and there are a bunch of groups you need to pay attention to. The first is people with elevated privileges, such as administrators and others with entitlements to manage devices that hold critical information. They can add, delete, and change accounts and access rules on the servers, and manipulate data. They have access to tamper with logs, and basically can wreck an environment from the inside. There are plenty of examples of rogue or disgruntled administrators making a real mess, so when you see an administrator’s device behaving strangely, that should bubble up to the top of your list.
The next group of folks to watch closely are executives with access to financials, company strategy, and other key intellectual property. These users are attacked most frequently via phishing and other social engineering, so they need to be watched closely – even trained, they aren’t perfect. This may trigger organizational backlash – some executives get cranky when they are monitored. But that’s not your problem, and without this kind of context it’s hard to do your job. So dig in and make your case to the executives for why it’s important. As you look for indicators that devices are connecting to a C&C server or performing reconnaissance, you are protecting the organization, and executives should know better than to fight that.
The location of your critical data also provides context for priorities. Critical data lives on particular network segments, typically in the data center, so you should be making sure those networks are monitored. But it’s not just PII you need to worry about. Your organization should isolate segments for labs doing cutting-edge R&D, finance networks with preliminary numbers from last quarter, and anything else needing special caution. Isolation is your friend – use different segments, at least logically, to minimize data intermingling.
You can get contextual information from a variety of sources, which you likely already use. For instance identity information (such as Active Directory users and groups) enables you to map a device to a user and/or group. Then you can profile typical finance department activity and know it’s different from how marketing and engineering groups communicate with each other and the broader Internet. You could go deeper and profile specific people.
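As a rough illustration of group profiling, the sketch below flags a device whose traffic volume is far outside its group’s history. Everything here is an assumption made for the example: the `DEVICE_GROUP` mapping stands in for a lookup you would build from your directory, the metric (daily outbound megabytes) is arbitrary, and a simple z-score is only one of many ways to define “unusual.”

```python
from statistics import mean, pstdev

# Hypothetical device-to-group mapping, e.g. built from a directory export.
DEVICE_GROUP = {
    "fin-laptop-1": "finance",
    "mkt-laptop-4": "marketing",
}

# Hypothetical history: daily outbound megabytes observed per group.
GROUP_HISTORY = {
    "finance": [100, 110, 90, 105, 95],
    "marketing": [900, 1100, 1000, 950, 1050],
}


def is_unusual(history, observed, threshold=3.0):
    """True if `observed` is more than `threshold` standard deviations
    away from the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No variation in the history: any change at all is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


def device_is_unusual(device, observed):
    """Compare a device's observed volume against its group's profile."""
    group = DEVICE_GROUP[device]
    return is_unusual(GROUP_HISTORY[group], observed)


if __name__ == "__main__":
    # 500 MB is routine for marketing but far outside finance's profile.
    print(device_is_unusual("fin-laptop-1", 500))
    print(device_is_unusual("mkt-laptop-4", 1000))
```

The point is that the same number means different things for different groups, which is exactly why the identity mapping matters.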
Additionally, network topology information is important in attack path analysis to understand the blast radius of any specific attack. That’s a fancy term for damage assessment in case a device or network is compromised: what else would be directly exposed? Once you figure out which other devices on the network can be reached from the compromised device (during lateral movement), and what potential attacks would succeed, you can use this information to further prioritize your activities.
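One simple way to approximate blast radius is a breadth-first walk over a reachability map. The sketch below assumes you can already express “which device can reach which” as a dictionary; the `blast_radius` name and the sample topology are made up for illustration, and it ignores whether an attack against each reachable device would actually succeed.

```python
from collections import deque


def blast_radius(reachability, compromised):
    """Return every device reachable, directly or through other devices,
    from `compromised`.

    `reachability` maps a device name to the devices it can connect to.
    """
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        node = queue.popleft()
        for neighbor in reachability.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(compromised)  # report only the *other* exposed devices
    return seen


if __name__ == "__main__":
    # Hypothetical topology: the laptop reaches the file server,
    # which in turn reaches the database.
    topology = {
        "laptop": ["fileserver"],
        "fileserver": ["db"],
        "printer": [],
    }
    print(sorted(blast_radius(topology, "laptop")))
```

A device whose blast radius includes critical systems deserves attention sooner than one that can only reach a printer.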
Content
The next area to mine for context is content – as you might guess, not all data is created equal either. You’ll need to be able to analyze the content stream within network traffic to look for protected data, or data identified as critical intellectual property. This rough data classification can be very resource-intensive and hard to keep current (ask anyone trying to implement DLP), so make it as simple as possible. For instance personally identifiable information (PII) may be the most important data to protect in your environment. But intellectual property is the lifeblood of most non-medical high-tech organizations, and thus typically their top priority. It doesn’t really matter what is at the top, so long as it reflects your organization’s priorities.
Compliance remains a factor for many organizations, so potential compliance violations bubble up when figuring out priorities.
The importance of various specific types of content depends on the organization, and you need to do the work to understand how they need to be protected and monitored. That will entail building consensus with executives, because you need clear marching orders for what alerts need to be validated and investigated first.
Math
Armed with network data identifying indicators and additional context such as identity, location, and content, now you need to figure out what is at greatest risk and react accordingly. This involves crunching numbers and identifying the highest-priority alert. You are looking to:
- Get a verdict on a device and/or a network: whether it has been compromised and to what degree.
- Dig deeper into the attack to figure out the extent of the damage and how far it has spread.
This requires math. We aren’t being flippant (okay, maybe a little), but this type of analysis requires fairly sophisticated algorithms to establish a general risk assessment. You will hear a lot of noise about “risk scoring” as you dig into the current state of network-based detection. Coming up with a quantified risk score can be pretty arbitrary, so it’s good to understand how the score is calculated and where the numbers come from. Make sure your numbers pass the sniff test and you can defend where they come from, because they will be used to make decisions.
As discussed above, your organization will have its own ideas about what’s important and different risk tolerances than other organizations. So you should be able to tune algorithms and weight factors differently to get more meaningful alerts. Your environment is not static – it will change constantly, which means you need to tune your alerting systems on an ongoing basis. Sorry, but there is not much “set it and forget it” any more. We recommend that you include a feedback loop in your security alerting process. Assess the value of your alerts, identify gaps, and then tune further based on what is really happening in the field.
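To show what tunable weights might look like, here is a deliberately simple additive score. It is a toy, not the sophisticated analysis real detection products perform: the signal names and the numbers in `DEFAULT_WEIGHTS` are arbitrary assumptions chosen for the example, which is exactly the kind of thing you should be able to inspect and defend.

```python
# Hypothetical weights; the values are arbitrary and meant to be tuned.
DEFAULT_WEIGHTS = {
    "critical_device": 4.0,
    "active_attack": 3.0,
    "privileged_user": 2.0,
    "protected_content": 2.0,
    "dormant_c2": 1.0,
}


def risk_score(signals, weights=DEFAULT_WEIGHTS):
    """Sum the weights of every signal that is present.

    `signals` maps a signal name to True/False. Signals with no
    configured weight contribute nothing.
    """
    return sum(
        weights.get(name, 0.0)
        for name, present in signals.items()
        if present
    )


if __name__ == "__main__":
    alert = {"critical_device": True, "active_attack": True}
    print(risk_score(alert))

    # Feedback loop: if field experience shows privileged-user alerts
    # matter more in your environment, raise that weight and re-score.
    tuned = dict(DEFAULT_WEIGHTS, privileged_user=5.0)
    print(risk_score({"privileged_user": True}, tuned))
```

Because the weights live in one plain dictionary, anyone reviewing a score can see where the number came from and adjust it as the environment changes.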
Once you have a score you need to operationalize the detection process. That entails figuring out how you will visualize the data and integrate it into your security operational processes. Our next post will discuss getting context from external data/threat intelligence sources, and then using it to help you remediate attacks completely and efficiently.