Endpoint Advanced Protection Buyer’s Guide: Key Capabilities for Detection
By Mike Rothman
As we resume posting Endpoint Detection and Response (D/R) selection criteria, let’s start with a focus on the Detection use case.
Before we get too far into capabilities, we should clear up some semantics about the word ‘detection’. Referring back to our timeline in Prevention Selection Criteria, detection takes place during execution. You could make the case that detection of malicious activity is what triggers blocking, and so is a prerequisite to attack prevention: without detection, how could you know what to prevent? But that’s too confusing. For simplicity let’s just say prevention means blocking an attack before it compromises a device, and can happen both prior to and during execution. Detection happens during and after execution, and implies a device was compromised because the attack was not prevented.
Modern detection requires significant analysis across a wide variety of telemetry sources from endpoints. Once telemetry is captured, a baseline of normal endpoint activity is established and used to look for anomalous behavior.
Given the data-centric nature of endpoint detection, an advanced endpoint detection offering should aggregate and analyze the following types of data:
Endpoint logs: Endpoints can generate a huge number of log entries, and an obvious reaction is to restrict the amount of log data ingested, but we recommend collecting as much log data from endpoints as possible. The more granular the better, given the sophistication of attackers and their ability to target anything on a device. If you do not collect the data on the endpoint, there is no way to get it when you need it to investigate later. Optimally, endpoint agents collect operating system activity alongside all available application logs. This includes identity activity such as new account creation and privilege escalation, process launching, and file system activity (key for detecting ransomware). There is some nuance to how long you retain collected data because it can be voluminous and compute-intensive to process and analyze – both on devices and centrally.
Processes: One of the more reliable ways to detect malicious behavior is by which OS processes are started and where they are launched from. This is especially critical when detecting scripting attacks because attackers love using legitimate system processes to launch malicious child processes.
Network traffic: A compromised endpoint will inevitably connect to a command and control network for instructions and to download additional attack code. These actions can be detected by monitoring the endpoint’s network stack. An agent can also watch for connections to known malicious sites and for reconnaissance activity on the local network.
Memory: Modern file-less attacks don’t store any malicious code in the file system, so modern advanced detection requires monitoring and analyzing activity within endpoint memory.
Registry: As with memory-based attacks, attackers frequently store malicious code within the device registry to evade file system detection. So advanced detection agents need to monitor and analyze registry activity for signs of misuse.
Configuration changes: It’s hard for attackers to totally obfuscate what is happening on an endpoint, so on-device configuration changes can indicate an attack.
File integrity: Another long-standing method of attack detection is monitoring changes to system files, because changes to such files outside administrative patching usually indicate something malicious. An advanced endpoint agent should collect this data and monitor for modified system files.
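To make the file integrity item a bit more concrete, here is a minimal Python sketch of the idea: hash a set of system files, keep that as a baseline, and compare later snapshots against it. The function names (`hash_file`, `snapshot`, `diff_snapshots`) are ours for illustration, and the choice of SHA-256 is an assumption. A real endpoint agent would also need to account for legitimate changes from patching, which this sketch ignores.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each path that is an existing regular file to its current hash."""
    return {str(p): hash_file(Path(p)) for p in paths if Path(p).is_file()}


def diff_snapshots(baseline, current):
    """Report files modified, removed, or added relative to the baseline."""
    modified = sorted(
        p for p in baseline if p in current and baseline[p] != current[p]
    )
    removed = sorted(p for p in baseline if p not in current)
    added = sorted(p for p in current if p not in baseline)
    return {"modified": modified, "removed": removed, "added": added}
```

In use, you would call `snapshot()` on a list of monitored files after a known-good build or patch cycle, store the result, and feed later snapshots to `diff_snapshots()`. Anything reported as modified outside a patch window is a candidate for investigation.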
As mentioned above, traditional endpoint detection relied heavily on simple file hashes and behavioral indicators. With today’s more sophisticated attacks, a more robust and scientific approach is required to distinguish legitimate from malicious activity. This more scientific approach is centered around machine learning techniques (advanced mathematics) to recognize the activity of adversaries before and during attacks. Modern detection products use huge amounts of endpoint telemetry (terabytes) to train mathematical models to detect anomalous activity and find commonalities between how attackers behave. These models then generate an attack score to prioritize alerts.
Profiling applications: Detecting application misuse is predicated on understanding legitimate usage of the application, so the mathematical models analyze both legitimate and malicious usage of frequently targeted applications (browsers, office productivity suites, email clients, etc.). This is a similar approach to attack prevention, discussed in our Prevention Selection Criteria guide.
Anomaly detection: With profiles in hand and a consistent stream of endpoint telemetry to analyze, the mathematical models attempt to identify abnormal device activity. When suspicion is high they trigger an alert, the device is marked suspicious, and an analyst determines whether the alert is legitimate.
Tuning: To avoid wasting too much time on false positives, the detection function needs to constantly learn what is really an attack and what isn’t, based on the results of detection in your environment. In terms of process, you’ll want to ensure your feedback is captured by your detection offering, and used to constantly improve your models to keep detection precise and current.
Risk scoring: We aren’t big fans of arbitrary risk scoring because the underlying math can be suspect. That said, there is a role for risk scoring in endpoint attack detection: prioritization. With dozens of alerts hitting daily – perhaps significantly more – it is important to weigh which alerts warrant immediate investigation, and a risk score should be able to tell you. Be sure to investigate the underlying scoring methodology, track scoring accuracy, and tune scoring to your environment.
Data management: Given the analytics-centric nature of EDR, being able to handle and analyze large amounts of endpoint telemetry is critical. Inevitably you’ll run into the big questions: where to store all the data, how to scale analytics to tens or hundreds of thousands of endpoints, and how to economically analyze all your security data. But ultimately your technology decision comes down to a few factors:
Cost: Whether the cost of storage and analytics is included in the service (some vendors store all telemetry in a cloud instance) or you need to provision a compute cluster in your data center to perform the analysis, there is a cost to crunching all the numbers. Make sure hardware, storage, and networking costs (including management) are all included in your analysis. You should perform an apples-to-apples comparison between options, whether they entail building or buying an analytics capability. And think about scaling, for both on-premise and cloud options, because you might decide to collect much more data in the future, and don’t want to be prevented by a huge upcharge.
Performance: Based on your data volumes, both current and projected, how will the system perform? Various analytical techniques scale differently, so dig in a bit with vendors to understand how the performance of their system will be impacted if you add significantly more endpoints, or decide to analyze a lot more endpoint data sources.
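As a rough illustration of the anomaly detection and risk scoring items above, the Python sketch below scores an observation by how far it sits from a baseline and then ranks alerts by that score. To be clear about assumptions: a simple z-score stands in here for the machine learning models vendors actually use, which are far more sophisticated and not described in this guide. The function names, the alert fields, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, pstdev


def anomaly_score(baseline, observed):
    """Return how many standard deviations `observed` is from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def prioritize(alerts, threshold=3.0):
    """Keep alerts scoring at or above the threshold, highest score first."""
    flagged = [a for a in alerts if a["score"] >= threshold]
    return sorted(flagged, key=lambda a: a["score"], reverse=True)
```

For example, the baseline might be the number of child processes a browser launches per hour on a given device. The point of the sketch is the workflow rather than the math: establish a baseline, score deviations, and use the score only to decide which alerts an analyst looks at first. Analyst feedback on false positives would then adjust the threshold, which is the tuning step described above.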
To keep up with modern advanced attackers, you need to learn from other attacks in the wild. That’s where Threat Intelligence comes into play, so any endpoint detection solution should have access to timely and robust threat intel. That can be directly from your endpoint detection vendor or a third party (or both), but you must be able to look for signs of attacks you haven’t suffered yet.
Broader indicators: Traditional endpoint protection relied mostly on file hashes to detect malware. When file hashes ceased to be effective, behavioral indicators were added to look for patterns associated with malicious activity. Advanced detection analysis requires an ever-expanding range of inputs – including memory, registry, and scripting attacks.
Campaign visibility: It’s not enough to detect attacks on a single endpoint – current adversaries leverage many devices to achieve their mission. Make sure your threat intelligence isn’t just indicators to look for on a single endpoint – it should reflect activity patterns across multiple devices, as observed during a more sophisticated campaign.
Network activity: Another aspect of modern detection entails malicious usage of legitimate applications and authorized system functions. At some point during the attack campaign a compromised device will need to communicate with either a command and control network, other devices on the local network, or likely both. That means you need to monitor endpoint network activity for patterns of suspicious activity and connections to known-bad networks.
Shared intelligence: Given the additional context threat intelligence can provide for endpoint detection, leveraging intelligence from a number of organizations can substantially enhance detection. Securely sharing intel bidirectionally among a community of similar organizations can be a good way to magnify the value of external threat data.
As we mentioned, advanced attackers rarely complete an attack using a single device. They typically orchestrate a multi-faceted attack involving multiple tactics across many devices to achieve their mission. This means you cannot understand an adversary’s objective or tactics if your detection mechanisms and analytic perspective are limited to a single device. So aggregating telemetry across devices and looking for signs of a coordinated attack (or campaign) is key to advanced detection. To be clear, a campaign always starts with an attack, so looking for malicious activity on a single device is your starting point. But it’s not sufficient for an advanced adversary.
Timeline visualization: Given the complexity of defender environments and attacker tactics, many security analysts find visualizing an attack to be helpful for understanding a campaign. The ability to see all devices and view attacker activity across the environment, and also to drill down into specific devices for deeper analysis and attack validation, streamlines analysis and understanding of attacks and response planning.
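Here is a small Python sketch of what aggregating telemetry across devices might look like in its simplest form: group events by a shared indicator, keep the indicators seen on more than one device, and order the matching events into a timeline. The event shape (`device`, `indicator`, and `time` as epoch seconds) and the function names are assumptions for illustration, not any product's data model.

```python
from collections import defaultdict


def find_campaign_indicators(events, min_devices=2):
    """Return indicators seen on at least `min_devices` distinct devices."""
    devices_by_indicator = defaultdict(set)
    for event in events:
        devices_by_indicator[event["indicator"]].add(event["device"])
    return {
        indicator: sorted(devices)
        for indicator, devices in devices_by_indicator.items()
        if len(devices) >= min_devices
    }


def build_timeline(events, indicator):
    """Return the events for one indicator in chronological order."""
    matching = [e for e in events if e["indicator"] == indicator]
    return sorted(matching, key=lambda e: e["time"])
```

An indicator that shows up on a single device is an attack to investigate. One that shows up on several devices within a short window is a hint that you may be looking at a campaign, and the ordered timeline gives an analyst a starting point for working out where it began.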
As discussed in our Threat Operations research, we all need to make security analysts as effective and efficient as possible. One way is to eliminate traditional busywork by providing all the relevant information for validation and triage.
Adversary information: Depending on the malware, tactics, and networks detected during an incident, information about potential adversaries can be gathered from threat intelligence sources and presented to the analyst so they have as much context as possible about what this attacker tends to do and what they are trying to achieve.
Artifacts: Assembling data related to the attack and the endpoint in question (such as memory dumps, file samples, network packets, etc.) as part of the detection process can save analysts considerable time, providing information they need to immediately drill down into a device once they get the alert.
Organizational history: Attackers don’t use new attacks unless they need to, so the ability to see whether a particular attack or tactic has been used before against this organization (or is being used currently) provides further context for an analyst to determine the attacker’s intent, and the depth of compromise.
Automating enrichment: A lot of enrichment information can be gathered automatically, so a key capability of a detection platform is its ability to look for attributes (including IP addresses, malware hashes, and botnet addresses) and populate a case file automatically with supplemental information before it is sent to the analyst.
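The automated enrichment step can be sketched in a few lines of Python. This version assumes threat intelligence is already available locally as a dictionary keyed by attribute type. In practice it would come from your vendor or a third-party feed, and the alert fields, `intel` layout, and `enrich_alert` name here are all hypothetical.

```python
def enrich_alert(alert, intel):
    """Build a case file pairing each alert attribute with any known intel."""
    case_file = {
        "alert_id": alert["id"],
        "device": alert["device"],
        "enrichment": [],
    }
    for kind in ("ip_addresses", "file_hashes"):
        for value in alert.get(kind, []):
            # None means the intel source has nothing on this attribute.
            context = intel.get(kind, {}).get(value)
            case_file["enrichment"].append(
                {
                    "type": kind,
                    "value": value,
                    "known": context is not None,
                    "context": context,
                }
            )
    return case_file
```

The analyst receives the case file rather than the raw alert, so the lookups that used to be manual busywork are already done when triage starts.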
Leveraging the Cloud
In light of the ongoing cloudification of pretty much everything in technology, it’s no surprise that advanced endpoint detection has also taken advantage. Much of the leverage comes from more effective analysis, both within an organization and across organizations – sharing threat data. There are also advantages to managing thousands of endpoints across locations and geographies via the cloud, but we’ll discuss that later under key technologies. Some considerations (both positive and otherwise) include:
Cloud scale: Depending on the size of an organization, endpoints can generate a tremendous amount of telemetry. Analyzing telemetry on an ongoing basis consumes a lot of storage and compute. The cloud is good at scaling up storage and compute, so it makes sense to shift processing to the cloud when feasible.
Local preprocessing: As good as the cloud is at scaling, some preprocessing can be done on each device, to only send pertinent telemetry up to the cloud. Some vendors send all telemetry to the cloud, and that can work – but there are tradeoffs in terms of performance, latency, and cost. Performing some local analysis on-device enables earlier attack detection.
Data movement: The next cloud consideration is how to most efficiently move all that data up to the cloud. Each endpoint can connect to the cloud service to transmit telemetry, but that may consume too much bandwidth (depending on what is collected) and can be highly inefficient. Alternatively, you can establish an on-premise aggregation point, which might perform additional processing (normalization, reduction, compression) before transmission to the cloud. The approaches are not actually mutually exclusive – devices aren’t always on the corporate network to leverage the aggregation point. The key is to consider network consumption when designing the system architecture.
Data security & privacy: Endpoint security analysis in the cloud entails sending device telemetry (at least metadata) to networks and systems not under your control. For diligence you need to understand how your vendor’s analytics infrastructure protects your data. Dig into their multi-tenancy architecture and data protection techniques to understand whether and how other organizations could access your data (even inadvertently). Also be sure to probe about whether and how your data is anonymized for shared threat intelligence. Finally, if you stop working with a vendor, make sure you understand how you can access your data, whether you can port it to another system, and how you can ensure your data is destroyed.
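To illustrate the preprocessing, data movement, and privacy considerations together, here is a Python sketch of what an agent or aggregation point might do before sending a batch to the cloud: drop exact duplicate events, replace the user name with a keyed hash, and compress the result. Everything here is an assumption for illustration, including the event fields and function names. Note that a keyed hash is pseudonymization, not true anonymization, so it is worth asking vendors which one they actually mean.

```python
import gzip
import hashlib
import hmac
import json


def pseudonymize(value, key):
    """Replace an identifier with a keyed SHA-256 hash (stable, not readable)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_batch(events, key):
    """Drop exact duplicates, pseudonymize the user field, and gzip the batch."""
    seen = set()
    reduced = []
    for event in events:
        # Serialize with sorted keys so identical events compare equal.
        fingerprint = json.dumps(event, sort_keys=True)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned = dict(event)
        if "user" in cleaned:
            cleaned["user"] = pseudonymize(cleaned["user"], key)
        reduced.append(cleaned)
    payload = json.dumps(reduced, sort_keys=True).encode("utf-8")
    return gzip.compress(payload)
```

Because the same key always maps a user to the same hash, analytics in the cloud can still correlate activity by user without seeing the real name. Whoever holds the key can reverse the mapping by hashing known names, so where that key lives is part of the diligence described above.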
Our next post will dig into response and hunting use cases.