Endpoint Advanced Protection Buyer’s Guide: Key Capabilities for Response and Hunting
As we resume posting Endpoint Detection and Response (D/R) selection criteria, let’s start with a focus on the Detection use case. Before we get too far into capabilities, we should clear up some semantics about the word ‘detection’. Referring back to our timeline in Prevention Selection Criteria, detection takes place during execution. You could make the case that detection of malicious activity is what triggers blocking, and so is a prerequisite to attack prevention – without detection, how could you know what to prevent? But that’s too confusing. For simplicity let’s just say prevention means blocking an attack before it compromises a device, and can happen both prior to and during execution. Detection happens during and after execution, and implies a device was compromised because the attack was not prevented.
Data Collection
Modern detection requires significant analysis across a wide variety of telemetry sources from endpoints. Once telemetry is captured, a baseline of normal endpoint activity is established and used to look for anomalous behavior. Given the data-centric nature of endpoint detection, an advanced endpoint detection offering should aggregate and analyze the following types of data:
Endpoint logs: Endpoints can generate a huge number of log entries, and an obvious reaction is to restrict the amount of log data ingested, but we recommend collecting as much log data from endpoints as possible. The more granular the better, given the sophistication of attackers and their ability to target anything on a device. If you do not collect the data on the endpoint, there is no way to get it when you need it to investigate later. Optimally, endpoint agents collect operating system activity alongside all available application logs. This includes identity activity such as new account creation and privilege escalation, process launching, and file system activity (key for detecting ransomware).
There is some nuance to how long you retain collected data because it can be voluminous and compute-intensive to process and analyze – both on devices and centrally.
Processes: One of the more reliable ways to detect malicious behavior is by watching which OS processes are started and where they are launched from. This is especially critical when detecting scripting attacks because attackers love using legitimate system processes to launch malicious child processes.
Network traffic: A compromised endpoint will inevitably connect to a command and control network for instructions and to download additional attack code. These actions can be detected by monitoring the endpoint’s network stack. An agent can also watch for connections to known malicious sites and for reconnaissance activity on the local network.
Memory: Modern file-less attacks don’t store any malicious code in the file system, so modern advanced detection requires monitoring and analyzing activity within endpoint memory.
Registry: As with memory-based attacks, attackers frequently store malicious code within the device registry to evade file system detection. So advanced detection agents need to monitor and analyze registry activity for signs of misuse.
Configuration changes: It’s hard for attackers to totally obfuscate what is happening on an endpoint, so on-device configuration changes can indicate an attack.
File integrity: Another long-standing method of attack detection is monitoring changes to system files, because changes to such files outside administrative patching usually indicate something malicious. An advanced endpoint agent should collect this data and monitor for modified system files.
Analytics
As mentioned above, traditional endpoint detection relied heavily on simple file hashes and behavioral indicators. With today’s more sophisticated attacks, a more robust and scientific approach is required to distinguish legitimate from malicious activity.
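The hash-based checking mentioned above, which also underlies file integrity monitoring, can be illustrated with a minimal sketch. It is not how any particular product works: the function names (`sha256_of`, `build_baseline`, `changed_files`) and the idea of keeping the baseline in a plain dictionary are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record a known-good hash for each monitored file (hypothetical baseline format)."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def changed_files(baseline):
    """Return monitored files that are missing or whose hash no longer matches the baseline."""
    changed = []
    for name, known_hash in baseline.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != known_hash:
            changed.append(name)
    return sorted(changed)
```

A check like this flags only exact changes to files it already knows about. That makes it useful for spotting modified system files, and it also shows the limit of simple hash matching: it says nothing about whether activity on the device is legitimate.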
This more scientific approach is centered around machine learning techniques (advanced mathematics) to recognize the activity of adversaries before and during attacks. Modern detection products use huge amounts of endpoint telemetry (terabytes) to train mathematical models to detect anomalous activity and find commonalities between how attackers behave. These models then generate an attack score to prioritize alerts.
Profiling applications: Detecting application misuse is predicated on understanding legitimate usage of the application, so the mathematical models analyze both legitimate and malicious usage of frequently targeted applications (browsers, office productivity suites, email clients, etc.). This is a similar approach to attack prevention, discussed in our Prevention Selection Criteria guide.
Anomaly detection: With profiles in hand and a consistent stream of endpoint telemetry to analyze, the mathematical models attempt to identify abnormal device activity. When suspicion is high they trigger an alert, the device is marked suspicious, and an analyst determines whether the alert is legitimate.
Tuning: To avoid wasting too much time on false positives, the detection function needs to constantly learn what is really an attack and what isn’t, based on the results of detection in your environment. In terms of process, you’ll want to ensure your feedback is captured by your detection offering, and used to constantly improve your models to keep detection precise and current.
Risk scoring: We aren’t big fans of arbitrary risk scoring because the underlying math can be suspect. That said, there is a role for risk scoring in endpoint attack detection: prioritization. With dozens of alerts hitting daily – perhaps significantly more – it is important to weigh which alerts warrant immediate investigation, and a risk score should be able to tell you. Be sure to investigate the underlying scoring methodology, track scoring accuracy, and tune scoring to your environment.
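The baseline, anomaly detection, and prioritization ideas above can be sketched in a few lines. This is a deliberately simple stand-in (a z-score against each device's own history), not the machine learning models that detection products train. The function names, the per-device event counts, and the default threshold of 3.0 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def anomaly_score(history, observed):
    """Return how many standard deviations `observed` sits above the baseline mean.

    `history` needs at least two values; scores below the mean are clamped to 0.
    """
    baseline_mean = mean(history)
    baseline_std = stdev(history)
    if baseline_std == 0:
        # A perfectly flat baseline: anything above it is maximally unusual.
        return 0.0 if observed <= baseline_mean else float("inf")
    return max(0.0, (observed - baseline_mean) / baseline_std)


def prioritize(observations, baselines, threshold=3.0):
    """Return (device, score) alerts at or above the threshold, highest score first."""
    alerts = []
    for device, observed in observations.items():
        score = anomaly_score(baselines[device], observed)
        if score >= threshold:
            alerts.append((device, score))
    return sorted(alerts, key=lambda alert: alert[1], reverse=True)


# Hypothetical daily counts of new child processes per device.
baselines = {
    "laptop-01": [10, 12, 11, 9, 13],
    "server-07": [40, 42, 38, 41, 39],
}
today = {"laptop-01": 12, "server-07": 60}
alerts = prioritize(today, baselines)
```

In this toy setup, tuning amounts to adjusting `threshold` (or the baseline history) as analysts confirm or dismiss alerts. A real offering would feed that feedback into its models rather than a single number.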
Data management: Given the analytics-centric nature of EDR, being able to handle and analyze large amounts of telemetry collected from endpoints is critical. Inevitably you’ll run into the big questions: where to store all the data, how to scale analytics to tens or hundreds of thousands of endpoints, and how to economically analyze all your security data. But ultimately your technology decision comes down to a few factors:
Cost: Whether the cost of storage and analytics is included in the service (some vendors store all telemetry in a cloud instance) or you need to provision a compute cluster in your data center to perform the analysis, there is a cost to crunching all the numbers. Make sure hardware, storage, and networking costs (including management)