Now let’s dig into some key EDR technologies which appear across all the use cases: detection, response, and hunting.
The agent is deployed to each monitored endpoint, so you need to be sensitive to its size and its performance hit on devices. A main complaint regarding older endpoint protection was performance impact on devices. The smaller the better, and the less performance impact the better (duh!), but just as important is agent deployability and maintainability.
- Full capture versus metadata: There are differing strong opinions on how much telemetry to capture and store from each device. Similar to the question of whether to do full network packet capture or to derive metadata from the packet stream, there is a level of granularity available with a full endpoint capture which isn’t available via metadata, for session reconstruction and more detail about what an adversary actually did. But full capture is very resource and storage intensive, so depending on the sophistication of your response team and process, metadata may be your best option. Also consider products that can gather more device telemetry when triggered, perhaps by an alert or connection to a suspicious network.
- Offline collection: Mobile endpoints are not always on the network, so agents must be able to continue collecting event data when disconnected. Once back on the network, cached endpoint telemetry should be uploaded to the central repository, which can then perform aggregate analysis.
- Multi-platform support: It’s a multi-platform world, and your endpoint security strategy needs to factor in not just Windows devices, but also Macs and Linux. Even if these platforms aren’t targeted they could be used in sophisticated operations as repositories, staging grounds, and file stores. Different operating systems offer different levels of telemetry access. Security vendors have less access to the kernel on both Mac and Linux systems than on Windows. Also dig into how vendors leverage built-in operating system services to provide sufficiently granular data for analysis. Finally, mobile devices access and store critical enterprise data, although their vulnerability is still subject to debate. We do not consider mobile devices as part of these selection criteria, although for many organizations an integrated capability is an advantage.
- Kernel vs. user space: There is a semi-religious battle over whether a detection agent needs to live at the kernel level (with all the potential device instability risks that entails), or whether accurate detection can take place exclusively in user space. Any agent must be able to detect attacks at lower levels of the operating system – such as root kits – as well as any attempts at agent tampering (again, likely outside user space). Again, we don’t get religious, and we appreciate that user-space agents are less disruptive, but are not willing to compromise on detecting all attacks.
- Tamper proof: Speaking of tampering, to address another long standing issue with traditional EPP, you’ll want to dig into the product security of any agent you install on any endpoint in your environment. We can still remember the awesome Black Hat talks where EPP agent after EPP agent was shown to be more vulnerable than some enterprise applications. Let’s learn from those mistakes and dig into the security and resilience of the detection agents to make sure you aren’t just adding attack surface.
- Scalability: Finally, scale is a key consideration for any enterprise. You might have 1,000 or 100,000 devices, or even more; but regardless you need to ensure the tool will work for the number of endpoints you need to support, and the staff on your team – both in terms of numbers and sophistication. Of course you need to handle deployment and management of agents, but don’t forget the scalability and responsiveness of analysis and searching.
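The offline collection behavior described above (keep collecting while disconnected, then upload the cache once back on the network) can be sketched roughly as follows. This is only an illustration, not any vendor's implementation: the `TelemetryBuffer` class, the `upload` callable, the event dictionaries, and the `max_events` cap are all hypothetical names and values chosen for the example.

```python
from collections import deque


class TelemetryBuffer:
    """Hypothetical agent-side cache: queue events while offline, upload on reconnect."""

    def __init__(self, upload, max_events=10_000):
        # `upload` is an assumed callable that takes a list of events
        # and sends them to the central repository.
        self._upload = upload
        # Bounded queue: when full, the oldest events are dropped first.
        self._pending = deque(maxlen=max_events)

    def record(self, event, online):
        """Always store the event; try to upload only when connected."""
        self._pending.append(event)
        if online:
            self.flush()

    def flush(self):
        """Upload everything cached so far and return how many events were sent."""
        if not self._pending:
            return 0
        batch = list(self._pending)
        self._upload(batch)      # if this raises, the events stay queued
        self._pending.clear()
        return len(batch)


# Usage: two events are captured offline, then uploaded with the third.
received = []
buf = TelemetryBuffer(upload=received.extend)
buf.record({"type": "process_start", "name": "a.exe"}, online=False)
buf.record({"type": "dns_query", "name": "example.test"}, online=False)
buf.record({"type": "file_write", "path": "/tmp/x"}, online=True)
```

The bounded queue is one possible answer to the size and performance concerns raised above: it caps how much memory the cache can use, at the cost of losing the oldest telemetry during a long disconnection. How a real product makes that trade-off is worth asking the vendor.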
Machine learning is a catch-all term which endpoint detection/response vendors use for sophisticated mathematical analysis across a large dataset to generate models, intended to detect malicious device activity. Many aspects of advanced mathematics are directly relevant to detection and response.
- Static file analysis: With upwards of a billion malicious file samples in circulation, mathematical malware analysis can pinpoint commonalities across malicious files. With a model of what malware looks like, detection offerings can then search for these attributes to identify ‘new’ malware. False positives are always a concern with static analysis, so part of diligence is ensuring the models are tested constantly, and static analysis should only be one part of malware detection.
- Behavioral profiles: Similarly, behaviors of malware can be analyzed and profiled using machine learning. Malware profiling produces a dynamic model which can be used to look for malicious behavior.
Those are the main use cases for machine learning in malware detection, but there are a number of considerations when evaluating machine learning approaches, including:
- Targeted attacks: With an increasing number of attacks specifically targeting individual organizations, it is again important to distinguish delivery from compromise. Targeted attacks use custom (and personalized) methods to deliver attacks – which may or may not involve custom malware – but once the attacker has access to a device they use similar tactics to a traditional malware attack, so machine learning models don’t necessarily need to do anything unusual to deal with targeted attacks.
- Cloud analytics: The analytics required to develop malware machine learning models are very computationally intensive. Cloud computing is the most flexible way to access that kind of compute power, so it makes sense that most vendors perform their number crunching and modeling in the cloud. Of course the models must be able to run on endpoints to detect malicious activity, so they are typically built in the cloud and executed locally on every endpoint. But don’t get distracted with where computation happens, so long as performance and accuracy are acceptable.
- Sample sizes: Some vendors claim that their intel is better than another company’s. That’s a hard claim to prove, but sample sizes matter. Looking at a billion malware samples is better than looking at 10,000. Is there a difference between looking at a hundred million samples and at a billion? That’s less clear, and a larger sample size can easily produce an unacceptable number of false positives. Evaluation of these approaches needs to focus on actual effectiveness, not just sample size.
- Positive and negative training: The thing about machine learning is that you can profile anything. It doesn’t need to be just malicious code, and in fact behavioral application profiles are built by analyzing legitimate behaviors. The models should use both positive (normal) and negative (‘bad’) attributes and behaviors to improve accuracy.
- Malware samples: Another area of consideration is where vendors get malware samples. There will always be a base of samples assembled and shared among vendors. But relative effectiveness is determined in the margins. Where are the vendors getting unique samples? How do they age out samples? Do they optimize their products to catch their test samples? We know that’s cynical, but experience has shown it’s important to ask.
- Retraining: Understanding how often models change also helps understand machine learning approaches. Are vendors updating their models weekly or even daily, and then using that to improve detections? Is it a monthly thing? Annual? There is no single right or wrong answer (it’s all about effectiveness), but understanding the vendor’s mindset provides perspective on how they believe malware works and what they consider the most effective detection methods.
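Since several of the considerations above come back to actual effectiveness and false positives, here is a minimal sketch of how you might score a product during a proof of concept. It assumes you have a labeled test set where you already know which samples are malicious. The `effectiveness` function and the sample data are hypothetical, and the two ratios are the standard detection rate and false positive rate, not a vendor-specific metric.

```python
def effectiveness(results):
    """Compute (detection_rate, false_positive_rate).

    `results` is an iterable of (is_malicious, flagged) boolean pairs,
    one per sample in a labeled test set (an assumed input format).
    """
    tp = fp = tn = fn = 0
    for is_malicious, flagged in results:
        if is_malicious and flagged:
            tp += 1          # malware that was caught
        elif is_malicious:
            fn += 1          # malware that was missed
        elif flagged:
            fp += 1          # legitimate file wrongly flagged
        else:
            tn += 1          # legitimate file correctly ignored
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Made-up example: 4 malicious samples (3 caught), 6 benign samples (1 flagged).
sample_results = (
    [(True, True)] * 3 + [(True, False)]
    + [(False, True)] + [(False, False)] * 5
)
detection_rate, false_positive_rate = effectiveness(sample_results)
```

Running the same labeled set against each candidate product gives you a comparison that does not depend on anyone's claimed sample size.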
There is this thing called the cloud – you might have heard of it. Of course that’s facetious but the fact is that every endpoint security vendor needs the cloud to keep pace with attackers. There are a couple areas to dig into regarding how they leverage the cloud:
- Signatures & rules: It’s not possible to keep all file hashes and attack indicators on every device to detect file-based attacks, so each vendor typically sends a small subset of rules to the agent, and if a file or profile isn’t known, the agent can send it up to the cloud for analysis, receiving in turn a verdict on whether it’s malicious.
- Machine learning: Some vendors have built their own internal data lakes on server clusters, and perform analytics on their own hardware to support machine learning. Others depend on cloud computing providers. There isn’t a single right answer, but it’s hard to see how it makes economic sense for a vendor to maintain an analytics cluster in their own data center over the long term. This is about the future, and how the vendor plans to scale its analytics capability, because the only thing you can be sure of is that there will be more security data to analyze tomorrow than today.
- Cloud-based management: At this point any vendor should provide an option to manage endpoint agents, define policies, and investigate attacks via a cloud-based console and service. Given the remote nature of many endpoints and the fact that keeping devices up to date is a critical aspect of endpoint defense, a cloud console has become table stakes. This also means you won’t need to stand up a management server to deploy and manage endpoint agents. You can and should expect any vendor to distribute updates to agents automatically via their cloud service, with the ability to vet updates and determine exactly when they will be deployed.
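The signatures and rules flow described above (check a small local subset first, ask the cloud only when the file is unknown) might look something like the sketch below. It makes several simplifying assumptions: the agent sends only a SHA-256 hash rather than the file itself, verdicts are plain strings, and `cloud_lookup` is a hypothetical callable standing in for whatever service a vendor actually exposes.

```python
import hashlib


def classify(file_bytes, local_verdicts, cloud_lookup):
    """Return (verdict, source) for a file.

    `local_verdicts` is the small on-device subset: a dict of hash -> verdict.
    `cloud_lookup` is an assumed callable taking a hex digest and returning a verdict.
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in local_verdicts:
        return local_verdicts[digest], "local"
    verdict = cloud_lookup(digest)
    # Remember the answer so the same file is not sent up again.
    local_verdicts[digest] = verdict
    return verdict, "cloud"


# Stand-in for the vendor's cloud service, for illustration only.
cloud_db = {hashlib.sha256(b"rare payload").hexdigest(): "malicious"}


def fake_cloud(digest):
    return cloud_db.get(digest, "unknown")


local = {hashlib.sha256(b"evil payload").hexdigest(): "malicious"}
first = classify(b"evil payload", local, fake_cloud)    # known on the device
second = classify(b"rare payload", local, fake_cloud)   # needs the cloud
third = classify(b"rare payload", local, fake_cloud)    # now cached locally
```

Caching the cloud verdict locally is a design choice made for this example. A real agent would also need to expire cached verdicts as the vendor's intelligence changes.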
As critical as the cloud is to scaling endpoint security and to keeping pace with attackers, there are some cloud security aspects you should review for each vendor.
- Authentication: Every vendor should support multi-factor authentication to access the console. It seems obvious but ask anyway.
- Data security: You will have proprietary data in their service – at least a list of employees – so you need to find out how they will protect your data. Figure out what kind of encryption they use and whether it’s built into the back end of the technology stack or just network-layer encryption.
- Data privacy: Hand in hand with encryption is the question of how the vendor supports multi-tenancy. Make sure you understand how they keep your data isolated from all their other customers. And no, providing a logon and password for each customer account is not a great answer.
- Penetration testing: Make sure they aren’t just breathing their own exhaust about the security of their environment, and they actually have professional attackers trying to break in. They are security folks – they should know all about red teams, eat their own dog food, and try to break their applications. If they don’t have an internal red team tasked with breaking their own environment, as well as a team of hunters making sure adversaries haven’t compromised their systems (with your data in them!), they are doing it wrong.
- Data migration: Finally, selecting an endpoint protection product is not a lifelong commitment. Understand how you can get your data out and remove any copies they have, in case you decide to give them the boot and pick someone else. There is also a psychological value to making sure the vendor knows they have to keep proving their value, or you’ll kick them out, so be sure to harp on this one a bit.
Next we will cover the top 10 questions to discuss with potential vendors. It’s as much a review as a comprehensive list, but after getting a brain dump about detection, response, and hunting, we figure it’s worth revisiting the high points.