Securosis

Research

Evolving to Security Decision Support: Visibility is Job #1

To demonstrate our mastery of the obvious, it’s not getting easier to detect attacks. Not that it was ever really easy, but at least you used to know what tactics adversaries used, and you had a general idea of where they would end up, because you knew where your important data was, and which (single) type of device normally accessed it: the PC. It’s hard to believe we now long for the days of early PCs and centralized data repositories. But that is not today’s world. You face professional adversaries (and possibly nation-states) who use agile methods to develop and test attacks. They have ways to obfuscate who they are and what they are trying to do, which further complicate detection. They prey on the ever-present gullible employees who click anything to gain a foothold in your environment. Further complicating matters is the inexorable march towards cloud services – which moves unstructured content to cloud storage, outsources back-office functions to a variety of service providers, and moves significant portions of the technology environment into the public cloud. And all these movements are accelerating – seemingly exponentially. There has always been a playbook for dealing with attackers when we knew what they were trying to do. Whether or not you were able to effectively execute on that playbook, the fundamentals were fairly well understood. But as we explained in our Future of Security series, the old ways don’t work any more, which puts practitioners behind the 8-ball. The rules have changed and old security architectures are rapidly becoming obsolete. For instance it’s increasingly difficult to insert inspection bottlenecks into your cloud environment without adversely impacting the efficiency of your technology stack. Moreover, sophisticated adversaries can use exploits which aren’t caught by traditional assessment and detection technologies – even if they don’t need such fancy tricks often. 
So you need a better way to assess your organization’s security posture, detect attacks, and determine applicable methods to work around and eventually remediate exposures in your environment. As much as the industry whinges about adversary innovation, the security industry has also made progress in improving your ability to assess and detect these attacks. We have written a lot about threat intelligence and security analytics over the past few years. Those are the cornerstone technologies for dealing with modern adversaries’ improved capabilities. But these technologies and capabilities cannot stand alone. Just pumping some threat intel into your SIEM won’t help you understand the context and relevance of the data you have. And performing advanced analytics on the firehose of security data you collect is not enough either, because you might be missing a totally new attack vector. What you need is a better way to assess your organizational security posture, determine when you are under attack, and figure out how to make the pain stop. This requires a combination of technology, process changes, and clear understanding of how your technology infrastructure is evolving toward the cloud. This is no longer just assessment or analytics – you need something bigger and better. It’s what we now call Security Decision Support (SDS). Snazzy, huh? In this blog series, “Evolving to Security Decision Support”, we will delve into these concepts to show how to gain both visibility and context, so you can understand what you have to do and why. Security Decision Support provides a way to prioritize the thousands of things you can do, enabling you to zero in on the few things you must. As with all Securosis’ research developed using our Totally Transparent methodology, we won’t mention specific vendors or products – instead we will focus on architecture and practically useful decision points. 
But we still need to pay the bills, so we’ll take a moment to thank Tenable, who has agreed to license the paper once it’s complete. Visibility in the Olden Days Securing pretty much anything starts with visibility. You can’t manage what you can’t see – and a zillion other overused adages all illustrate the same point. If you don’t know what’s on your network and where your critical data is, you don’t have much chance of protecting it. In the olden days – you know, way back in the early 2000s – visibility was fairly straightforward. First you had data on mainframes in the data center. Even when we started using LANs to connect everything, data still lived on a raised floor, or in a pretty simple email system. Early client/server systems started complicating things a bit, but everything was still on networks you controlled in data centers you had the keys to. You could scan your address space and figure out where everything was, and what vulnerabilities needed to be dealt with. That worked pretty well for a long time. There were scaling issues, and a need (desire) to scan higher in the technology stack, so we started seeing first stand-alone and then integrated application scanners. Once rogue devices started appearing on your network, it was no longer sufficient to scan your address space every couple weeks, so passive network monitoring allowed you to watch traffic and flag (and assess) unknown devices. Those were the good old days, when things were relatively simple. Okay – maybe not really simple, but you could size the problem. That is no longer the case. Visibility Challenged We use a pretty funny meme in many of our presentations. It shows a man from the 1870s, remembering blissfully the good old days when he knew where his data was. That image always gets a lot of laughs from audiences. But it’s brought on by pain, because everyone in the room knows it illustrates a very real problem. 
Nowadays you don’t really know where your data is, which seriously compromises your capability to determine the security posture of the systems with access to it. These challenges are a direct result of a number of key technology innovations: SaaS: Securosis talks about how SaaS is the New Back Office, and that has rather drastic ramifications for visibility. Many organizations deploy CASB just to figure out which SaaS services they are using, because it’s not like business folks ask permission to use a business-oriented


The Future of Security Operations: Embracing the Machines

To state the obvious, traditional security operations is broken. Every organization faces more sophisticated attacks, the possibility of targeted adversaries, and far more complicated infrastructure; compounding the problem, we have fewer skilled resources to execute on security programs. Obviously it’s time to evolve security operations by leveraging technology to both accelerate human work and take care of rote, tedious tasks which don’t add value. So security orchestration and automation are terms you will hear pretty consistently from here on out. Some security practitioners resist the idea of automation, mostly because if done incorrectly the ramifications are severe and likely career-limiting. So we’ve advocated a slow and measured approach, starting with use cases that won’t crater the infrastructure if something goes awry. We discussed two of those in depth: enriching alerts and accelerating incident response, in our Regaining Balance post. The value of being able to respond to more alerts, better, is obvious. So we expect technologies focused on this (small) aspect of security operations to become pervasive over the next 2-3 years. But the real leverage lies not just in making post-attack functions work better. The question is: How can you improve your security posture and make your environment more resilient by orchestrating and automating security controls? That’s what this post will dig into. But first we need to set some rules of engagement for what automation of this sort looks like. And more importantly, how you can establish trust in what you are automating. Ultimately the Future of Security Operations hinges on this concept. Without trust, you are destined to remain in the same hamster wheel of security pain (h/t to Andy Jaquith). Attack, alert, respond, remediate, repeat. Obviously that hasn’t worked too well, or we wouldn’t continue having the same conversations year after year. 
The Need for Trustable Automation It’s always interesting to broach the topic of security automation with folks who have had negative experiences with early (typically network-centric) automation. They instantaneously break out in hives when discussing automatically reconfiguring anything. We get it. When there is downtime or another adverse situation, ops people get fired and can’t pay their mortgages. Predictably, survival instincts kick in, limiting use of automation. Thus our focus on Trustable Automation – which means you tread carefully, building trust in both your automated processes and the underlying decisions that trigger them. Iterate your way to broader use of automation with a simple phased approach. Human approval: The first step is to insert a decision point into the process where a human takes a look and ensures the proper functions will happen as a result of automation. This is basically putting a big red button in the middle of the process and giving an ops person the ability to perform a few checks and then hit it. It’s faster but not really fast, because it still involves waiting on a human. Accept that some processes are so critical they never get past human approval, because the organization just cannot risk a mistake. Automation with significant logging: The next step is to let functions happen automatically, while making sure to log pretty much everything and have humans keep close tabs on it. Think of this as taking the training wheels off but staying within a few feet of the bike, just in case it tips over. Or running an application in Debug mode so you can see exactly what is happening. If something happens which you don’t expect, you’ll be right there to figure out what didn’t work and correct it. As you build trust in the process, we recommend you continue to scrutinize logs, even when things go perfectly. This helps you understand the frequency of changes, and which changes are made.
Basically you are developing a baseline of your automated process, which you can use in the next phase. Automation with guardrails: Finally you reach the point where you don’t need to step through every process. The machines are doing their job. That said, you still don’t want things to go haywire. Now you leverage the baseline you developed using automation with logging. With these thresholds you can build guardrails to make sure nothing happens outside your tolerances. For example, if you are automatically adding entries to an egress IP blacklist to stop internal traffic going to known bad locations, and a faulty threat intel update suddenly flags the IP address of your SaaS CRM system for blacklisting, the guardrail can block that change and alert administrators to investigate the update. Obviously this requires a fundamental understanding of the processes being automated and an ability to distinguish low-risk changes which should be made automatically from those which require human review. But that level of knowledge is what engenders trust, right? Once you have built some trust in your automated process, you still want a safety net to make sure you don’t go splat if something doesn’t work as intended. The second requirement for trustable automation is rollback. You need to be able to quickly and easily get back to a known good configuration. So when rolling out any kind of automation (whether via scripting or a platform), you’ll want to make sure you store state information, and have the capability to reverse any changes quickly and completely. And yes, this is something you’ll want to test extensively, both as you select an automation platform and once you start using it. The point is that as you design orchestration and automation functions, you have a lot of flexibility to get there at your own pace.
Some folks have a high threshold for pain and jump in with both feet, understanding at some point they will likely need to clean up a mess. Others choose to tiptoe toward an automated future, adding use cases as they build comfort in the ability of their controls to work without human involvement. There is no right answer
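To make the guardrail idea concrete, here is a minimal Python sketch of an automated egress blacklist update with an allowlist check, a baseline-derived change threshold, full logging, and a rollback snapshot. The allowlist contents, threshold value, and function names are all hypothetical, not taken from any product.

```python
import ipaddress
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("egress-guardrail")

# Hypothetical guardrails: IPs of business-critical services that must never
# be auto-blocked, plus a cap on changes per run derived from the baseline
# built during the "automation with logging" phase.
ALLOWLIST = {"203.0.113.10"}      # e.g. the SaaS CRM's egress IP
MAX_ADDS_PER_RUN = 20             # baseline-derived tolerance

def apply_blocklist_updates(candidates, blocklist):
    """Apply threat-intel additions, enforcing guardrails and keeping
    a snapshot so the whole change can be rolled back completely."""
    snapshot = set(blocklist)     # rollback point: known good state
    applied, held = [], []
    for ip in candidates:
        ipaddress.ip_address(ip)  # reject malformed input early
        if ip in ALLOWLIST:
            log.warning("guardrail: refused to block %s, held for review", ip)
            held.append(ip)
        elif len(applied) >= MAX_ADDS_PER_RUN:
            log.warning("guardrail: add threshold hit, holding %s", ip)
            held.append(ip)
        else:
            blocklist.add(ip)
            log.info("blocked %s", ip)
            applied.append(ip)
    return snapshot, applied, held
```

Anything held back goes to a human, which is exactly the point: the machine handles the routine adds, and the edge cases that could hurt the business still get eyes on them.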


The Future of Security Operations: Regaining Balance

The first post in this series, Behind the 8 Ball, raised a number of key challenges of practicing security in our current environment. These include continual advancement and innovation by attackers seeking new ways to compromise devices and exfiltrate data, increasing complexity of technology infrastructure, frequent changes to said infrastructure, and finally the systemic skills shortage which limits the resources available to handle all the challenges created by the other issues. Basically, practitioners are behind the 8-ball in getting their job done and protecting corporate data. As we discussed in that earlier post, thinking differently about security entails changing things up to take a (dare we say it?) more enlightened approach, basically focusing the right resources on the right functions. We know it seems obvious that having expensive staff focused on rote and tedious functions is a suboptimal way to deploy resources. But most organizations do it anyway. We prefer to have our valuable, constrained, and usually highly skilled humans doing what humans are good at, such as:

- identifying triggers that might indicate malicious activity
- drilling into suspicious activity to understand the depth of attacks and assess potential damage
- figuring out workarounds to address attacks

Humans in these roles generally know what to look for, but aren’t very good at looking at huge amounts of data to find those patterns. Many don’t like doing the same things over and over again – they get bored and less effective. They don’t like graveyard shifts, and they want work that teaches them new things and stretches their capabilities. Basically they want to work in an environment where they do cool stuff and can grow their skills. And (especially in security) they can choose where they work. If they don’t get the right opportunity in your organization, they will find another which better suits their capabilities and work style.
On the other hand machines have no problem working 24/7 and don’t complain about boring tasks – at least not yet. They don’t threaten to find another place to work, nor do they agitate for broader job responsibilities or better refreshments in the break room. We’re being a bit facetious here, and certainly don’t advocate replacing your security team with robots. But in today’s asymmetric environment, where you can’t keep up with the task list, robots may be your only chance to regain balance and keep pace. So we will expand a bit on a couple concepts from our Intro to Threat Operations paper, because over time we expect our vision of threat operations to become a subset of SecOps. Enriching Alerts: The idea is to take an alert and add a bunch of common information you know an analyst will want to the alert, before sending it to an analyst. This way the analyst doesn’t need to spend time gathering information from those various systems and information sources, and can get right to work validating the alert and determining potential impact. Incident Response: Once an alert has been validated, a standard set of activities are generally part of response. Some of these activities can be automated via integration with affected systems (networks, endpoint management, SaaS, etc.) and the time saved enables responders to focus on higher-level tasks such as determining proliferation and assessing data loss. Enriching Alerts Let’s dig into enriching alerts from your security monitoring systems, and how this can work without human intervention. We start with a couple different alerts, and some educated guesses as to what would be useful to an analyst. Alert: Connection to a known bad IP: Let’s say an alert fires for connectivity to a known bad IP address (thanks, threat intel!). With source and destination addresses, an analyst would typically start gathering basic information. Identity: Who uses the device?
With a source IP it’s usually straightforward to see who the address is allocated to, and then what devices that person tends to use. Target: Using the destination IP, the external site comes into focus. An analyst would probably perform geo-location to figure out where the IP is and a whois query to figure out who owns it. They could also figure out the hosting provider and search their threat intel service to see if the IP belongs to a known botnet, and dig up any associated tactics. Network traffic: The analyst may also check out network traffic from the device to look for strange patterns (possibly C&C or reconnaissance) or uncharacteristically large volumes to or from that device over the past few days. Device hygiene: The analyst also needs to know specifics about the device, such as when it was last patched and whether it has a non-standard configuration. Recent changes: The analyst would probably be interested in software running on the device, and whether any programs have been installed or configurations changed recently. Alert: Strange registry activity: In this scenario an alert is triggered because a device has had its registry changed, but the change cannot be traced back to authorized patches or software installs. The analyst could use similar information to the first example, but device hygiene and recent device changes would be of particular interest. The general flow of network traffic would also be of interest, given that the device may have been receiving instructions or configuration changes from external devices. In isolation registry changes may not be a concern, but in close proximity to a larger inbound data transfer the odds of trouble increase. Additionally, checking out web traffic logs from the device could provide clues to what the user was doing that might have resulted in compromise. Alert: Large USB file transfer: We can also see the impact of enrichment in an insider threat scenario.
Maybe an insider used their USB port for the first time recently, and transferred 1GB of data in a 3-hour window. That could generate a DLP alert. At that point it would be good to know which internal data sources the device has been communicating with, and any anomalous data volumes over the past few days, which
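The enrichment flow described above can be sketched in a few lines of Python. The lookup functions here are hypothetical stubs; in a real deployment they would call your asset inventory, geo-IP service, threat intel platform, and patch management system.

```python
# Hypothetical lookup stubs standing in for real integrations.
def identity_lookup(src_ip):
    return {"user": "jsmith", "device": "LT-0042"}

def geo_whois(dst_ip):
    return {"country": "RO", "owner": "ExampleHost"}

def intel_lookup(dst_ip):
    return {"botnet": "Acme", "tactics": ["C2"]}

def hygiene_lookup(src_ip):
    return {"last_patched": "2017-06-01", "non_standard_config": False}

def enrich(alert):
    """Attach the context an analyst would otherwise gather by hand,
    so triage starts with validation rather than data collection."""
    enriched = dict(alert)
    enriched["identity"] = identity_lookup(alert["src_ip"])
    # Merge geo/whois and threat intel into a single target view.
    enriched["target"] = {**geo_whois(alert["dst_ip"]), **intel_lookup(alert["dst_ip"])}
    enriched["hygiene"] = hygiene_lookup(alert["src_ip"])
    return enriched
```

The machine does the fetching and collating; the analyst opens the ticket with identity, target, and hygiene context already attached.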


Endpoint Advanced Protection Buyer’s Guide: Top 10 Questions for Detection and Response

There are plenty of obvious questions you could ask an endpoint security vendor. But most won’t really help you understand the nuances of their approach, so we decided to distill the selection criteria down to a couple of key points. We’ll provide not just the questions, but the rationale behind them. Q1: Where do you draw the line between prevention and EDR? The clear trend is towards an integrated advanced endpoint protection capability addressing prevention, detection, response, and hunting. That said, it may not be the right answer for any specific organization, depending on the adversaries they face and the sophistication & capabilities of their internal team. As discussed under selection criteria for Prevention, simple EDR (EDR-lite) is already bundled into a few advanced prevention products, accelerating this integration and emphasizing the importance of deciding whether the organization needs separate tools for prevention and detection/response/hunting. Q2: How does your product track a campaign, as opposed to just looking for attacks on single endpoints? Modern attacks rarely focus on just one endpoint – they tend to compromise multiple devices as the adversary advances towards their objective. To detect and respond to such modern attacks, analysis needs to look not merely at what’s happening on a single endpoint, but also at how that endpoint is interacting with the rest of the environment – looking for broader indications of reconnaissance, lateral movement, and exfiltration. Q3: Is detection based on machine learning? Does your analysis leverage the cloud? How do your machine learning models handle false positives? Advanced analytics are not the only way to detect attacks, but they are certainly among the key techniques. This question addresses the vendor’s approach to machine learning, digs into where they perform analysis, and gets at the breadth of the data they use to train ML models. 
Finally, you want the vendor to pass a sniff test on false positives. If any vendor claims they don’t have false positives, run away fast. Q4: Does your endpoint agent work in user or kernel mode? What kind of a performance impact does your agent have on devices? The answer is typically ‘both’, because certain activities cannot be monitored or prevented purely from user space. For monitoring and EDR, it’s possible to stay within user mode, but that limits automated remediation capability because some attacks need to be dealt with at the kernel level. Of course, with many agents already in use on typical endpoints, when considering adding another for EDR you will want to understand the performance characteristics of the new agent. Q5: Do we need “Full DVR”, or is collecting endpoint metadata sufficient? This question should reveal the vendor’s response religion – some believe comprehensive detection and/or response can work using only metadata from granular endpoint telemetry, while others insist that a full capture of all endpoint activity is necessary to effectively respond and to hunt for malicious activity. The truth is somewhere in the middle, depending on your key use case. Detection-centric environments can run well on metadata, but if response/hunting is your driving EDR function, access to full device telemetry is more important because attackers tend to cover their tracks using self-deleting files and other techniques to obfuscate their activities. Keep in mind that the EDR architecture is a major factor here, as central analysis of metadata can provide excellent detection, with full telemetry stored temporarily on each device in case it is needed for response. Q6: How is threat intelligence integrated into your agent? This answer should be about more than getting the latest indicators of compromise, including patterns for attacks involving multiple devices.
Integrated threat intel provides the ability to search historical telemetry for attacks you didn’t recognize as attacks at the time (retrospective search). You should also be able to share intelligence with a community of similar organizations, and be able to integrate first-party intel from your vendor with third-party intel from threat intelligence vendors when appropriate. Additionally, the ability to send unrecognized files to a network sandbox makes the system more effective and enables quicker recognition of emerging attacks. Q7: How does your product support searching endpoint telemetry for our SOC analysts? Can potentially compromised devices be polled in real time? What about searching through endpoint telemetry history? Search is king for EDR tools, so spend some time with the vendor to understand their search interface and how it can be used to drill down into specific devices or pivot to other devices, to understand which devices an attacker has impacted. You’ll also want to see their search responsiveness, especially with data from potentially hundreds of thousands of endpoints in the system. This is another opportunity to delve into retrospective search capabilities – key for finding malicious activity, especially when you don’t recognize it as bad when it occurs. Also consider the tradeoffs between retention of telemetry and the cost of storing it, because being able to search a longer history window makes both retrospective search and hunting more effective. Q8: Once I get an alert, does the product provide a structured response process? What kind of automation is possible with your product? What about case management? As we have discussed throughout this series, the security skills gap makes it critical to streamline the validation and response processes for less sophisticated analysts. The more structured a tool can make the user experience, the more it can help junior analysts be more productive, faster.
That said, you also want to make sure the tool isn’t so structured that analysts have no flexibility to follow their instincts and investigate the attack a different way. Q9: My staff aren’t security ninjas, but I would like to proactively look for attackers. How does your product accelerate a hunt, especially for unsophisticated analysts? Given sufficiently capable search and visualization of endpoint activity, advanced threat hunters can leverage an EDR tool for hunting. Again, you’ll want to learn how the tool can make your less experienced folks more productive and enable them
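The retrospective search concept from Q6 and Q7 is simple to illustrate: scan retained telemetry for indicators you only learned were bad after the events were recorded. This Python sketch assumes telemetry events are dicts with a timestamp and optional destination IP and file hash fields; the field names are illustrative, not any vendor's schema.

```python
from datetime import datetime, timedelta

def retrospective_search(telemetry, iocs, window_days=90):
    """Scan retained endpoint telemetry for indicators of compromise
    that were not known to be malicious when the events occurred."""
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    hits = []
    for event in telemetry:
        if event["ts"] < cutoff:
            continue  # outside the retention/search window
        if event.get("dst_ip") in iocs or event.get("file_hash") in iocs:
            hits.append(event)
    return hits
```

This also shows why the retention tradeoff matters: a 90-day window only finds what you kept for 90 days, so shrinking retention to save storage directly shrinks your ability to find attacks after the fact.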


Endpoint Advanced Protection Buyer’s Guide: Key Technologies for Detection and Response

Now let’s dig into some key EDR technologies which appear across all the use cases: detection, response, and hunting. Agent The agent is deployed to each monitored endpoint, so you should be sensitive to its size and its performance hit on devices. A main complaint regarding older endpoint protection was performance impact on devices. The smaller the better, and the less performance impact the better (duh!), but just as important is agent deployability and maintainability. Full capture versus metadata: There are strongly differing opinions on how much telemetry to capture and store from each device. Similar to the question of whether to do full network packet capture or to derive metadata from the packet stream, there is a level of granularity available with a full endpoint capture which isn’t available via metadata, for session reconstruction and more detail about what an adversary actually did. But full capture is very resource and storage intensive, so depending on the sophistication of your response team and process, metadata may be your best option. Also consider products that can gather more device telemetry when triggered, perhaps by an alert or connection to a suspicious network. Offline collection: Mobile endpoints are not always on the network, so agents must be able to continue collecting event data when disconnected. Once back on the network, cached endpoint telemetry should be uploaded to the central repository, which can then perform aggregate analysis. Multi-platform support: It’s a multi-platform world, and your endpoint security strategy needs to factor in not just Windows devices, but also Macs and Linux. Even if these platforms aren’t targeted they could be used in sophisticated operations as repositories, staging grounds, and file stores. Different operating systems offer different levels of telemetry access. Security vendors have less access to the kernel on both Mac and Linux systems than on Windows.
Also dig into how vendors leverage built-in operating system services to provide sufficiently granular data for analysis. Finally, mobile devices access and store critical enterprise data, although their vulnerability is still subject to debate. We do not consider mobile devices as part of these selection criteria, although for many organizations an integrated capability is an advantage. Kernel vs. user space: There is a semi-religious battle over whether a detection agent needs to live at the kernel level (with all the potential device instability risks that entails), or whether accurate detection can take place exclusively in user space. Any agent must be able to detect attacks at lower levels of the operating system – such as root kits – as well as any attempts at agent tampering (again, likely outside user space). Again, we don’t get religious, and we appreciate that user-space agents are less disruptive, but are not willing to compromise on detecting all attacks. Tamper proof: Speaking of tampering, to address another long-standing issue with traditional EPP, you’ll want to dig into the product security of any agent you install on any endpoint in your environment. We can still remember the awesome Black Hat talks where EPP agent after EPP agent was shown to be more vulnerable than some enterprise applications. Let’s learn from those mistakes and dig into the security and resilience of the detection agents to make sure you aren’t just adding attack surface. Scalability: Finally, scale is a key consideration for any enterprise. You might have 1,000 or 100,000 devices, or even more; but regardless you need to ensure the tool will work for the number of endpoints you need to support, and the staff on your team – both in terms of numbers and sophistication. Of course you need to handle deployment and management of agents, but don’t forget the scalability and responsiveness of analysis and searching.
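The offline collection requirement boils down to a durable local cache with a flush-on-reconnect step. Here is a minimal Python sketch of that pattern; the class and method names are illustrative, not a real agent API.

```python
import json
import os

class OfflineCollector:
    """Sketch of offline event collection: cache telemetry to local disk
    while disconnected, then flush everything to the central repository
    when connectivity returns."""

    def __init__(self, cache_path):
        self.cache_path = cache_path

    def record(self, event):
        # Append-only local cache (one JSON event per line) survives
        # agent restarts while the endpoint is off the network.
        with open(self.cache_path, "a") as f:
            f.write(json.dumps(event) + "\n")

    def flush(self, upload):
        # Re-read cached events, hand each to the uploader callback,
        # then clear the cache. Returns the number of events sent.
        sent = 0
        if os.path.exists(self.cache_path):
            with open(self.cache_path) as f:
                for line in f:
                    upload(json.loads(line))
                    sent += 1
            os.remove(self.cache_path)
        return sent
```

A production agent would add batching, compression, and tamper protection on the cache file, but the shape is the same: nothing collected while disconnected is lost, and central analysis catches up once the device checks back in.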
Machine Learning Machine learning is a catch-all term which endpoint detection/response vendors use for sophisticated mathematical analysis across a large dataset to generate models, intended to detect malicious device activity. Many aspects of advanced mathematics are directly relevant to detection and response. Static file analysis: With upwards of a billion malicious file samples in circulation, mathematical malware analysis can pinpoint commonalities across malicious files. With a model of what malware looks like, detection offerings can then search for these attributes to identify ‘new’ malware. False positives are always a concern with static analysis, so part of diligence is ensuring the models are tested constantly, and static analysis should only be one part of malware detection. Behavioral profiles: Similarly, behaviors of malware can be analyzed and profiled using machine learning. Malware profiling produces a dynamic model which can be used to look for malicious behavior. Those are the main use cases for machine learning in malware detection, but there are a number of considerations when evaluating machine learning approaches, including: Targeted attacks: With an increasing number of attacks specifically targeting individual organizations, it is again important to distinguish delivery from compromise. Targeted attacks use custom (and personalized) methods to deliver attacks – which may or may not involve custom malware – but once the attacker has access to a device they use similar tactics to a traditional malware attack, so machine learning models don’t necessarily need to do anything unusual to deal with targeted attacks. Cloud analytics: The analytics required to develop malware machine learning models are very computationally intensive. Cloud computing is the most flexible way to access that kind of compute power, so it makes sense that most vendors perform their number crunching and modeling in the cloud.
Of course the models must be able to run on endpoints to detect malicious activity, so they are typically built in the cloud and executed locally on every endpoint. But don’t get distracted with where computation happens, so long as performance and accuracy are acceptable. Sample sizes: Some vendors claim that their intel is better than another company’s. That’s a hard claim to prove, but sample sizes matter. Looking at a billion malware samples is better than looking at 10,000. Is there a difference between looking at a hundred million samples and at a billion?
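To ground the static-analysis discussion, here is a toy Python sketch of the "build in the cloud, execute locally" split: byte-histogram features (a common static-analysis input) scored by a linear model standing in for the trained classifier. Real products use far richer features (imports, entropy, section layout) and far more sophisticated models; every name here is illustrative.

```python
def byte_histogram(data):
    """Normalized byte-frequency features for a file's raw bytes."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = max(len(data), 1)
    return [c / total for c in counts]

def score(features, weights, bias=0.0):
    """Linear-model stand-in for a classifier trained in the cloud.
    The weights are what gets shipped to every endpoint; scoring a
    file locally is just a cheap dot product."""
    return sum(f * w for f, w in zip(features, weights)) + bias
```

This split is why endpoint performance and model accuracy matter more than where the number crunching happens: training over a billion samples is a cloud problem, but the resulting model has to be small and fast enough to score files on every device.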


Endpoint Advanced Protection Buyer’s Guide: Key Capabilities for Response and Hunting

As we resume posting Endpoint Detection and Response (D/R) selection criteria, let’s start with a focus on the Detection use case. Before we get too far into capabilities, we should clear up some semantics about the word ‘detection’. Referring back to our timeline in Prevention Selection Criteria, detection takes place during execution. You could make the case that detection of malicious activity is what triggers blocking, and so is a prerequisite to attack prevention – without detection, how could you know what to prevent? But that’s too confusing. For simplicity let’s just say prevention means blocking an attack before it compromises a device, and can happen both prior to and during execution. Detection happens during and after execution, and implies a device was compromised because the attack was not prevented. Data Collection Modern detection requires significant analysis across a wide variety of telemetry sources from endpoints. Once telemetry is captured, a baseline of normal endpoint activity is established and used to look for anomalous behavior. Given the data-centric nature of endpoint detection, an advanced endpoint detection offering should aggregate and analyze the following types of data: Endpoint logs: Endpoints can generate a huge number of log entries, and an obvious reaction is to restrict the amount of log data ingested, but we recommend collecting as much log data from endpoints as possible. The more granular the better, given the sophistication of attackers and their ability to target anything on a device. If you do not collect the data on the endpoint, there is no way to get it when you need it to investigate later. Optimally, endpoint agents collect operating system activity alongside all available application logs. This includes identity activity such as new account creation and privilege escalation, process launching, and file system activity (key for detecting ransomware).
There is some nuance to how long you retain collected data because it can be voluminous and compute-intensive to process and analyze – both on devices and centrally. Processes: One of the more reliable ways to detect malicious behavior is by which OS processes are started and where they are launched from. This is especially critical when detecting scripting attacks because attackers love using legitimate system processes to launch malicious child processes. Network traffic: A compromised endpoint will inevitably connect to a command and control network for instructions and to download additional attack code. These actions can be detected by monitoring the endpoint’s network stack. An agent can also watch for connections to known malicious sites and for reconnaisance activity on the local network. Memory: Modern file-less attacks don’t store any malicious code in the file system, so modern advanced detection requires monitoring and analyzing activity within endpoint memory. Registry: As with memory-based attacks, attackers frequently store malicious code within the device registry to evade file system detection. So advanced detection agents need to monitor and analyze registry activity for signs of misuse. Configuration changes: It’s hard for attackers to totally obfuscate what is happening on an endpoint, so on-device configuration changes can indicate an attack. File integrity: Another long-standing method attack detection is monitoring changes to system files, because changes to such files outside administrative patching usually indicates something malicious. An advanced endpoint agent should collect this data and monitor for modified system files. Analytics As mentioned above, traditional endpoint detection relied heavily on simple file hashes and behavioral indicators. With today’s more sophisticated attacks, a more robust and scientific approach is required to distinguish legitimate from malicious activity. 
This more scientific approach is centered around machine learning techniques (advanced mathematics) to recognize the activity of adversaries before and during attacks. Modern detection products use huge amounts of endpoint telemetry (terabytes) to train mathematical models to detect anomalous activity and find commonalities between how attackers behave. These models then generate an attack score to prioritize alerts. Profiling applications: Detecting application misuse is predicated on understanding legitimate usage of the application, so the mathematical models analyze both legitimate and malicious usage of frequently targeted applications (browsers, office productivity suites, email clients, etc.). This is a similar approach to attack prevention, discussed in our Prevention Selection Criteria guide. Anomaly detection: With profiles in hand and a consistent stream of endpoint telemetry to analyze, the mathematical models attempt to identify abnormal device activity. When suspicion is high they trigger an alert, the device is marked suspicious, and an analyst determines whether the alert is legitimate. Tuning: To avoid wasting too much time on false positives, the detection function needs to constantly learn what is really an attack and what isn’t, based on the results of detection in your environment. In terms of process, you’ll want to ensure your feedback is captured by your detection offering, and used to constantly improve your models to keep detection precise and current. Risk scoring: We aren’t big fans of arbitrary risk scoring because the underlying math can be suspect. That said, there is a role for risk scoring in endpoint attack detection: prioritization. With dozens of alerts hitting daily – perhaps significantly more – it is important to weigh which alerts warrant immediate investigation, and a risk score should be able to tell you. Be sure to investigate the underlying scoring methodology, track scoring accuracy, and tune scoring to your environment. 
Data management: Given the analytics-centric nature of EDR, being able to handle and analyze large amounts of endpoint telemetry collected from endpoints is critical. Inevitably you’ll run into the big questions: where to store all the data, how to scale analytics to tens or hundreds of thousands of endpoints, and how to economically analyze all your security data. But ultimately your technology decision comes down to a few factors: Cost: Whether or not the cost of storage and analytics is included in the service (some vendors store all telemetry in a cloud instance) or you need to provision a compute cluster in your data center to perform the analysis, there is a cost to crunching all the numbers. Make sure hardware, storage, and networking costs (including management)

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Key Capabilities for Detection

As we resume posting Endpoint Detection and Response (D/R) selection criteria, let’s start with a focus on the Detection use case. Before we get too far into capabilities, we should clear up some semantics about the word ‘detection’. Referring back to our timeline in Prevention Selection Criteria, detection takes place during execution. You could make the case that detection of malicious activity is what triggers blocking, and is thus a prerequisite to attack prevention – without detection, how could you know what to prevent? But that’s too confusing. For simplicity let’s just say prevention means blocking an attack before it compromises a device, and can happen both prior to and during execution. Detection happens during and after execution, and implies a device was compromised because the attack was not prevented.

Data Collection

Modern detection requires significant analysis across a wide variety of telemetry sources from endpoints. Once telemetry is captured, a baseline of normal endpoint activity is established and used to look for anomalous behavior. Given the data-centric nature of endpoint detection, an advanced endpoint detection offering should aggregate and analyze the following types of data:

  • Endpoint logs: Endpoints can generate a huge number of log entries, and an obvious reaction is to restrict the amount of log data ingested, but we recommend collecting as much log data from endpoints as possible. The more granular the better, given the sophistication of attackers and their ability to target anything on a device. If you do not collect the data on the endpoint, there is no way to get it when you need it to investigate later. Optimally, endpoint agents collect operating system activity alongside all available application logs. This includes identity activity such as new account creation and privilege escalation, process launching, and file system activity (key for detecting ransomware).
There is some nuance to how long you retain collected data, because it can be voluminous and compute-intensive to process and analyze – both on devices and centrally.

  • Processes: One of the more reliable ways to detect malicious behavior is monitoring which OS processes are started and where they are launched from. This is especially critical for detecting scripting attacks, because attackers love using legitimate system processes to launch malicious child processes.
  • Network traffic: A compromised endpoint will inevitably connect to a command and control network for instructions and to download additional attack code. These actions can be detected by monitoring the endpoint’s network stack. An agent can also watch for connections to known malicious sites and for reconnaissance activity on the local network.
  • Memory: Modern file-less attacks don’t store any malicious code in the file system, so modern advanced detection requires monitoring and analyzing activity within endpoint memory.
  • Registry: As with memory-based attacks, attackers frequently store malicious code within the device registry to evade file system detection. So advanced detection agents need to monitor and analyze registry activity for signs of misuse.
  • Configuration changes: It’s hard for attackers to totally obfuscate what is happening on an endpoint, so on-device configuration changes can indicate an attack.
  • File integrity: Another long-standing method of attack detection is monitoring changes to system files, because changes to such files outside administrative patching usually indicate something malicious. An advanced endpoint agent should collect this data and monitor for modified system files.

Analytics

As mentioned above, traditional endpoint detection relied heavily on simple file hashes and behavioral indicators. With today’s more sophisticated attacks, a more robust and scientific approach is required to distinguish legitimate from malicious activity.
This more scientific approach centers on machine learning techniques (advanced mathematics) to recognize the activity of adversaries before and during attacks. Modern detection products use huge amounts of endpoint telemetry (terabytes) to train mathematical models to detect anomalous activity and find commonalities in how attackers behave. These models then generate an attack score to prioritize alerts.

  • Profiling applications: Detecting application misuse is predicated on understanding legitimate usage of the application, so the mathematical models analyze both legitimate and malicious usage of frequently targeted applications (browsers, office productivity suites, email clients, etc.). This is similar to the approach to attack prevention discussed in our Prevention Selection Criteria guide.
  • Anomaly detection: With profiles in hand and a consistent stream of endpoint telemetry to analyze, the mathematical models attempt to identify abnormal device activity. When suspicion is high they trigger an alert, the device is marked suspicious, and an analyst determines whether the alert is legitimate.
  • Tuning: To avoid wasting too much time on false positives, the detection function needs to constantly learn what is really an attack and what isn’t, based on the results of detection in your environment. In terms of process, you’ll want to ensure your feedback is captured by your detection offering and used to constantly improve your models, to keep detection precise and current.
  • Risk scoring: We aren’t big fans of arbitrary risk scoring, because the underlying math can be suspect. That said, there is a role for risk scoring in endpoint attack detection: prioritization. With dozens of alerts hitting daily – perhaps significantly more – it is important to weigh which alerts warrant immediate investigation, and a risk score should be able to tell you. Be sure to investigate the underlying scoring methodology, track scoring accuracy, and tune scoring to your environment.
  • Data management: Given the analytics-centric nature of EDR, the ability to handle and analyze large amounts of telemetry collected from endpoints is critical. Inevitably you’ll run into the big questions: where to store all the data, how to scale analytics to tens or hundreds of thousands of endpoints, and how to economically analyze all your security data. But ultimately your technology decision comes down to a few factors:
  • Cost: Whether the cost of storage and analytics is included in the service (some vendors store all telemetry in a cloud instance), or you need to provision a compute cluster in your data center to perform the analysis, there is a cost to crunching all the numbers. Make sure hardware, storage, and networking costs (including management)
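Two of the detection ideas above – flagging suspicious parent/child process launches, and scoring deviations from a baseline of normal endpoint activity – can be sketched in a few lines. This is a toy illustration with made-up process names, thresholds, and telemetry numbers, not any vendor’s actual detection logic.

```python
import statistics

# Hypothetical rule set: office apps spawning script hosts is suspicious.
# Real products ship far richer behavioral rules; these pairs are examples.
SUSPICIOUS_PAIRS = {
    ("winword.exe", "powershell.exe"),
    ("excel.exe", "cmd.exe"),
    ("outlook.exe", "wscript.exe"),
}

def suspicious_launch(parent: str, child: str) -> bool:
    """Flag a parent/child process pair matching a known-bad pattern."""
    return (parent.lower(), child.lower()) in SUSPICIOUS_PAIRS

def anomaly_score(baseline: list, observation: float) -> float:
    """Z-score of an observation (e.g. process launches per hour)
    against a learned baseline of normal activity."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline) or 1.0  # avoid division by zero
    return abs(observation - mean) / stdev

# Made-up baseline: normal hourly process-launch counts for one device.
baseline = [40, 42, 38, 41, 39, 43, 40]

print(suspicious_launch("WinWord.exe", "powershell.exe"))  # True -> alert
print(anomaly_score(baseline, 41))    # small score, within normal range
print(anomaly_score(baseline, 120))   # large score -> anomalous, alert
```

In practice the baseline would be learned per device (or per peer group) from collected telemetry, and the score would feed the prioritization discussed under risk scoring above.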

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Detection and Response Use Cases

As we continue documenting what you need to know to understand Endpoint Advanced Protection offerings, it’s time to delve into Detection and Response. Remember that before you are ready to pick anything, you need to understand the problem you are trying to solve. Detecting all endpoint attacks within microseconds and without false positives isn’t really achievable. You need to determine the key use cases most important to you, and make an honest assessment of your team and adversaries.

Why is this introspection necessary? Nobody ever says they don’t want to detect active attacks and hunt for adversaries. It’s cool and it’s necessary. Nobody wants to be perpetually reacting to attacks. That said, if you don’t have enough staff to work through half the high-priority alerts from your security monitoring systems, how can you find time to proactively hunt for stuff your monitoring systems don’t catch? As another example, your team may consist of a bunch of entry-level security analysts struggling to figure out which events are actual device compromises, and which are false positives. Tasking these less sophisticated folks with advanced memory forensics to identify file-less malware may not be a good use of time.

To procure effective advanced Endpoint Detection and Response (EDR) technology, you must match what you buy to your organization’s ability to use it. Of course you should be able to grow into a more advanced program and capability. But don’t pay for an Escalade when a Kia Sportage is what you need today.

Over the next 5 days we will explain what you need to know about Detection and Response (D/R) to be an educated buyer of these solutions. We’ll start by helping you understand the key use cases for D/R, then delve into the important capabilities for each use case and the underlying technologies which make it all work, and finally offer some key questions to ask vendors to understand their approaches to your problems.
Planning for Compromise

Before we get into specific use cases, we need to level-set regarding your situation, which we highlighted in our introduction to the Endpoint Advanced Protection Buyer’s Guide. For years there was little innovation in endpoint protection. Even worse, far too many organizations didn’t upgrade to the latest versions of their vendor’s offerings – meaning they were trying to detect 2016 attacks with 2011 technology. Inevitably that didn’t work out well.

Now there are better alternatives for prevention, so where does that leave endpoint detection and response? In the same situation it has always been: a necessity. Regardless of how good your endpoint prevention strategy is, it’s not good enough. You will have devices which get compromised. So you must be in a position to detect compromise and respond to it effectively and efficiently.

The good news is that in the absence of perfect (and often even effective) prevention options, many organizations have already gone down this path, investing in better detection and response. They have been growing network-based detection and centralized security monitoring infrastructure (which drove the wave of security analytics offerings hitting the market now), and these organizations have also invested in technologies to gather telemetry from endpoints and make sense of it.

To be clear, we have always been able to analyze what happened on an endpoint after an attack, assuming some reasonable logging and a forensic image of the device. There are decent open source tools for advanced forensics, which have long been leveraged by forensicators who charge hundreds of dollars an hour. What you don’t have is enough people to perform that kind of response and forensic analysis. You hardly have enough people to work through your alert queue, right? This is where advanced Endpoint Detection and Response (EDR) tools can add real value to your security program.
Facing a significant and critical skills gap, the technology needs to help your less experienced folks by structuring their activities and making their next step somewhat intuitive. If a tool can’t make your people better and faster, why bother?

But all vendors say that, right? They claim their tools find unknown attacks, don’t create a bunch of make-work identifying or confirming false positives, and help you prioritize activities. The magic tools even find attacks before you know they are attacks, bundled with a side of unicorn dust. Our objective with these selection criteria is to make sure you understand how to dig deeper into the true capabilities of these products, and know what is real and what is marketing puffery. You want to understand whether a vendor grasps the entire threat landscape or is focused on a small subset of high-profile attack vectors, and whether they will be an effective partner as adversaries and their tactics inevitably change. But as we mentioned above, you need to focus your selection process on the problem you need to solve, which comes down to defining your main use cases for EDR.

Key Use Cases

Let’s be clear about use cases. There are three main functions you need these tools to perform, with quite a bit of overlap with the technologies underlying endpoint prevention.

  • Detection: When you are attacked it’s a race against time. Attackers are burrowing deeper into your environment and working toward their goal. The sooner you detect that something is amiss on an endpoint, the better your chance to contain the damage. Today’s challenge is not just detecting an attack on a single endpoint, but figuring out the extent of a coordinated campaign against many endpoints and devices within your environment.
  • Response: Once you know you have been attacked, you need to respond quickly and efficiently.
This use case focuses on providing an analyst the ability to drill down, validate an attack, and determine the extent of the attacker’s actions across all affected devices, while assessing potential damage. You also need to be able to figure out effective workarounds and remediations, to instruct the operations team how to prevent further outbreaks of the same attack. Don’t forget the need to make sure evidence is gathered in a way

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Top 10 Questions on Prevention

There are plenty of obvious questions you could ask an endpoint security vendor. But most won’t really help you understand the nuances of their approach, so we decided to distill the selection criteria down to ten key questions. We’ll provide not just the questions, but the rationale behind them.

Q1: If your prevention capabilities rely on machine learning, how and how often are your machine learning models retrained? An explanation here should provide some perspective on the vendor’s approach to math and the ‘half-life’ of their models, which indicates how quickly they believe malware attack behaviors change. Some espouse continuous retraining, while others maintain that very little changes daily, so it’s sufficient to retrain weekly or monthly. You’ll also want to understand whether model updates disrupt the end-user experience with significant downloads, restarts, or other intrusions on normal user activity.

Q2: How do your machine learning models factor in false positives and minimize them? If a vendor claims they don’t have false positives, run away – quickly. Customer feedback and awareness of false positives are critical to keeping the products current, so get a sense of how they update their protection, models, and whitelists based on what doesn’t work in the field.

Q3: Does your agent work in user or kernel mode? Do you protect the OS kernel? The answer is typically both, because some activities aren’t accessible from user mode, and others aren’t accessible from kernel mode. For monitoring and EDR it’s possible to stay in user mode, but blocking attacks requires some kind of kernel access. This question enables you to get at how an EDR vendor has developed their offering to provide broader prevention.

Q4: Can you block an attack before it loads into memory? If so, how? This digs into the vendor’s ability to block a file-less attack, a critical aspect of stopping advanced attacks.
Q5: How do you prevent an attacker from gaining root access to the device? You are trying to understand the exploit prevention/blocking capability of the product; to control the machine, an attacker will at some point need root-level access.

Q6: How is threat intelligence integrated into the prevention agent? This answer should be about more than getting patterns for the latest indicators of compromise. It should include the ability to block known bad IP addresses at the network layer, as well as cloud-based sandbox integration.

Q7: How often and how large are agent updates? How do you age out old signatures to conserve space? How are updates distributed? Attacks change and machine learning models are imperfect, so agents need to be updated. And given that what’s old becomes new again, care needs to be taken when specific signatures or behavioral models are removed from endpoint agents. This question enables you to assess how the vendor distributes work between the agents and the cloud. It also gets at whether the vendor has a cloud-native management option, which wouldn’t require an on-premises aggregation point.

Q8: How does the product integrate with other enterprise security solutions, including SIEM and/or EDR? If the vendor offers a full EDR capability, use this question to figure out whether a common agent handles both prevention and EDR, and the level of management integration. You can also dig a bit into how the endpoint prevention product sends data to and from a SIEM, incident response tools, and network-based controls.

Q9: Does the product support automation? If so, how? Given that you likely don’t have enough people to do what needs to be done, you’ll need to automate some functions – it’s the only way to scale up your security operation.
Some tools integrate with automation platforms (typically for incident response), while offering some auto-remediation for common problems (make sure you can control what gets done automatically and what just sends an alert to a console).

Q10: Does the product support application control to lock down some devices? How are exceptions to the whitelist handled? Can you lock down USB ports? For some devices it’s easier and safer to just lock them down and prevent unauthorized executables from running. This won’t work on all devices, but having the option provides flexibility in designing endpoint controls. Likewise, locking down USB ports prevents a common mechanism of data leakage.

We could ask another couple hundred questions, but these ten should provide a lot of the insight you need to differentiate between vendors. Next week we’ll post a similar set of criteria for Detection/Response. Enjoy the weekend!
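The application control in Q10 boils down to a simple decision: compute a hash of an executable before launch and block anything not on the approved list. A real agent hooks process creation in the kernel; this hypothetical sketch only shows the allowlist logic, with made-up binary contents standing in for real files.

```python
import hashlib

# Hypothetical allowlist of approved executable hashes (SHA-256).
ALLOWLIST = set()

def approve(executable_bytes: bytes) -> None:
    """Record an approved executable's hash on the allowlist."""
    ALLOWLIST.add(hashlib.sha256(executable_bytes).hexdigest())

def may_execute(executable_bytes: bytes) -> bool:
    """Permit launch only if the executable's hash is on the allowlist."""
    return hashlib.sha256(executable_bytes).hexdigest() in ALLOWLIST

approve(b"approved application binary")         # stand-in for a real file
print(may_execute(b"approved application binary"))  # True -> allow
print(may_execute(b"unknown dropper"))              # False -> block
```

Note how exception handling (Q10’s second question) becomes a policy problem: every one-off approval is another `approve()` call someone has to review and maintain.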

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Key Prevention Technologies

After exploring prevention approaches, you should understand some common technologies which are foundational to endpoint advanced prevention offerings.

Machine Learning

Machine learning is a catch-all term indicating that the endpoint protection vendor uses sophisticated mathematical analysis of a large data set to generate models for detecting malicious files or activity on devices. There are a couple of mathematical approaches which can improve malware prevention.

  • Static file analysis: With upwards of a billion malicious file samples in circulation, mathematical analysis of malware can pinpoint commonalities across malicious files. With a model of what malware looks like, advanced prevention products then
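Static file analysis of the kind described above starts by turning each file into numeric features a model can compare – for example a byte-entropy value, since packed or encrypted malware tends toward high entropy. This is a simplified sketch of one such feature, not any product’s actual model or feature set.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, from 0.0 (a single repeated
    byte) up to 8.0 bits per byte (uniformly random data)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Repetitive data (like plain text or padding) scores low; data that
# looks random (like packed or encrypted payloads) scores near 8.
print(byte_entropy(b"AAAAAAAAAAAAAAAA"))  # 0.0
print(byte_entropy(bytes(range(256))))    # 8.0
```

Features like this one, combined with header fields, imported functions, and section layouts, become the inputs to the mathematical models trained on those malicious file samples.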

Share:
Read Post

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.