Securosis

Research

Container Security 2018: Threats and Concerns

To better understand which container security areas you should focus on, and why we recommend particular controls, it helps to understand which threats need to be addressed and which areas containers affect most. Some threats and issues are well-known, some are purely lab proofs of concept, and others are threat vectors which attackers have yet to exploit – typically because there is so much low-hanging fruit elsewhere. So what are the primary threats to container environments? Threats to the Build Environment The first area which needs protection is the build environment. It’s not first on most people’s lists for container security, but I start here because it is typically the least secure, and the easiest place to insert malicious code. Developers tend to loathe security in development because it slows them down. That is why there is an entire industry dedicated to test data management and data asking: because developers tend to end-run around security whenever it slows their build and testing processes. What kinds of threats are we talking about, specifically? Things like malicious or moronic source code changes. Malicious or mistaken alterations to automated build controllers. Configuration scripts with errors, or which expose credentials. The addition of insecure libraries or down-rev/insecure versions of existing code. We want to know whether runtime code has been scanned for vulnerabilities. And we worry about failures to audit all the above and catch any errors. Container Workload and Contents What the hell is in the container? What does it do? Is that even the correct version? These are common questions from operations folks. They have no idea. Nor do they know whether developers included tools like ssh in a container so they can alter its contents on the fly. Just as troubling is the difficulty of mapping access rights to OS and host resources by a container, which can break operational security and open up the entire stack to various attacks. Security folks are typically unaware of what – if any – container hardening may have been performed. You want to know each container’s contents have been patched, vetted, hardened, and registered prior to deployment. Runtime Behavior Organizations worry a container will attack or infect another container. They worry a container may quietly exfiltrate data, or just exhibit suspicious behavior. We have seen attacks extract source code, and others add new images to registries – in both cases the platforms were unprotected by identity and access management. Organizations need to confirm that access to the Docker client is sufficiently gated through access controls to limit who controls the runtime environment. They worry about containers running a long time, without rotation to newer patched versions. And whether the network has been properly configured to limit damage from compromise. And also about attackers probing containers, looking for vulnerabilities. Operating System Security Finally, the underlying operating system’s security is a concern. The key question is whether it is configured correctly to restrict each container’s access to the subset of resources it needs, and to effectively block everything else. Customers worry that a container will attack the underlying host OS or the container engine. They worry that the container engine may not sufficiently shield the underlying OS. If an attack on the host platform succeeds it’s pretty much game over for that cluster of containers, and may give malicious code sufficient access to pivot and attack other systems. Orchestration Manager Security A key reason to update and reissue this report is this change in the container landscape, where focus has shifted to orchestration managers which control containers. It sounds odd, but as containers have become a commodity unit of application delivery, organizations have begun to feel they understand containers, and attention has shifted to container management. Attention and innovation have shifted to focus on cluster orchestration, with Kubernetes the poster child for optimizing value and use of containers. But most of the tools are incredibly complex. And like many software product, the focus of orchestration tools is scalability and ease of management – not security. As you probably suspected, orchestration tools bring a whole new set of security issues and vulnerabilities. Insecure default configurations, as well as permission escalation and code injection vulnerabilities, are common. What’s more, most organizations issue certificates, identity tokens and keys from the orchestration manager as containers are launched. We will drill down into these issues and what to do about them in the remainder of this series. Share:

Share:
Read Post

Building a Container Security Program 2018: Introduction

The explosive growth of containers is not surprising – these technologies, such as Docker, alleviate several problems for developers deploying applications. Developers need simple packaging, rapid deployment, reduced environmental dependencies, support for microservices, generalized management, and horizontal scalability – all of which containers help provide. When a single technology enables us to address several technical problems at once, it’s very compelling. But this generic model of packaged services, where the environment is designed to treat each container as a “unit of service”, sharply reduces transparency and auditability (by design), and gives security pros nightmares. We run more code and faster, but must accept a loss of visibility inside the container. It begs the question, “How can we introduce security without losing the benefits of containers?” Containers scare the hell out of security pros because they are so opaque. The burden of securing containers falls across Development, Operations, and Security teams – but none of these groups always knows how to tackle their issues. Security and development teams may not even be fully aware of the security problems they face, as security is typically ignorant of the tools and technologies developers use, and developers don’t always know what risks to look for. The problem extends beyond containers to the entire build, deployment, and runtime environments. The container security space has changed substantially since our initial research 18-20 months back. Security of the orchestration manager is a primary concern, as organization rely more heavily on tools to deploy and scale out applications. We have seen a sharp increase in adoption of container services (PaaS) from various cloud vendors, which changes how organizations need to approach security. We reached forward a bit in our first container security paper, covering build pipleine security issues because we felt that was a hugely underservered area, but over the last 18 months DevOps practitioners have taken note, and this has become the top question we get. Just behind that is the need for secrets management to issue container credentials and secure identity. The rapid change of pace in this market means it’s time for a refresh. We get a ton of calls from people moving towards – or actively engaged in – DevOps, so we will target this research at both security practitioners and developers & IT operations. We will cover some reasons containers and container orchestration managers create new security concerns, as well as how to go about creating security controls across the entire spectrum. We will not go into great detail on how to secure apps in general here – instead we will focus on build, container management, deployment, platform, and runtime security which arise with containers. As always we hope you will ask questions and participate in the process. Community involvement makes our research betters so we welcome your inquires, comments, and suggestions. Share:

Share:
Read Post

How Cloud Security Managers Should Respond to Meltdown and Spectre

I hope everyone enjoyed the holidays… just in time to return to work, catch up on email, and watch the entire Internet burn down thanks to a cluster of hardware vulnerabilities built into pretty much every computing platform available. I won’t go into details or background on Meltdown and Spectre (note: if I ever discover a vulnerability, I want it named “CutYourF-ingHeartOutWithSpoon”). Instead I want to talk about them in the context of the cloud, short-term and long-term implications, and some response strategies. These are incredibly serious vulnerabilities – not only due to their immediate implications, but also because they will draw increased scrutiny to a set of hardware weaknesses, which in turn are likely to require a generational fix (a computer generation – not your kids). Meltdown Briefly, Meltdown increases the risk of a multi-tenancy break. This has impacts on three levels: It potentially enables any instance or guest on a system to read all the memory on that system. This is the piece which cloud providers have almost completely patched. On a single system, it could also allow code in a container to read the memory of the entire server. This is likely also patched by cloud providers (AWS/Google/Microsoft). Because Function as a Service (‘serverless’) offerings are really implemented as code in containers, the same issues apply to these products. Meltdown is a privilege escalation vulnerability and requires a malicious process to be run on the system – you cannot use it to gain an initial foothold or exploitation, but to do things like steal secrets from memory once you have presence. Meltdown in its current form on major cloud providers is likely not an immediate security risk. But just to be safe I recommend immediately applying Meltdown patches at the operating system level to any instances you have running. This would have been far worse if there hadn’t been a coordinated disclosure between researchers, hardware and operating system vendors, and cloud providers. You may see some performance degradation, but anything that uses autoscaling shouldn’t really notice. Spectre Spectre is a different group of vulnerabilities which relies on a different set of hardware-related issues. Right now Spectre only allows access to memory the application already has access to. This is still a privilege escalation issue because it’s useful for things like allowing hostile JavaScript code in a browser access to data outside its sandbox. This also seems like it could be an issue for anything which runs multiple processes in a sandbox (such as containers), and might allow reading data from other guests or containers on the same host. Exploitation is difficult, the cloud providers are on it, and there is nothing to be done right now – other than to pay attention. So for both attacks, your short-term action is to patch instances and keep an eye on upcoming patches. Oh – and if you run a private cloud, you really need to patch everything yesterday and be prepared to replace all your hardware within the next few years. All your hardware. Oops. Long-term implications and recommendations These are complex vulnerabilities related to deeply embedded hardware functionality. Spectre itself is more an entire vulnerability/exploit class than a single patchable vulnerability. Right now we seem to have the protections we need available, and the performance implications appear manageable (although the performance impact will be costly for some customers). The bigger concern is that we don’t know what other variants of both vulnerability classes may appear (or be discovered by malicious actors who don’t make them public). The consensus among my researcher friends is that this is a new area of study; while it’s not completely novel, it’s definitely drawing highly intelligent and experienced eyeballs. I will be very surprised if we don’t see more variants and implications over the next few years. Hardware manufacturers need to update chip designs, which is a slow process, and even then they are likely to leave holes which researchers will eventually discover. Let’s not mince words – this is a very big deal for cloud computing. The immediate risk is very manageable but we need to be prepared for the long-term implications. As this evolves, here is what I recommend: Obviously, immediately patch all your operating systems on all your instances to the best of your ability. Hopefully cloud provider mitigations at the hypervisor level are already protecting you, but it’s still better to be safe. Start with a focus on instances where memory leaks are the worst threat. For highly sensitive workloads (e.g., encryption) immediately consider moving to dedicated tenancy and don’t run any less-privileged workloads on the same hardware. Dedicated tenancy means you rent a whole box from your cloud provider, and only your workloads run on it. This eliminates much of the concern of guest to host breaks. Migrate to dedicated PaaS where possible, especially for things like encryption operations. For example if you move to an AWS Elastic Load Balancer and perform discrete application data encryption in KMS, your crypto operations and keys are never exposed in the memory of any general-purpose system. This is the critical piece: the hardware underpinning these services isn’t used for anything other than the assigned service. So another tenant cannot run a malicious process to read the box’s physical memory. If you can’t run malicious code as a tenant, then even if you break multi-tenancy you still need to compromise the entire system – which cloud providers are damn good at preventing. Removing customers’ ability to run arbitrary processes is a massive roadblock to exploitation of these kinds of vulnerabilities. Continue to migrate workloads to Function as a Service (also called ‘serverless’ and ‘Lambda’), but recognize there still are risks. Moving to servlerless pushes more responsibility for mitigating future vulnerabilities in these (and any other) classes onto your cloud provider, but since tenants can run nearly arbitrary code there is always a chance of future issues. Right now my feeling is that the risk is low, and far lower than running things

Share:
Read Post

New Paper: Understanding Secrets Management

Traditional application security concerns are shifting, responding to disruptive technologies and development frameworks. Cloud services, containerization, orchestration platforms, and automated build pipelines – to name just a few – all change the way we build and deploy applications. Each effects security a different way. One of the new application security challenges is to provision machines, applications, and services with the credentials they need at runtime. When you remove humans from the process things move much faster – but knowing how and when to automatically provide passwords, authentication tokens, and certificates is not an easy problem. This secrets management problem is not new, but our need grows exponentially when we begin orchestrating the entire application build and deployment process. We need to automate distribution and management of secrets to ensure secure application delivery. This research paper covers the basic use cases for secrets management, and then dives into different technologies that address this need. Many of the technologies assume a specific application deployment model so we also discuss pros and cons of the different approaches. We close with recommendations on product selection and decision criteria. We would like to thank the folks at CyberArk for getting behind this research effort and licensing this content. Support like this enables us to both deliever research under our Totally Transparent Research process and bring this content to you free of charge. Not even a registration wall. Free, and we respect your privacy. Not a bad deal. As always, if you have comments or question on our research please shoot us an email. If you want to comment or make suggestions for future iterations of this research, please leave a comment here. You can go directly to the full paper: Securosis_Secrets_Management_JAN2018_FINAL.pdf Or visit the research library page. Share:

Share:
Read Post

Firestarter: An Explicit End of Year Roundup

The gang almost makes it through half the episode before dropping some inappropriate language as they summarize 2017. Rather than focusing on the big news, we spend time reflecting on the big trends and how little has changed, other than the pace of change. How the biggest breaches of the year stemmed from the oldest of old issues, to the newest of new. And last we want to thank all of you for all your amazing support over the years. Securosis has been running as a company for a decade now, which likely scares all of you even more than us. We couldn’t have done it without you… seriously. Share:

Share:
Read Post

Firestarter: Breacheriffic EquiFail

This week Mike and Rich address the recent spate of operational fails leading to massive security breaches. This isn’t yet another blame the victim rant, but a frank discussion of why these issues are so persistent and so difficult to actually manage. We also discuss the rising role of automation and its potential to reduce these all-too-human errors. Watch or listen: Share:

Share:
Read Post

The Future of Security Operations: Regaining Balance

The first post in this series, Behind the 8 Ball, raised a number of key challenges practicing security in our current environment. These include continual advancement and innovation by attackers seeking new ways to compromise devices and exfiltrate data, increasing complexity of technology infrastructure, frequent changes to said infrastructure, and finally the systemic skills shortage which limits our resources available to handle all the challenges created by the other issues. Basically, practitioners are behind the 8-ball in getting their job done and protecting corporate data. As we discussed in that earlier post, thinking differently about security entails you changing things up to take a (dare we say it?) more enlightened approach, basically focusing the right resources on the right functions. We know it seems obvious that having expensive staff focused on rote and tedious functions is a suboptimal way to deploy resources. But most organizations do it anyway. We prefer to have our valuable, constrained, and usually highly skilled humans doing what humans are good at, such as: identifying triggers that might indicate malicious activity drilling into suspicious activity to understand the depth of attacks and assess potential damage figuring out workarounds to address attacks Humans in these roles generally know what to look for, but aren’t very good at looking at huge amounts of data to find those patterns. Many don’t like doing the same things over and over again – they get bored and less effective. They don’t like graveyard shifts, and they want work that teaches them new things and stretches their capabilities. Basically they want to work in an environment where they do cool stuff and can grow their skills. And (especially in security) they can choose where they work. If they don’t get the right opportunity in your organization, they will find another which better suits their capabilities and work style. On the other hand machines have no problem working 24/7 and don’t complain about boring tasks – at least not yet. They don’t threaten to find another place to work, nor do they agitate for broader job responsibilities or better refreshments in the break room. We’re being a bit facetious here, and certainly don’t advocate replacing your security team with robots. But in today’s asymmetric environment, where you can’t keep up with the task list, robots may be your only chance to regain balance and keep pace. So we will expand a bit on a couple concepts from our Intro to Threat Operations paper, because over time we expect our vision of threat operations to become a subset of SecOps. Enriching Alerts: The idea is to take an alert and add a bunch of common information you know an analyst will want to the alert, before to sending it to an analyst. This way the analyst doesn’t need to spend time gathering information from those various systems and information sources, and can get right to work validating the alert and determining potential impact. Incident Response: Once an alert has been validated, a standard set of activities are generally part of response. Some of these activities can be automated via integration with affected systems (networks, endpoint management, SaaS, etc.) and the time saved enables responders to focus on higher-level tasks such as determining proliferation and assessing data loss. Enriching Alerts Let’s dig into enriching alerts from your security monitoring systems, and how this can work without human intervention. We start with a couple different alerts, and some educated guesses as to what would be useful to an analyst. Alert: Connection to a known bad IP: Let’s say an alert fires for connectivity to a known bad IP address (thanks, threat intel!). With source and destination addresses, an analyst would typically start gathering basic information. 1. Identity: Who uses the device? With a source IP it’s usually straightforward to see who the address is allocated to, and then what devices that person tends to use. Target: Using a destination IP external site comes into focus. An analyst would probably perform geo-location to figure out where the IP is and a whois query to figure out who owns it. They could also figure out the hosting provider and search their threat intel service to see if the IP belongs to a known botnet, and dig up any associated tactics. Network traffic: The analyst may also check out network traffic from the device to look for strange patterns (possibly C&C or reconnaissance) or uncharacteristically large volumes to or from that device over the past few days. Device hygiene: The analyst also needs to know specifics about the device, such as when it was last patched and does it have a non-standard configuration? Recent changes: The analyst would probably be interested in software running on the device, and whether any programs have been installed or configurations changed recently. Alert: Strange registry activity: In this scenario an alert is triggered because a device has had its registry changed, but it cannot be traced back to authorized patches or software installs. The analyst could use similar information to the first example, but device hygiene and recent device changes would be of particular interest. The general flow of network traffic would also be of interest, given that the device may have been receiving instructions or configuration changes from external devices. In isolation registry changes may not be a concern, but in close proximity of a larger inbound data transfer the odds of trouble increase. Additionally, checking out web traffic logs from the device could provide clues to what they were doing that might have resulted in compromise. Alert: Large USB file transfer: We can also see the impact of enrichment in an insider threat scenario. Maybe an insider used their USB port for the first time recently, and transferred 1GB of data in a 3-hour window. That could generate a DLP alert. At that point it would be good to know which internal data sources the device has been communicating with, and any anomalous data volumes over the past few days, which

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Top 10 Questions for Detection and Response

There are plenty of obvious questions you could ask an endpoint security vendor. But most won’t really help you understand the nuances of their approach, so we decided to distill the selection criteria down to a couple of key points. We’ll provide not just the questions, but the rationale behind them. Q1: Where do you draw the line between prevention and EDR? The clear trend is towards an integrated advanced endpoint protection capability addressing prevention, detection, response, and hunting. That said, it may not be the right answer for any specific organization, depending on the adversaries they face and the sophistication & capabilities of their internal team. As discussed under selection criteria for Prevention, simple EDR (EDR-lite) is already bundled into a few advanced prevention products, accelerating this integration and emphasizing the importance of deciding whether the organization needs separate tools for prevention and detection/response/hunting. Q2: How does your product track a campaign, as opposed to just looking for attacks on single endpoints? Modern attacks rarely focus on just one endpoint – they tend to compromise multiple devices as the adversary advances towards their objective. To detect and respond to such modern attacks, analysis needs to look not merely at what’s happening on a single endpoint, but also at how that endpoint is interacting with the rest of the environment – looking for broader indications of reconnaissance, lateral movement, and exfiltration. Q3: Is detection based on machine learning? Does your analysis leverage the cloud? How do your machine learning models handle false positives? Advanced analytics are not the only way to detect attacks, but they are certainly among the key techniques. This question addresses the vendor’s approach to machine learning, digs into where they perform analysis, and gets at the breadth of the data they use to train ML models. Finally, you want the vendor to pass a sniff test on false positives. If any vendor claims they don’t have false positives, run away fast. Q4: Does your endpoint agent work in user or kernel mode? What kind of a performance impact does your agent have on devices? The answer is typically ‘both’ because certain activities that cannot be monitored or prevented purely from user space or kernel mode. For monitoring and EDR, it’s possible to stay within user mode, but that limits automated remediation capability because some attacks need to be dealt with at the kernel level. Of course, with many agents already in use on typical endpoints, when considering adding another for EDR you will want to understand the performance characteristics of the new agent. Q5: Do we need “Full DVR”, or is collecting endpoint metadata sufficient? This question should reveal the vendor’s response religion – some believe comprehensive detection and/or response can work using only metadata from granular endpoint telemetry, while others insist that a full capture of all endpoint activity is necessary to effectively respond and to hunt for malicious activity. The truth is somewhere in the middle, depending on your key use case. Detection-centric environments can run well on metadata, but if response/hunting is your driving EDR function, access to full devie telemetry is more important because attackers tend to cover their tracks using self-deleting files and other techniques to obfuscate their activities. Keep in mind that the EDR architecture is a major factor here, as central analysis of metadata can provide excellent detection, with full telemetry stored temporarily on each device in case it is needed for response. Q6: How is threat intelligence integrated into your agent? This anser should be about more than getting patterns for the latest indicators of compromise and patterns for attacks involving multiple devices. Integrated threat intel provides the ability to search historical telemetry for attacks you didn’t recognize as attacks at the time (retrospective search). You should also be able to share intelligence with a community of similar organizations, and be able to integrate first-party intel from your vendor with third-party intel from threat intelligence vendors when appropriate. Additionally, the able to send unrecognized files to a network sandbox makes the system more effective and enables quicker recognition of emerging attacks. Q7: How does your product support searching endpoint telemetry for our SOC analysts? Can potentially compromised devices be polled in real time? What about searching through endpoint telemetry history? Search is king for EDR tools, so spend some time with the vendor to understand their search interface and how it can be used to drill down into specific devices or pivot to other devices, to understand which devices an attacker has impacted. You’ll also want to see their search responsiveness, especially with data from potentially hundreds of thousands of endpoints in the system. This is another opportunity to delve into retrospective search capabilities – key for finding malicious activity, especially when you don’t recognize it as bad when it occurs. Also consider the tradeoffs between retention of telemetry and the cost of storing it, because being able to search a longer history window makes both retrospective search and hunting more effective. Q8: Once I get an alert, does the product provide a structured response process? What kind of automation is possible with your product? What about case management? As we have discussed throughout this series, the security skills gap makes it critical to streamline the validation and response processes for less sophisticated analysts. The more structured a tool can make the user experience, the more it can help junior analysts be more productive, faster. That said, you also want to make sure the tool isn’t so structured that analysts have no flexibility to follow their instincts and investigate the attack a different way. Q9: My staff aren’t security ninjas, but I would like to proactively look for attackers. How does your product accelerate a hunt, especially for unsophisticated analysts? Given sufficiently capable search and visualization of endpoint activity, advanced threat hunters can leverage an EDR tool for hunting. Again, you’ll want to learn how the tool can make your less experienced folks more productive and enable them

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Key Technologies for Detection and Response

Now let’s dig into some key EDR technologies which appear across all the use cases: detection, response, and hunting. Agent The agent is deployed to each monitored endpoint, so you be sensitive to its size and its performance hit on devices. A main complaint regarding older endpoint protection was performance impact on devices. The smaller the better, and the less performance impact the better (duh!), but just as important is agent deployability and maintainability. Full capture versus metadata: There are differing strong opinions on how much telemetry to capture and store from each device. Similar to the question of whether to do full network packet capture or to derive metadata from the packet stream, there is a level of granularity available with a full endpoint capture which isn’t available via metadata, for session reconstruction and more detail about what an adversary actually did. But full capture is very resource and storage intensive, so depending on the sophistication of your response team and process, metadata may be your best option. Also consider products that can gather more device telemetry when triggered, perhaps by an alert or connection to a suspicious network. Offline collection: Mobile endpoints are not always on the network, so agents much be able to continue collecting event data when disconnected. Once back on the network, cached endpoint telemetry should be uploaded to the central repository, which can then perform aggregate analysis. Multi-platform support: It’s a multi-platform world, and your endpoint security strategy needs to factor in not just Windows devices, but also Macs and Linux. Even if these platforms aren’t targeted they could be used in sophisticated operations as repositories, staging grounds, and file stores. Different operating systems offer different levels of telemetry access. Security vendors have less access to the kernel on both Mac and Linux systems than on Windows. Also dig into how vendors leverage built-in operating system services to provide sufficiently granular data for analysis. Finally, mobile devices access and store critical enterprise data, although their vulnerability is still subject to debate. We do not consider mobile devices as part of these selection criteria, although for many organizations an integrated capability is an advantage. Kernel vs. user space: There is a semi-religious battle over whether a detection agent needs to live at the kernel level (with all the potential device instability risks that entails), or accurate detection can take place exclusively at the kernel level. Any agent must be able to detect attacks at lower levels of the operating system – such as root kits – as well as any attempts at agent tampering (again, likely outside user space). Again, we don’t get religious, and we appreciate that user-space agents are less disruptive, but are not willing to compromise on detecting all attacks. Tamper proof: Speaking of tampering, to address another long standing issue with traditional EPP, you’ll want to dig into the product security of any agent you install on any endpoint in your environment. We can still remember the awesome Black Hat talks where EPP agent after EPP agent was shown to be more vulnerable than some enterprise applications. Let’s learn from those mistakes and dig into the security and resilience of the detection agents to make sure you aren’t just adding attack surface. Scalability: Finally, scale is a key consideration for any enterprise. You might have 1,000 or 100,000 devices, or even more; but regardless you need to ensure the tool will work for the number of endpoints you need to support, and the staff on your team – both in terms of numbers and sophistication. Of course you need to handle deployment and management of agents, but don’t forget the scalability and responsiveness of analysis and searching. Machine Learning Machine learning is a catch-all term which endpoint detection/response vendors use for sophisticated mathematical analysis across a large dataset to generate models, intended to detect malicious device activity. Many aspects of advanced mathematics are directly relevant to detection and response. Static file analysis: With upwards of a billion malicious file samples in circulation, mathematical malware analysis can pinpoint commonalities across malicious files. With a model of what malware looks like, detection offerings can then search for these attributes to identify ‘new’ malware. False positives are always a concern with static analysis, so part of diligence is ensuring the models are tested constantly, and static analysis should only be one part of malware detection. Behavioral profiles: Similarly, behaviors of malware can be analyzed and profiled using machine learning. Malware profiling produces a dynamic model which can be used to look for malicious behavior. Those are the main use cases for machine learning in malware detection, but there are a number of considerations when evaluating machine learning approaches, including: Targeted attacks: With an increasing amount of attacks specifically targeting individual organizations, it is again important to distinguish delivery from compromise. Targeted attacks use custom (and personalized) methods to deliver attacks – which may or may not involve custom malware – but once the attacker has access to a device they use similar tactics to a traditional malware attack, so machine learning models don’t necessarily need to do anything unusual to deal with targeted attacks. Cloud analytics: The analytics required to develop malware machine learning models are very computationally intensive. Cloud computing is the most flexible way to access that kind of compute power, so it makes sense that most vendors perform their number crunching and modeling in the cloud. Of course the models must be able to run on endpoints to detect malicious activity, so they are typically built in the cloud and executed locally on every endpoint. But don’t get distracted with where computation happens, so long as performance and accuracy are acceptable. Sample sizes: Some vendors claim that their intel is better than another company’s. That’s a hard claim to prove, but sample sizes matter. Looking at a billion malware samples is better than looking at 10,000. Is there a difference between looking at a hundred million samples and at a billion?

Share:
Read Post

Endpoint Advanced Protection Buyer’s Guide: Key Capabilities for Response and Hunting

As we resume posting Endpoint Detection and Response (D/R) selection criteria, let’s start with a focus on the Detection use case. Before we get too far into capabilities, we should clear up some semantics about the word ‘detection’. Referring back to our timeline in Prevention Selection Criteria, detection takes place during execution. You could make the case that detection of malicious activity is what triggers blocking, and so a pre-requisite to attack prevention – without detection, how could you know what to prevent?. But that’s too confusing. For simplicity let’s just say prevention means blocking an attack before it compromises a device, and can happen both prior to and during execution. Detection happens during and after execution, and implies a device was compromised because the attack was not prevented. Data Collection Modern detection requires significant analysis across a wide variety of telemetry sources from endpoints. Once telemetry is captured, a baseline of normal endpoint activity is established and used to look for anomalous behavior. Given the data-centric nature of endpoint detection, an advanced endpoint detection offering should aggregate and analyze the following types of data: Endpoint logs: Endpoints can generate a huge number of log entries, and an obvious reaction is to restrict the amount of log data ingested, but we recommend collecting as much log data from endpoint as possible. The more granular the better, given the sophistication of attackers and their ability to target anything on a device. If you do not collect the data on the endpoint, there is no way to get it when you need it to investigate later. Optimally, endpoint agents collect operating system activity alongside all available application logs. This includes identity activity such as new account creation and privilege escalation, process launching, and file system activity (key for detection ransomware). There is some nuance to how long you retain collected data because it can be voluminous and compute-intensive to process and analyze – both on devices and centrally. Processes: One of the more reliable ways to detect malicious behavior is by which OS processes are started and where they are launched from. This is especially critical when detecting scripting attacks because attackers love using legitimate system processes to launch malicious child processes. Network traffic: A compromised endpoint will inevitably connect to a command and control network for instructions and to download additional attack code. These actions can be detected by monitoring the endpoint’s network stack. An agent can also watch for connections to known malicious sites and for reconnaisance activity on the local network. Memory: Modern file-less attacks don’t store any malicious code in the file system, so modern advanced detection requires monitoring and analyzing activity within endpoint memory. Registry: As with memory-based attacks, attackers frequently store malicious code within the device registry to evade file system detection. So advanced detection agents need to monitor and analyze registry activity for signs of misuse. Configuration changes: It’s hard for attackers to totally obfuscate what is happening on an endpoint, so on-device configuration changes can indicate an attack. File integrity: Another long-standing method attack detection is monitoring changes to system files, because changes to such files outside administrative patching usually indicates something malicious. An advanced endpoint agent should collect this data and monitor for modified system files. Analytics As mentioned above, traditional endpoint detection relied heavily on simple file hashes and behavioral indicators. With today’s more sophisticated attacks, a more robust and scientific approach is required to distinguish legitimate from malicious activity. This more scientific approach is centered around machine learning techniques (advanced mathematics) to recognize the activity of adversaries before and during attacks. Modern detection products use huge amounts of endpoint telemetry (terabytes) to train mathematical models to detect anomalous activity and find commonalities between how attackers behave. These models then generate an attack score to prioritize alerts. Profiling applications: Detecting application misuse is predicated on understanding legitimate usage of the application, so the mathematical models analyze both legitimate and malicious usage of frequently targeted applications (browsers, office productivity suites, email clients, etc.). This is a similar approach to attack prevention, discussed in our Prevention Selection Criteria guide. Anomaly detection: With profiles in hand and a consistent stream of endpoint telemetry to analyze, the mathematical models attempt to identify abnormal device activity. When suspicion is high they trigger an alert, the device is marked suspicious, and an analyst determines whether the alert is legitimate. Tuning: To avoid wasting too much time on false positives, the detection function needs to constantly learn what is really an attack and what isn’t, based on the results of detection in your environment. In terms of process, you’ll want to ensure your feedback is captured by your detection offering, and used to constantly improve your models to keep detection precise and current. Risk scoring: We aren’t big fans of arbitrary risk scoring because the underlying math can be suspect. That said, there is a role for risk scoring in endpoint attack detection: prioritization. With dozens of alerts hitting daily – perhaps significantly more – it is important to weigh which alerts warrant immediate investigation, and a risk score should be able to tell you. Be sure to investigate the underlying scoring methodology, track scoring accuracy, and tune scoring to your environment. Data management: Given the analytics-centric nature of EDR, being able to handle and analyze large amounts of endpoint telemetry collected from endpoints is critical. Inevitably you’ll run into the big questions: where to store all the data, how to scale analytics to tens or hundreds of thousands of endpoints, and how to economically analyze all your security data. But ultimately your technology decision comes down to a few factors: Cost: Whether or not the cost of storage and analytics is included in the service (some vendors store all telemetry in a cloud instance) or you need to provision a compute cluster in your data center to perform the analysis, there is a cost to crunching all the numbers. Make sure hardware, storage, and networking costs (including management)

Share:
Read Post

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.