I have received some great feedback on my post last week on bastion accounts and networks. Mostly that I left some gaps in my explanation which legitimately confused people. Plus, I forgot to include any pretty pictures. Let’s work through things a bit more.
First, I tended to mix up bastion accounts and networks, often saying “account/networks”. This was a feeble attempt to discuss something I mostly implement in Amazon Web Services that can also apply to other providers. In Amazon an account is basically an AWS subscription. You sign up for an account, and you get access to everything in AWS. If you sign up for a second account, all that is fully segregated from every other customer in Amazon. Right now (and I think this will change in a matter of weeks) Amazon has no concept of master and sub accounts: each account is totally isolated unless you use some special cross-account features to connect parts of accounts together. For customers with multiple accounts AWS has a mechanism called consolidated billing that rolls up all your charges into a single account, but that account has no rights to affect other accounts. It pays the bills, but can’t set any rules or even see what’s going on.
It’s like having kids in college. You’re just a checkbook and an invisible texter.
If you, like Securosis, use multiple accounts, then they are totally segregated and isolated. It’s the same mechanism that prevents any random AWS customer from seeing anything in your account. This is very good segregation. There is no way for a security issue in one account to affect another, unless you deliberately open up connections between them. I love this as a security control: an account is like an isolated data center. If an attacker gets in, he or she can’t get at your other data centers. There is no cost to create a new account, and you only pay for the resources you use. So it makes a lot of sense to have different accounts for different applications and projects. Free (virtual) data centers for everyone!!!
This is especially important because of cloud metastructure. All the management stuff like web consoles and APIs that enables you to do things like create and destroy entire class B networks with a couple API calls. If you lump everything into a single account, more administrators (and other power users) need more access, and they all have more power to disrupt more projects. This is compartmentalization and segregation of duties 101, but we have never before had viable options for breaking everything into isolated data centers. And from an operational standpoint, the more you move into DevOps and PaaS, the harder it is to have everyone running in one account (or a few) without stepping on each other.
These are the fundamentals of my blast radius post.
One problem comes up when customers need a direct connection from their traditional data center to the cloud provider. I may be all rah rah cloud awesome, but practically speaking there are many reasons you might need to connect back home. Managing this for multiple accounts is hard, but more importantly you can run into hard limits due to routing and networking issues.
That’s where a bastion account and network comes in. You designate an account for your Direct Connect. Then you peer into that account (in AWS using cross-account VPC peering support) any other accounts that need data center access. I have been saying “bastion account/network” because in AWS this is a dedicated account with its own dedicated VPC (virtual network) for the connection. Azure and Google use different structures, so it might be a dedicated virtual network within a larger account, but still isolated to a subscription, or sub-account, or whatever mechanism they support to segregate projects. This means:
- Not all your accounts need this access, so you can focus on the ones which do.
- You can tightly lock down the network configuration and limit the number of administrators who can change it.
- Those peering connections rely on routing tables, and you can better isolate what each peered account or network can access.
- One big Direct Connect essentially “flattens” the connection into your cloud network. This means anyone in the data center can route into and attack your applications in the cloud. The bastion structure provides multiple opportunities to better restrict network access to destination accounts. It is a way to protect your cloud(s) from your data center.
- A compromise in one peered account cannot affect another account. AWS networking does not allow two accounts peered to the same account to talk to each other. So each project is better isolated and protected, even without firewall rules.
For example the administrator of a project can have full control over their account and usage of AWS services, without compromising the integrity of the connection back to the data center, which they cannot affect – they only have access to the network paths they were provided. Their project is safe, even if another project in the same organization is totally compromised.
Hopefully this helps clear things up. Multiple accounts and peering is a powerful concept and security control. Bastion networks extend that capability to hybrid clouds. If my embed works, below you can see what it looks like (a VPC is a virtual network, and you can have multiple VPCs in a single account):
Posted at Monday 7th November 2016 8:36 pm
(0) Comments •
By Adrian Lane
This post is focused on security testing your code and container, and verifying that both conform to security and operational practices. One of the major advances over the last year or so is the introduction of security features for the software supply chain, from both Docker itself and a handful of third-party vendors. All the solutions focus on slightly different threats to container construction, with Docker providing tools to certify that containers have made it through your process, while third-party tools are focused on vetting the container contents. So Docker provides things like process controls, digital signing services to verify chain of custody, and creation of a Bill of Materials based on known trusted libraries. In contrast, third-party tools to harden container inputs, analyze resource usage, perform static code analysis, analyze the composition of libraries, and check against known malware signatures; they can then perform granular policy-based container delivery based on the results. You will need a combination of both, so we will go into a bit more detail:
Container Validation and Security Testing
- Runtime User Credentials: We could go into great detail here about runtime user credentials, but will focus on the most important thing: Don’t run the container processes as
root, as that provides attackers access to attack other containers or the Docker engine. If you get that right you’re halfway home for IAM. We recommend using specific user accounts with restricted permissions for each class of container. We do understand that roles and permissions change over time, which requires some work to keep permission maps up to date, but this provides a failsafe when developers change runtime functions and resource usage.
- Security Unit Tests: Unit tests are a great way to run focused test cases against specific modules of code – typically created as your dev teams find security and other bugs – without needing to build the entire product every time. This can cover things such as XSS and SQLi testing of known attacks against test systems. Additionally, the body of tests grows over time, providing a regression testbed to ensure that vulnerabilities do not creep back in. During our research, we were surprised to learn that many teams run unit security tests from Jenkins. Even though most are moving to microservices, fully supported by containers, they find it easier to run these tests earlier in the cycle. We recommend unit tests somewhere in the build process to help validate the code in containers is secure.
- Code Analysis: A number of third-party products perform automated binary and white box testing, failing the build if critical issues are discovered. We recommend you implement code scans to determine if the code you build into a container is secure. Many newer tools have full RESTful API integration within the software delivery pipeline. These tests usually take a bit longer to run, but still fit within a CI/CD deployment framework.
- Composition Analysis: A useful technique is to check library and supporting code against the CVE (Common Vulnerabilities and Exposures) database to determine whether you are using vulnerable code. Docker and a number of third parties provide tools for checking common libraries against the CVE database, and they can be integrated into your build pipeline. Developers are not typically security experts, and new vulnerabilities are discovered in common tools weekly, so an independent checker to validate components of your container stack is essential.
- Resource Usage Analysis: What resources does the container use? What external systems and utilities does it depend upon? To manage the scope of what containers can access, third-party tools can monitor runtime access to environment resources both inside and outside the container. Basically, usage analysis is an automated review of resource requirements. These metrics are helpful in a number of ways – especially for firms moving from a monolithic to a microservices architecture. Stated another way, this helps developers understand what references they can remove from their code, and helps Operations narrow down roles and access privileges.
- Hardening: Over and above making sure what you use is free of known vulnerabilities, there are other tricks for securing applications before deployment. One is to check the contents of the container and remove items that are unused or unnecessary, reducing attack surface. Don’t leave hard-coded passwords, keys, or other sensitive items in the container – even though this makes things easy for you, it makes them much easier for attackers. Some firms use manual scans for this, while others leverage tools to automate scanning.
- App Signing and Chain of Custody: As mentioned earlier, automated builds include many steps and small tests, each of which validates that some action was taken to prove code or container security. You want to ensure that the entire process was followed, and that somewhere along the way some well-intentioned developer did not subvert the process by sending along untested code. Docker now provides the means to sign code segments at different phases of the development process, and tools to validate the signature chain. While the code should be checked prior to being placed into a registry or container library, the work of signing images and containers happens during build. You will need to create specific keys for each phase of the build, sign code snippets on test completion but before the code is sent onto the next step in the process, and – most importantly – keep these keys secured so an attacker cannot create their own code signature. This gives you some guarantee that the vetting process proceeded as intended.
Posted at Monday 7th November 2016 5:30 pm
(0) Comments •
Mike and Rich had a call this week with another prospect who was given some pretty bad cloud advice. We spend a little time trying to figure out why we keep seeing so much bad advice out there (seriously, BIG B BAD not OOPSIE bad). Then we focus on the key things to look for to figure out when someone is leading you down the wrong path in your cloud migration.
Oh… and for those with sensitive ears, time to engage the explicit flag.
Watch or listen:
Posted at Monday 7th November 2016 2:00 pm
(3) Comments •
By Adrian Lane
As we mentioned in our last post, most people don’t seem to consider the build environment when thinking about container security, but it’s important. Traditionally, the build environment is the domain of developers, and they don’t share a lot of details with outsiders (in this case, Operations folks). But this is beginning to change with Continuous Integration (CI) or full Continuous Deployment (CD), and more automated deployment. The build environment is more likely to go straight into production. This means that operations, quality assurance, release management, and other groups find themselves having to cooperate on building automation scripts and working together more closely. Collaboration means a more complex, distributed working environment, with more stakeholders having access. DevOps is rapidly breaking down barriers between groups, even getting some security teams to contribute test scripts and configuration updates. Better controls are needed to restrict who can alter the build environment and update code, and an audit process to validate who did what.
Don’t forget why containers are so attractive to developers. First, a container simplifies building and packaging application code – abstracting the app from its physical environment – so developers can worry about the application rather than its supporting systems. Second, the container model promotes lightweight services, breaking large applications down into small pieces, easing modification and scaling – especially in cloud and virtual environments. Finally, a very practical benefit is that container startup is nearly instant, allowing agile scaling up and down in response to demand. It is important to keep these in mind when considering security controls, because any control that reduces one of these core advantages will not be considered, or is likely to be ignored.
Build environment security breaks down into two basic areas. The first is access and usage of the basic tools that form the build pipeline – including source code control, build tools, the build controller, container management facilities, and runtime access. At Securosis we often call this the “management plane”, as these interfaces — whether API or GUI – are used to set access policies, automate behaviors and audit activity. Second is security testing of your code and the container, validating it conforms to security and operational practices. This post will focus on the former.
Securing the Build
Here we discuss the steps to protect your code – more specifically to protect build systems, to ensure they implement the build process you intended. This is conceptually very simple, but there are many pieces to this puzzle, so implementation can get complicated.
People call this Secure Software Delivery, Software Supply Chain Management, and Build Server Security – take your pick. It is management of the assemblage of tools which oversee and implement your process. For our purposes today these terms are synonymous.
Following is a list of recommendations for securing platforms in the build environment to ensure secure container construction. We include tools from Docker and others which that automate and orchestrate source code, building, the Docker engine, and the repository. For each tool you will employ a combination of identity management, roles, platform segregation, secure storage of sensitive data, network encryption, and event logging. Some of the subtleties follow.
- Source Code Control: Stash, Git, GitHub, and several variants are common. Source code control is one of the tools with a wide audience, because it is now common for Security, Operations, and Quality Assurance to all contribute code, tests, and configuration data. Distributed access means all traffic should run over SSL or VPN connections. User roles and access levels are essential for controlling who can do what, but we recommend requiring token-based or certificate-based authentication, with two-factor authentication a minimum for all administrative access.
- Build Tools and Controllers: The vast majority of development teams we spoke with use build controllers like Bamboo and Jenkins, with these platforms becoming an essential part of their automated build processes. These provide many pre-, post- and intra-build options, and can link to a myriad of other facilities, complicating security. We suggest full network segregation of the build controller system(s), and locking down network connections down to source code controller and docker services. If you can, deploy build servers as on-demand containers – this ensures standardization of the build environment and consistency of new containers. We recommend you limit access to the build controllers as tightly as possible, and leverage built-in features to restrict capabilities when developers need access. We also suggest locking down configuration and control data to prevent tampering with build controller behavior. We recommend keeping any sensitive data, such as
ssh keys, API access keys, database credentials, and the like in a secure database or data repository (such as a key manager,
.dmg file, or vault) and pulling credentials on demand to ensure sensitive data never sits on disk unprotected. Finally, enable logging facilities or add-ons available for the build controller, and stream output to a secure location for auditing.
- Docker: You will use Docker as a tool for pre-production as well as production, building the build environment and test environments to vet new containers. As with build controllers like Jenkins, you’ll want to limit Docker access in the build environment to specific container administrator accounts. Limit network access to accept content only from the build controller system(s) and whatever trusted repository or registry you use.
Our next post will discuss validation of individual containers and their contents.
Posted at Sunday 6th November 2016 9:00 pm
(0) Comments •
By Mike Rothman
Now that we have gotten through 80% of the Endpoint Advanced Protection lifecycle we can focus on remediation, and then how to start getting value from these new alternatives.
Once you have detailed information from the investigation, what are the key decision points? As usual, to simplify we step back to the who, what, where, when, and how of the situation. And yes, any time we can make difficult feel seem like being back in grade school, we do.
- Who? The first question is about organizational dynamics. In this new age, when advanced attackers seem to be the norm, who should take lead in remediation? Without delving into religion or other politics, the considerations are really time and effectiveness. Traditionally IT Operations has tools and processes for broad changes, reimaging, or network-based workarounds. But for advanced malware or highly sensitive devices, or when law enforcement is involved, you might also want a small Security team which can remediate targeted devices.
- What? This question is less relevant because you are remediating a device, right? There may be some question of whether to prevent further outbreaks at the network level by blocking certain sites, applications, users, or all of the above, but ultimately we are talking about endpoints.
- Where? One of the challenges of dealing with endpoints is that you have no idea where a device will be at any point in time. So remote remediation is critical to any Endpoint Advanced Protection lifecycle. There are times you will need to reimage a machine, and that’s not really feasible remotely. But having a number of different options for remediation depending on device location can ensure minimal disruption to impacted employees.
- When? This is one of the most challenging decisions, because there are usually reasonable points for both sides of the argument: whether to remediate devices immediately, or whether to quarantine the device and observe the adversary a bit to gain intelligence. We generally favor quick and full eradication, which requires leveraging retrospection to figure all impacted devices (even if they aren’t currently participating in the attack) and cleaning devices as quickly as practical. But there are times which call for more measured remediation.
- How? This question is whether reimaging the device, or purging malware without reimaging, is the right approach. We favor reimaging because of the various ways attackers can remain persistent on a device. Even if you think a device has been cleaned… perhaps it really wasn’t. But with the more granular telemetry gathered by today’s endpoint investigation and forensics tools (think DVR playback), it is possible to reliably back out all the changes made, even within the OS innards. Ultimately the decision comes back to the risk posed by the device, as well as disruption to the employee. The ability to both clean and reimage is key to the remediation program.
There is a broad range of available actions, so we advocate flexibility in remediation – as in just about everything. We don’t think there is any good one-size-fits-all approach any more; each remediation needs to be planned according to risk, attacker sophistication, and the skills and resources available between Security and Operations teams. Taking all that into account, you can choose the best approach.
One of the most frustrating aspects of doing security is having to spend money on things you know don’t really work. Traditional endpoint protection suites fit into that category. Which begs the question: are Endpoint Advanced Protection products robust enough, effective enough, and broad enough to replace the EPP incumbents?
To answer this question you must consider it from two different standpoints. First, the main reason you renew your anti-malware subscription each year is for that checkbox on a compliance checklist. So get a sense of whether your assessor/auditor would you a hard time if you come up with something that doesn’t use signatures to detect malicious activity. If they are likely to push back, maybe find a new assessor. Kidding aside, we haven’t seen much pushback lately, in light of the overwhelming evidence that Endpoint Advanced Detection/Prevention is markedly more effective at blocking current attacks. That said, it would be foolish to sign a purchase order to swap out protection on 10,000 devices without at least putting a call into your assessor and understanding whether there is precedent for them to accept a new style of agent.
You will also need to look at your advanced endpoint offering for feature parity. Existing EPP offerings have been adding features (to maintain price points) for a decade. A lot of stuff you don’t need has been added, but maybe there is some you do use. Make sure replacing your EPP won’t leave a gap you will just need to fill with another product.
Keep in mind that some EPP features are now bundled into operating systems. For example, full disk encryption is now available free as part of the operating system. In some cases you need to manage these OS-level capabilities separately, but that weighs against an expensive renewal which doesn’t effectively protect endpoints.
Finally, consider price. Pretty much every enterprise tells us they want to reduce the number of security solutions they need. And supporting multiple agents and management consoles to protect endpoints doesn’t make much sense. In your drive to consolidate, play off aggressive new EAP vendors against desperate incumbents willing to perform unnatural acts to keep business.
Endpoint protection has been a zero-sum game for a while. Pretty much every company has some kind of endpoint protection strategy. So every deal that one vendor wins is lost by at least one competitor. Vendors make it very easy to migrate to their products by providing tools and services to facilitate the transition. Of course you need to verify what’s involved in moving wholesale to a new product, but the odds are it will be reasonably straightforward.
Many new EAP tools are managed in the cloud. Typically that saves you from needing to install an onsite management server to test and deploy. This makes things much easier and facilitates migration – employees can connect to a cloud-based software installation/distribution engine, without needing to bring devices to HQ for upgrades. Some organizations still resist cloud-based management; if this sounds like you, you’ll want to check with the vendor to ensure they can support on-premise installation.
Finally, when planning the migration you need to consider which security functions should be implemented on each category of devices, as defined by the risk they pose. Earlier in this series we talked about categorizing devices into risk buckets, and implementing controls based on the risk they present. You can install or enable different EAP modules depending on the needs of the employee or device.
The vendor may well make it worth your while to license all their capabilities on all your devices. There is nothing wrong with that, if the price is right. But do not consider only purchase price – keep in mind the total cost of managing the various capabilities across all your devices. Also consider the impact on employees in terms of device performance and user experience. Not every device needs application whitelisting, for example. Or EDR, given the challenge of moving endpoint telemetry across the network.
Finally, any new EAP offering needs to play nice with existing enterprise security tools. Here are a few, with their integration points.
- Network Controls: If you detect an attack on an endpoint and isolate the C&C (Command and Control) network it’s connecting to, wouldn’t it be great to automagically block that address so other devices don’t connect to that bot network? That’s why many EAP vendors also offer network security devices, or at least partner with those players to offer an integrated experience.
- Security Monitoring/Analytics: An EAP product – especially EDR functionality – generates a bunch of telemetry which can be useful within your security monitoring environment. So the ability to send it directly to a SIEM or security analytics program helps leverage it in any analyses you perform.
- Forensics/Case Management: If you can foresee a situation where you’ll want to prosecute an attacker, you need the ability to integrate with your existing case management product. This is about protecting the chain of custody of captured data, and allowing more sophisticated forensics tools to use endpoint data to better determine what malware does to a device.
- Operations Platform: Finally, we need to highlight potential integration with an IT ops platform, especially as it relates to endpoint hygiene and asset management. An EAP products gathers much more detailed device data, which can be very useful to Operations.
Security is too complicated for any tool to stand on its own, so any EAP offering’s ability to send and receive data, to and from your other security tools, is a key selection criteria.
With that we have run through the Endpoint Advanced Protection lifecycle. At this point in time we see legitimate alternatives to the ineffective EPP products which have been holding you and your organization hostage for years. But before jumping in with both feet test the tool, plan and stage your migration, and most importantly implement a risk-based approach to protecting endpoints. There are many alternatives for protecting devices, so it’s more important than ever to match your security controls to the risk presented by the device.
Posted at Friday 4th November 2016 9:06 pm
(0) Comments •
In an earlier post I mentioning bastion accounts or virtual networks. Amazon calls these “transit VPCs” and has a good description. Before I dive into details, the key difference is that I focus on using the concept as a security control, and Amazon for network connectivity and resiliency. That’s why I call these “bastion accounts/networks”.
Here is the concept and where it comes from:
- As I have written before, we recommend you use multiple account with a partitioned network architecture structure, which often results in 2-4 accounts per cloud application stack (project). This limits the ‘blast radius’ of an account compromise, and enables tighter security control on production accounts.
- The problem is that a fair number of applications deployed today still need internal connectivity. You can’t necessarily move everything up to the cloud right away, and many organizations have entirely legitimate reasons to keep some things internal. If you follow our multiple-account advice, this can greatly complicate networking and direct connections to your cloud provider.
- Additionally, if you use a direct connection with a monolithic account & network at your cloud provider, that reduces security on the cloud side. Your data center is probably the weak link – unless you are as good at security as Amazon/Google/Microsoft. But if someone compromises anything on your corporate network, they can use it to attack cloud assets.
- One answer is to create a bastion account/network. This is a dedicated cloud account, with a dedicated virtual network, fo the direct connection back to your data center. You then peer the bastion network as needed with any other accounts at your cloud provider. This structure enables you to still use multiple accounts per project, with a smaller number of direct connections back to the data center.
- It even supports multiple bastion accounts, which only link to portions of your data center, so they only gain access to the necessary internal assets, thus providing better segregation. Your ability to do this depends a bit on your physical network infrastructure, though.
- You might ask how this is more secure. It provides more granular access to other accounts and networks, and enables you to restrict access back to the data center. When you configure routing you can ensure that virtual networks in one account cannot access another account. If you just use a direct connect into a monolithic account, it becomes much harder to manage and maintain those restrictions.
- It also supports more granular restrictions from your data center to your cloud accounts (some of which can be enforced at a routing level – not just firewalls), and because you don’t need everything to phone home, accounts which don’t need direct access back to the data center are never exposed.
A bastion account is like a weird-ass DMZ to better control access between your data center and cloud accounts; it enables multiple account architectures which would otherwise be impossible. You can even deploy virtual routing hardware, as per the AWS post, for more advanced configurations.
It’s far too late on a Friday for me to throw a diagram together, but if you really want one or I didn’t explain clearly enough, let me know via Twitter or a comment and I’ll write it up next week.
Posted at Friday 4th November 2016 9:05 pm
(3) Comments •
By Adrian Lane
After a somewhat lengthy hiatus – sorry about that – I will close out this series over the next couple days.
In this post I want to discuss container threat models – specifically for Docker containers. Some of these are known threats and issues, some are purely lab exercises for proof-of-concept, and others are threat vectors which attackers have yet to exploit – likely because there is so much low-hanging fruit for them elsewhere.
So what are the primary threats to container environments?
One area that needs protection is the build environment. It’s not first on most people’s lists for container security, but it’s first on mine because it’s the easiest place to insert malicious code. Developers tend to loathe security in development as it slows them down. This is why there is an entire industry dedicated to test data management and masked data: developers tend to do an end-run around security if it slows down their build and testing process.
What kinds of threats are we talking about specifically? Things like malicious or moronic source code changes. Malicious or moronic alterations to automated build controllers. Configuration scripts with errors, or with credentials sitting around. The addition of insecure libraries or back-rev/insecure versions of existing code. We want to know if the runtime code has been scanned for vulnerabilities. And we worry about a failure to audit all the above and catch any errors.
What the hell is in the container? What does it do? Is that even the correct version of the container? These are common questions I hear a lot from operations folks. They have no idea. Nor do they know what permissions the container has or requires – all too often lazy developers run everything as
root, breaking operational security models and opening up the container engine and underlying OS to various attacks. And security folks are unaware of what – if any – container hardening may have been performed. You want to know the container’s contents have been patched, vetted, hardened, and registered prior to deployment.
So what are the threats to worry about? We worry a container will attack or infect another container. We worry a container may quietly exfiltrate data, or just exhibit any other odd behavior. We worry containers have been running a long time, and not rotated to newer patched versions. We worry about whether the network has been properly configured to limit damage from a compromise. And we worry about attackers probing containers, looking for vulnerabilities.
Finally, the underlying platform security is a concern. We worry that a container will attack the underlying host OS or the container engine. If it succeeds it’s pretty much game over for that cluster of containers, and you may have given malicious code resources to pivot and attack other systems.
If you are in the security industry long enough, you see several patterns repeat over and over. One is how each hot new tech becomes all the rage, finds its way into your data center, and before you have a chance to fully understand how it works, someone labels it “business critical”. That’s about when security and operations teams get mandated to secure that hot new technology. It’s a natural progression – every software platform needs to focus on attaining minimum usability, scalability, and performance levels before competitors come and eat their lunch. After a certain threshold of customer adoption is reached – when enterprises really start using it – customers start asking, “Hey, how do we secure this thing?”
The good news is that Docker has reached that point in its evolutionary cycle. Security is important to Docker customers, so it has become important to Docker as well. They have now implemented a full set of IAM capabilities: identity management, authentication, authorization, and (usually) single sign-on or federation – along with encrypted communications to secure data in transit. For the rest of the features enterprises expect: configuration analysis, software assessment, monitoring, logging, encryption for data at rest, key management, development environment security, etc. – you’re looking at a mixture of Docker and third-party solution providers to fill in gaps. We also see cloud providers like Azure and AWS mapping their core security services over the container environment, providing different security models from what you might employ on-premise. This is an interesting time for container security in general… and a bit confusing, as you have a couple different ways to address any given threat. Next we will delve into how to address these threats at each stage of the pipeline, with build environment security.
Posted at Thursday 3rd November 2016 12:40 am
(1) Comments •
The following steps are very specific to AWS, but with minimal modification they will work for other cloud platforms which support multi factor authentication. And if your cloud provider doesn’t support MFA and the other features you need to follow these steps… find another provider.
- Register with a dedicated email address that follows this formula: email@example.com. Instead of project name you could use a business unit, cost code, or some other team identifier. The environment is dev/test/prod/whatever. The most important piece is the random seed added to the email address. This prevents attackers from figuring out your naming scheme, and then your account with email.
- Subscribe the project administrators, someone from central ops, and someone from security to receive email sent to that address.
- Establish a policy that the email account is never otherwise directly accessed or used.
- Disable any access keys (API credentials) for the root account.
- Enable MFA and set it up with a hardware token, not a soft token.
- Use a strong password stored in a password manager.
- Set the account security/recovery questions to random human-readable answers (most password managers can create these) and store the answers in your password manager.
- Write the account ID and username/email on a sticker on the MFA token and lock it in a central safe that is accessible 24/7 in case of emergency.
- Create a full-administrator user account even if you plan to use federated identity. That one can use a virtual MFA device, assuming the virtual MFA is accessible 24/7. This becomes your emergency account in case something really unusual happens, like your federated identity connection breaking down (it happens – I have a call with someone this week who got locked out this way).
After this you should never need to use your root account. Always try to use a federated identity account with admin rights first, then you can drop to your direct AWS user account with admin rights if your identity provider connection has issues. If you need the root account it’s a break-glass scenario, the worst of circumstances. You can even enforce dual authority on the root account by separating who has access to the password manager and who has access to the physical safe holding the MFA card.
Setting all this up takes less than 10 minutes once you have the process figured out. The biggest obstacle I run into is getting new email accounts provisioned. Turns out some email admins really hate creating new accounts in a timely manner. They’ll be first up against the wall when the revolution comes, so they have that going for them. Which is nice.
Posted at Wednesday 2nd November 2016 6:26 pm
(0) Comments •
Yesterday I warned against building a monolithic cloud infrastructure to move into cloud computing. It creates a large blast radius, is difficult to secure, costs more, and is far less agile than the alternative. But I, um… er… uh… didn’t really mention an alternative.
Here is how I recommend you start a move to the cloud. If you have already started down the wrong path, this is also a good way to start getting things back on track.
- Pick a starter project. Ideally something totally new, but migrating an existing project is okay, so long as you can rearchitect it into something cloud native.
- Applications that are horizontally scalable are often good fits. These are stacks without too many bottlenecks, which allow you to break up jobs and distribute them. If you have a message queue, that’s often a good sign. Data analytics jobs are also a very nice fit, especially if they rely on batch processing.
- Anything with a microservice architecture is also a decent prospect.
- Put together a cloud team for the project, and include ops and security – not just dev. This team is still accountable, but they need extra freedom to learn the cloud platform and adjust as needed. They have additional responsibility for documenting and reporting on their activities to help build a blueprint for future projects.
- Train the team. Don’t rely on outside consultants and advice – send your own people to training specific to their role and the particular cloud provider.
- Make clear that the project is there to help the organization learn, and the goal is to design something cloud native – not merely to comply with existing policies and standards. I’m not saying you should (or can) throw those away, but the team needs flexibility to re-interpret them and build a new standard for the cloud. Meet the objectives of the requirements, and don’t get hung up on existing specifics.
- For example, if you require a specific firewall product, throw that requirement out the window in favor of your cloud provider’s native capabilities. If you require AV scanning on servers, dump it in favor of immutable instances with remote access disabled.
- Don’t get hung up on being cloud provider agnostic. Learn one provider really well before you start branching out. Keep the project on your preferred starting provider, and dig in deep.
- This is also a good time to adopt DevOps practices (especially continuous integration). It is a very effective way to manage cloud infrastructure and platforms.
- Once you get that first successful project up and running, then use that team to expand knowledge to the next team and the next project.
- Let each project use its own cloud accounts (around 2-4 per project is normal). If you need connections back to the data center, then look at a bastion/transit account/virtual network and allow the project accounts to peer with the bastion account.
- Whitelist that team for direct
ssh access to the cloud provider to start, or use a jump box/VPN. This reduces the hang-ups of having to route everything through private network connections.
- Use an identity broker (Ping/Okta/RSA/IBM/etc.) instead of allowing the team to create their own user accounts at the cloud provider. Starting off with federated identity avoids some problems you will otherwise hit later.
And that’s it: start with a single project, staff it and train people on the platform they plan to use, build something cloud native, and then take those lessons and use them on the next one.
I have seen companies start with 1-3 of these and then grow them out, sometimes quite quickly. Often they simultaneously start building some centralized support services so everything isn’t in a single team’s silo. Learn and go native early on, at a smaller scale, rather than getting overwhelmed by starting too big. Yeah yeah, too simple, but it’s surprising how rarely I see organizations start out this way.
Posted at Tuesday 1st November 2016 6:49 pm
(0) Comments •
By Mike Rothman
As we discussed previously, despite all the cool innovation happening to effectively prevent compromises on endpoints, the fact remains that you cannot stop all attacks. That means detecting the compromise quickly and effectively, and then figuring out how far the attack has spread within your organization, continues to be critical.
The fact is, until fairly recently endpoint detection and forensics was a black art. Commercial endpoint detection tools were basically black boxes, not really providing visibility to security professionals. And the complexity of purpose-built forensics tools put this capability beyond the reach of most security practitioners. But a new generation of endpoint detection and response (EDR) tools is now available, with much better visibility and more granular telemetry, along with a streamlined user experience to facilitate investigations – regardless of analyst capabilities.
Of course it is better to have a more-skilled analyst than a less-skilled one, but given the hard truth of the security skills gap, our industry needs to provide better tools to make those less-skilled analysts more productive, faster. Now let’s dig into some key aspects of EDR.
In order to perfrom any kind of detection, you need telemetry from endpoints. This begs the question of how much to collect from each device, and how long to keep it. This borders on religion, but we remain firmly in the camp that more data is better than less. Some tools can provide a literal playback of activity on the endpoint, like a DVR recording of everything that happened. Others focus on log events and other metadata to understand endpoint activity.
You need to decide whether to pull data from the kernel or from user space, or both. Again, we advocate for data, and there are definite advantages to pulling data from the kernel. Of course there are downsides as well, including potential device instability from kernel interference.
Again recommend the risk-centric view on protecting endpoints, as discussed in our prevention post. Some devices possess very sensitive information, and you should collect as much telemetry as possible. Other devices present less risk to the enterprise, and may only warrant log aggregation and periodic scans.
There are also competing ideas about where to store the telemetry captured from all these endpoint devices. Some technologies are based upon aggregating the data in an on-premise repository, others perform real-time searches using peer-to-peer technology, and a new model involves sending the data to a cloud-based repository for larger scale-analysis.
Again, we don’t get religious about any specific approach. Stay focused on the problem you are trying to solve. Depending on the organization’s sensitivity, storing endpoint data in the cloud may not be politically feasible. On the other hand it might be very expensive to centralize data in a highly distributed organization. So the choice of technology comes down to the adversary’s sophistication, along with the types and locations of devices to be protected.
It’s not like threat intelligence is a new concept in the endpoint protection space. AV signatures are a form of threat intel – the industry just never calls it that. What’s different is that now threat intelligence goes far beyond just hashes of known bad files, additionally looking for behavioral patterns that indicate an exploit. Whether the patterns are called Indicators of Compromise (IoC), Indicators or Attack (IoA), or something else, endpoints can watch for them in real time to detect and identify attacks.
This new generation of threat intelligence is clearly more robust than yesterday’s signatures. But that understates the impact of threat intel on EDR. These new tools provide retrospection, which is searching the endpoint telemetry data store for newly emerging attack patterns. This allows you to see if a new attack has been seen in the recent past on your devices, before you even knew it was an attack.
The goal of detection/forensics is to shorten the window between compromise and detection. If you can search for indicators when you learn about them (regardless of when the attack happens), you may be able to find compromised devices before they start behaving badly, and presumably trigger other network-based detection tactics.
A key aspect of selecting any kind of advanced endpoint protection product is to ensure the vendor’s research team is well staffed and capable of keeping up with the pace of emerging attacks. The more effective the security research team is, the more emerging attacks you will be able to look for before an adversary can compromise your devices. This is the true power of threat intelligence.
Once you have all of the data gathered and have enriched it with external threat intelligence, you are ready to look for patterns that may indicate compromised devices. Analytics is now a very shiny term in security circles, which we find very amusing. Early SIEM products offered analytics – you just needed to tell them what to look for. And it’s not like math is a novel concept for detecting security attacks. But security marketers are going to market, so notwithstanding the particular vernacular, more sophisticated analytics do enable more effective detection of sophisticated attacks today.
But what does that even mean? First we should define probably the term machine learning, because every company claims they do this to find zero-day attacks and all other badness with no false positives or latency. No, we don’t believe that hype. But the advance of analytical techniques, harnessed by math ninja known as data scientists, enables detailed analysis of every attack to find commonalities and patterns. These patterns can then be used to find malicious code or behavior in new suspicious files. Basically security research teams sets up their math machines to learn about these patterns. Ergo machine learning. Meh.
The upshot is that these patterns can be leveraged for both static analysis (what the file looks like) and dynamic analysis (what the software does), making detection faster and more accurate.
Once you have detected a potentially compromised devices you need to engage your response process. We have written extensively about incident response (including Using TI in Incident Response and Incident Response in the Cloud Age), so we won’t go through the details of the IR process again here. Though as we have described, advanced endpoint protection tools now provide both more granular telemetry, and a way to investigate an attack within the management console.
Additionally, these tools increasingly integrate with other response tools in use within your environment. Advanced endpoint protection products bring several capabilities to response, including:
- Attack Visualization: In many cases, being able to visualize the attack on a device is very instructive for understanding how the malware works and what it does to devices. The management consoles of some EAP products offer a visual map to follow the activity of malware on a device – including the process the attack impacted, kernel-level activity, and/or API calls. This timeline of sorts must also specify the files involved in the attack and track network connectivity.
- Understanding Outbreaks: As discussed above, a key aspect of EAP products is their ability to aggregate telemetry and search after the fact to determine if other devices have been attacked by similar malware. This provides invaluable insight into how the attack has proliferated through your environment, and identifies specific devices in need of remediation or quarantine.
- Forensics: You also need the endpoint agent to be able to gather raw telemetry from the device and provide tools to analyze the data. At times, especially when skilled forensicators are involved, they need full data to really dig into what the malware did. A key aspect of forensic analysis is the need to enforce chain of custody for collected data, especially if prosecution is an option.
- Ease of Use: EAP tools have been built for more general security practitioners, rather than only forensics ninja, so user experience has been a focus for helping less experienced professionals be more productive. This requires a much easier workflow for drilling down into attacks, and pivoting to find the root cause.
- Integration with Enterprise Tools: Another key criteria for EAP products is making sure they play nice with tools already in use. You’ll want to be able to send data directly to a SIEM for further correlation and analysis. You’ll also want to integrate with a case management system to track investigations. Finally, think about integrations with network security controls (including firewalls and web filters) to block C&C sites and other malicious addresses discovered on endpoints, preventing other devices from contacting known-bad Internet addresses.
Finally we should acknowledge another very shiny concept in security circles: hunting. It seems every practitioner aspires to be a hunter nowadays. OK, maybe that’s a little exaggerated, but it’s a cool gig. Hunters go out and proactively look for adversary activity on networks and systems, as opposed to waiting for monitors to alert, and then investigating.
Psychologically, hunting is great for security teams because it puts the team more in control of their environment. Instead of waiting for a tool to tell you things are bad, you can go out and figure it out yourself.
But the reality is that hunting is primarily relevant to the most sophisticated and advanced security teams. It requires staff to look around, and unfortunately most organizations are not sufficiently staffed to achieve core operational goals, so there isn’t much chance they have folks sitting around, available to proactively look for bad stuff.
Keep in mind the tools used by hunters are largely the same ones useful to practitioners focused on validating attacks on endpoints. A hunter needs to be able to analyze granular telemetry from endpoints and other devices. They need to search through telemetry to find activity patterns that could be malicious. They need to forensically investigate a device when they find something suspicious. Hunters also need to retrospectively look for indicators of attack to understand which devices have been targeted. Pretty much what EDR tools do.
To be clear, we aren’t maligning hunting at all. If your organization can devote the resources to stand up a hunting function, that’s awesome. Our point is simply that the tools needed to hunt are pretty much the same tools used by responders to verify alerts.
That’s detection and response as part of an Endpoint Advanced Protection lifecycle. Our next post will wrap up with the sticky questions that need to be answered – including remediation once you find a compromised device, whether an EAP product can replace your existing AV, and how to integrate these tools with existing network and security management controls.
Posted at Tuesday 1st November 2016 1:15 pm
(0) Comments •
There is a disturbing consistency in the kinds of project requests I see these days. Organizations call me because they are in the midst of their first transition to cloud, and they are spending many months planning out their exact AWS environment and all the security controls “before we move any workloads up”. More often than not some consulting firm advised them they need to spend 4-9 months building out 1-2 virtual networks in their cloud provider and implementing all the security controls before they can actually start in the cloud.
This is exactly what not to do.
As I discussed in an earlier post on blast radius, you definitely don’t want one giant cloud account/network with everything shoved into it. This sets you up for major failures down the road, and will slow down cloud initiatives enough that you lose many of the cloud’s advantages. This is because:
- One big account means a larger blast radius (note that ‘account’ is the AWS term – Azure and Google use different structures, but you can achieve the same goals). If something bad happens, like someone getting your cloud administrator credentials, the damage can be huge.
- Speaking of administrators, it becomes very hard to write identity management policies to restrict them to only their needed scope, especially as you add more and more projects. With multiple accounts/networks you can better segregate them out and limit entitlements.
- It becomes harder to adopt immutable infrastructure (using templates like CloudFormation or Terraform to define the infrastructure and build it on demand) because developers and administators end up stepping on each other more often.
- IP address space management and subnet segregation become very hard. Virtual networks aren’t physical networks. They are managed and secured differently in fundamental ways. I see most organizations trying to shove existing security tools and controls into the cloud, until eventually it all falls apart. In one recent case it became harder and slower to deploy things into the company’s AWS account than to spend months provisioning a new physical box on their network. That’s like paying for Netflix and trying to record Luke Cage on your TiVo so you can watch it when you want.
Those are just the highlights, but the short version is that although you can start this way, it won’t last. Unfortunately I have found that this is the most common recommendation from third-party “cloud consultants”, especially ones from the big firms. I have also seen Amazon Solution Architects (I haven’t worked with any from the other cloud providers) not recommend this practice, but go along with it if the organization is already moving that way. I don’t blame them. Their job is to reduce friction and get customer workloads on AWS, and changing this mindset is extremely difficult even in the best of circumstances.
Here is where you should start instead:
- Accept that any given project will have multiple cloud accounts to limit blast radius. 2-4 is average, with dev/test/prod and a shared services account all being separate. This allows developers incredible latitude to work with the tools and configurations they need, while still protecting production environments and data, as you pare down the number of people with administrative privileges.
- I usually use “scope of admin” to define where to draw the account boundaries.
- If you need to connect back into the datacenter you still don’t need one big cloud account – use what I call a ‘bastion’ account (Amazon calls these transit VPCs). This is the pipe back to your data center; you peer other accounts off it.
- You still might want or need a single shared account for some workloads, and that’s okay. Just don’t make it the center of your strategy.
- A common issue, especially for financial services clients, is that outbound
ssh is restricted from the corporate network. So the organization assumes they need a direct/VPN connection to the cloud network to enable remote access. You can get around this with jump boxes, software VPNs, or bastion accounts/networks.
- Another common concern is that you need a direct connection to manage security and other enterprise controls. In reality I find this is rarely the case, because you shouldn’t be using all the same exact tools and technologies anyway. There is more than I can squeeze into this post, but you should be adopting more cloud-native architectures and technologies. You should not be reducing security – you should be able to improve it or at least keep parity, but you need to adjust existing policies and approaches.
I will be writing much more on these issues and architectures in the coming weeks. In short, if someone tells you to build out a big virtual network that extends your existing network before you move anything to the cloud, run away. Fast.
Posted at Monday 31st October 2016 6:00 pm
(0) Comments •
I started Securosis as a blog a little over 10 years ago. 9 years ago it became my job. Soon after that Adrian Lane and Mike Rothman joined me as partners. Over that time we have published well over 10,000 posts, around 100 research papers, and given countless presentations. When I laid down that first post I was 35, childless, a Research VP at Gartner still, and recently married. In other words I had a secure job and the kind of free time no one with a kid ever sees again. Every morning I woke up energized to tell the Internet important things!
In those 10 years I added three kids and two partners, and grew what may be the only successful analyst firm to spin out of Gartner in decades. I finished my first triathlons, marathon, and century (plus) bike ride. I started programming again. We racked up a dream list of clients, presented at all the biggest security events, and built a collection of research I am truly proud of, especially my more recent work on the cloud and DevOps, including two training classes.
But it hasn’t all been rainbows and unicorns, especially the past couple years. I stopped training in martial arts after nearly 20 years (kids), had two big health scares (totally fine now), and slowly became encumbered with all the time-consuming overhead of being self-employed. We went through 3 incredibly time-consuming and emotional failed acquisitions, where offers didn’t meet our goals. We spent two years self-funding, designing, and building a software platform that every iota of my experience and analysis says is desperately needed to manage security as we all transition to cloud computing, but we couldn’t get it over the finish line. We weren’t willing to make the personal sacrifices you need must to get outside funding, and we couldn’t find another path.
In other words, we lived life.
A side effect, especially after all the effort I put into Trinity (you can see a video of it here), is that I lost a lot of my time and motivation to write, during a period where there is a hell of a lot to write about. We are in the midst of the most disruptive transition in terms of how we build, operate, and manage technology. Around seven years ago I bet big on cloud (and then DevOps), with both research and hands-on work. Now there aren’t many people out there with my experience, but I’ve done a crappy job of sharing it. In part I was holding back to give Trinity and our cloud engagements an edge. More, though, essentially (co-)running two companies at the same time, and then seeing one of them fail to launch, was emotionally crushing.
Why share all of this? Why not. I miss the days when I woke up motivated to tell the Internet those important things. And the truth is, I no longer know what my future holds. Securosis is still extremely strong – we grew yet again this year, and it was probably personally my biggest year yet. On the downside that growth is coming at a cost, where I spend most of my time traveling around performing cloud security assessments, building architectures, and running training classes. It’s very fulfilling but a step back in some ways. I don’t mind some travel, but most of my work now involves it, and I don’t like spending that much time away from the family.
Did I mention I miss being motivated to write?
Over the next couple months I will brain dump everything I can, especially on the cloud and DevOps. This isn’t for a paper. No one is licensing it, and I don’t have any motive other than to core dump everything I have learned over the past 7 years, before I get bored and do something else. Clients have been asking for a long time where to start in cloud security, and I haven’t had any place to send them. So I put up a page to collect all these posts in some relatively readable order. My intent is to follow the structure I use when assessing projects, but odds are it will end up being a big hot mess. I will also be publishing most of the code and tools I have been building but holding on to.
Yeah, this post is probably TMI, but we have always tried to be personal and honest around here. That is exactly what used to excite me so much that I couldn’t wait to get out of bed and to work. Perhaps those days are past. Or perhaps it’s just a matter of writing for the love of writing again – instead of for projects, papers, or promotion.
Posted at Monday 31st October 2016 5:56 pm
(1) Comments •
By Adrian Lane
I wanted to do a quick post on a question I’ve been getting a lot: “Is there a difference between SecDevOps, Rugged DevOps, DevSecOps, and the rest of those various terms? Aren’t they all the same?”
No, they are not. I realized that Rich and I have been making this distinction for some time, and while we have made references in presentations, I don’t think we have ever discussed it on the blog. So here they are, our definitions of Rugged DevOps and SecDevOps:
Rugged is about bashing your code prior to production, to ensure it holds up to external threats once it gets into production, and using runtime code to help applications protect themselves. Be as mean to your code as attackers will, and make it resilient against attacks.
SecDevOps, or DevSecOps, is about using the wonders of automation to tackle security-related problems including composition analysis, configuration management, selecting approved images/containers, use of immutable servers, and other techniques to address security challenges facing operations teams. It also helps to eliminate certain classes of attacks. For instance immutable servers in a security zone which blocks port 22 can prevent both hackers and administrators from logging in.
In simplest terms, Rugged DevOps is more developer-focused, while SecDevOps is more operations-focused.
Before you ask, yes, DevOps disposes with the silos between development, QA, operations, and security. They are all part of the same team. They work together. Security’s role changes a bit. They help advise, help with tool selection, and more technically astute members even help write code or tests to validate code. But we are still having developer-centric conversations and operations conversations, so this merger is clearly a work in progress.
Please feel free to disagree.
Posted at Wednesday 26th October 2016 11:20 pm
(1) Comments •
By Adrian Lane
This post will discuss the division of responsibility between a cloud provider and you as a tenant, and how to define aspects of that relationship in your service contract. Renting a platform from a service provider does not mean you can afford to cede all security responsibility. Cloud services free you from many traditional IT jobs, but you must still address security. The cloud provider assumes some security responsibilities, but many still fall into your lap, while others are shared. The administration and security guides don’t spell out all the details of how security works behind the scenes, or what the provider really provides. Grey areas should be defined and clarified in your contract up fron. During an incident response is a terrible time to discover what SAP actually offers.
SAP’s brochures on cloud security imply you will tackle security in a simple and transparent way. That’s not quite accurate. SAP has done a good job providing basic security controls, and they have obtained certifications for common regulatory and compliance requirements on their infrastructure. But you are renting a platform, which leaves a lot up to you. SAP does not provide a good roadmap of what you need to tackle, or a list of topics to understand before you deploy into an SAP HCP cloud.
Our first goal for this section is to help you identify which areas of cloud security you are responsible for. Just as important is identifying and clarifying shared responsibilities. To highlight important security considerations which generally are not discussed in service contracts, we will guide you through assessing exactly what a cloud provider’s responsibilities are, and what they do not provide. Only then does it become clear where you need to deploy resources.
Divisions of Responsibility
What is PaaS? Readers who have worked with SAP Hana already know what it is and how it works. Those new to cloud may understand the Platform as a Service (PaaS) concept, but not yet be fully aware what it means structurally. To highlight what a PaaS service provides, let’s borrow Christopher Hoff’s cloud taxonomy for PaaS; this illustrates what SAP provides.
This diagram includes the components of IaaS and PaaS systems. Obviously the facilities (such as power, HVAC, and physical space) and hardware (storage, network, and computing power) portions of the infrastructure are provided, as are the virtualization and cluster management technologies to make it all work together. More interesting, though: SAP Hana, its associated business objects, personalization, integration, and data management capabilities are all provided – as well as APIs for custom application development. This enables you to focus on delivering custom application features, tailored UIs, workflows, and data analytics, while SAP takes care of managing everything else.
The Good, the Bad, and the Uncertain
The good news is that this frees you up from lengthy hardware provisioning cycles, network setup, standing up DNS servers, cluster management, database installations, and the myriad things it takes to stand up a data center. And all the SAP software, middleware components, and integration are built in – available on demand. You can stand up an entire SAP cluster through their management console in hours instead of weeks. Scaling up – and down – is far easier, and you are only charged for what you use.
The bad news is that you have no control over underlying network security; and you do not have access to network events to seed your on-premise DLP, threat analysis, SIEM, and IDS systems. Many traditional security tools therefore no longer function, and event collection capabilities are reduced. The net result is that you become more reliant than ever on the application platform’s built-in security, but you do not fully control it. SAP provides fairly powerful management capabilities from a single console, so administrative account takeovers or malicious employees can cause considerable damage.
There are many security details the vendor may share with you, but wherever they don’t publish specifics, you need to ask. Specifically, things like segregation of administrative duties, data encryption and key management, employee vetting process, and how they monitor their own systems for security events. You’ll need to dig in a bit and ask SAP about details of the security capabilities they have built into the platform.
At Securosis we call the division between your security responsibilities and your vendor’s “the waterline”. Anything above the waterline is your responsibility, and everything below is SAP’s. In some areas, such as identity management, both parties have roles to play. But you generally don’t see below the waterline – how they perform their work is confidential. You have very little visibility into their work, and very limited ability to audit it – for SAP and other cloud services.
This is where your contract comes into play. If a service is not in the contract, there is a good chance it does not exist. It is critical to avoid assumptions about what a cloud provider offers or will do, if or when something like a data breach occurs. Get everything in writing.
The following are several areas we advise you to ask about. If you need something for security, include it in your contract.
- Event Logs: Security analytics require event data from many sources. Network flows,
syslog, database activity, application logs, IDS, IAM, and many others are all useful. But SAP’s cloud does not offer all these sources. Further, the cloud is multi-tenant, so logs may include activity from other tenants, and therefore not be available to you. For platforms and applications you manage in the cloud, event logs are available. Assess what you rely on today that’s unavailable. In most cases you can switch to more application-centric event sources to collect required information. You also need to determine how data will be collected – agents are available for many things, while other logs must be gathered via API requests.
- Testing and Assessment: SAP states that they conduct internal penetration tests to verify common defects are not present, and attempts to validate that their own business logic functions as intended. This does not extend to your custom applications. Additionally, SAP may or may not allow you to run penetration tests, dynamic application security testing, or even remote vulnerability assessment – against your applications and/or theirs. This is a critical area you need to understand, to determine which of your application security efforts can continue. Most cloud service providers allow limited external testing with advance permission, and some scans can be conducted internally – against only your assigned resources. You need to specify these activities in your contract, specifically including which tests will be performed, how permissions are obtained if needed, timeframes, and test scopes. The good news is that some of your existing application scanning responsibility is reduced, because the service provider takes care it. The bad news concerns the extra work to set up a new assessment process in the cloud.
- Breach Response: If a data breach occurs, what happens? Will SAP investigate? Will they share data with you? Who is at fault, and who decides? If federal or local law enforcement becomes involved, will you still be kept in the loop? We have witnessed cases where other cloud service vendors have not assisted their tenants with event analysis, and others where they declined to share event data – instead only confirming that an event took place. This is an area your security team needs to be comfortable with, especially if your firm runs a Security Operations Center. Because you won’t control the platform or infrastructure, your analysis is limited. This shared responsibility must be spelled out in your contract.
- Certifications: SAP obtains periodic certifications on their infrastructure and platforms. Things like PCI-DSS, ISO 9001, ISO 27001, ISAE 3402, and several others we won’t bother to list here. The key is whether SAP has the certifications important to you, and exactly which parts of their service are certified. This will give you a good idea of where their efforts ended, and where yours must pick up. Additionally, some audits only cover what the service provider listed as important – omitting items you might find relevant. We recommend you contrast their certification reports against your current certifications for on-premise systems to ensure you’re covered.
- Segregation of Duties: Remember that SAP’s admins have access to your platforms. For most cloud services consumers, who worry about admins accessing data stores, this means database encryption is needed. You will need to decide how to encrypt data and where to store encryption keys. In most cases we find the in-cloud offerings insufficient, so a hybrid model is employed.
- Data Privacy Regulations: Additional data privacy concerns may arise, depending on which data center you choose. SAP will rightfully tell you that you need to understand which laws apply to you, as both compliance and legal jurisdiction change depending on your data center’s geographic region. SAP states they adhere to German government and EU requirements for data processors, but you will need to independently verify that these meet your requirements, and develop a mitigation plan for any unaddressed items. Additionally, you need to reconsider these issues if you select fail-over data centers in different regions. Some compliance and privacy laws and requirements follow the data, but some laws will change, and there are cases where these two areas will conflict.
- Platform Updates: Cloud service vendors tend to be very agile in deployment of patches and new features. That means they have the capacity to develop and roll out security patches on a regular basis. In some cases this alters platform behavior and function.
Keep in mind that public cloud service providers like SAP don’t like what we suggest. They are not really set up to provide custom security and compliance offerings, are reluctant to share data, and don’t like to go into detail on their operations. We encourage you to ask for clarification on what the service offers, but don’t expect tailored security or compliance service. It’s set up to be on-demand, self-service, with standard pricing. Customers can have anything they want, so long as it’s already on the menu. It’s a bit like arguing with a vending machine – barter, or trying to get a Pepsi from a Coke machine, rarely works out. Cloud vendors are designed to provide a basic service, with customization being something you build on top of their basic service. Unless you spend absurd amounts of money. But custom services are atypical. This goes for general features as well as security add-ons.
You will have less control over infrastructure and no physical access to hardware. The people managing the platform don’t report to you. To compensate you will rely more on contracts, service level agreements, and audit reports from the provider on their service. Be aggressive in requesting documents on which security controls are provided and how they work; some documents are not provided to the general public. Request compliance reports to see where SAP was tested, and where they weren’t. Understand that there are many things you cannot bargain for, but you will have more success asking for data and clarification on what SAP provides. But for anything critical (and anything non-critical, too), if it’s not spelled out in the contract, don’t expect it to work the way you want or need it to.
Posted at Monday 24th October 2016 11:41 am
(0) Comments •
By Mike Rothman
As we discussed in our last post, there is a logical lifecycle which you can implement to protect endpoints. Once you know what you need to protect and how vulnerable the devices are, you try to prevent attacks, right? Was that a snicker? You’ve been reading the trade press and security marketing telling you prevention is futile, so you’re a bit skeptical. You have every right to be – time and again you have had to clean up ransomware attacks (hopefully before they encrypt entire file servers), and you detect command and control traffic indicating popped devices frequently. A sense of futility regarding actually preventing compromise is all too common.
Despite any feelings of futility, we still see prevention as key to any Endpoint Protection strategy. It needs to be. Imagine how busy (and frustrated) you’d be if you stopped trying to prevent attacks, and just left a bunch of unpatched Internet-accessible Windows XP devices on your network, figuring you’d just detect and clean up every compromise after the fact. That’s about as silly as basing your plans on stopping every attack.
So the key objective of any prevention strategy must be making sure you aren’t the path of least resistance. That entails two concepts: reducing attack surface, and risk-based prevention. Shame on us if devices are compromised by attacks which have been out there for months. Really. So ensuring proper device hygiene on endpoints is job one. Then it’s a question of deciding which controls are appropriate for each specific employee (or more likely, group of employees). There are plenty of alternatives to block malware attacks, some more effective than others. But unfortunately the most effective controls are also highly disruptive to users. So you need to balance inconvenience against risk to determine which makes the most sense. If you want to keep your job, that is.
“Legacy” Prevention Techniques
It is often said that you can never turn off a security control. You see the truth in that adage when you look at the technologies used to protect endpoints today. We carry around (and pay for) historical technologies and techniques, largely regardless of effectiveness, and that complicates actually defending against the attacks we see.
The good news is that many organizations use an endpoint protection suite, which over time mitigates the less effective tactics. At least in concept. But we cannot fully cover prevention tactics without mentioning legacy technologies. These techniques are still in use, but largely under the covers of whichever endpoint suite you select.
- Signatures (LOL): Signature-based controls are all about maintaining a huge blacklist of known malicious files to prevent from executing. Free AV products currently on the market typically only use this strategy, but the broader commercial endpoint protection suites have been supplementing traditional signature engines with additional heuristics and cloud-based file reputation for years. So this technique is used primarily to detect known commodity attacks representing the low bar of attacks seen in the wild.
- Advanced Heuristics: Endpoint detection needed to evolve beyond what a file looks like (hash matching), paying much more attention to what malware does. The issue with early heuristics was having enough context to know whether an executable was taking a legitimate action. Malicious actions were defined generically for each device based on operating system characteristics, so false positives (notably blocking a legitimate action) and false negatives (failing to block an attack) were both common – a lose/lose scenario. Fortunately heuristics have evolved to recognize normal application behavior. This dramatically improved accuracy by building and matching against application-specific rules. But this requires understanding all legitimate functions within a constrained universe of frequently targeted applications, and developing a detailed profile of each covered application. Any unapproved application action is blocked. Vendors need a positive security model for each application – a tremendous amount of work. This technique provides the basis for many of the advanced protection technologies emerging today.
- AWL: Application White Listing entails implementing a default deny posture on endpoint devices (often servers). The process is straightforward: Define a set of authorized executables that can run on a device, and block everything else. With a strong policy in place, AWL provides true device lockdown – no executables (either malicious or legitimate) can execute without explicit authorization. But the impact to user experience is often unacceptable, so this technology is mostly restricted to very specific use cases, such as servers and fixed-function kiosks, which shouldn’t run general-purpose applications.
- Isolation: A few years ago the concept of running apps in a “walled garden” or sandbox on each device came into vogue. This technique enables us to shield the rest of a device from a compromised application, greatly reducing the risk posed by malware. Like AWL, this technology continues to find success in particular niches and use cases, rather than as a general answer for endpoint prevention.
You can’t ignore old-school techniques, because a lot of commodity malware still in circulation every day can be stopped by signatures and advanced heuristics. Maybe it’s 40%. Maybe it’s 60%. Regardless, it’s not enough to fully protect endpoints. So endpoint security innovation has focused on advanced prevention and detection, and also on optimizing for prevalent attacks such as ransomware.
Let’s unpack the new techniques to make sense of all the security marketing hyperbole getting thrown around. You know, the calls you get and emails flooding your inbox, telling you how these shiny new products can stop zero-day attacks with no false positives and insignificant employee disruption. But we don’t know of any foolproof tools or techniques, so we will focus the latter half of this series on detection and investigation. But in fairness, advanced techniques do dramatically increase the ability of endpoints to block attacks.
The first major category of advanced prevention techniques focus on blocking exploits before the device is compromised. Security research has revealed a lot of how malware actually compromises endpoints at a low level, so tools now look for those indicators. You can pull out our favorite healthcare analogy: by understanding the fundamental changes an attack causes within an organism, you learn what to look for generally, rather than focusing on a specific attack, which can morph in an infinite number of ways.
These tactics break down into a few buckets:
- Profiling exploit behavior: This takes the advanced heuristics approach described above deeper into the innards of the operating system. Where the advanced heuristics focus on identifying anomalous application behavior, these anti-exploit tools focus on what happens to the actual machine when malicious code takes over the device. The concept is that there are a discrete and known number of ways to compromise the operating system, regardless of the attack vector and by blocking those behaviors, you stop the exploit.
- Memory analysis/protection: One of the latest waves of attack doesn’t even deal with traditional malware files. Malicious code is inserted directly into a command line or other means of manipulating the operating system without hitting disk. This attack requires analyzing the memory of the device on a continuous basis and preventing memory corruption and logic flaws. Suffice it to say this kind of technology is very sophisticated and can really impact the operation of the device, so full testing to ensure no impact on your devices is critical in evaluating this technology.
- Malware-less defense: Aside from hiding attacks in memory, attackers are now using fundamental operating system features to defeat whitelisting and isolation techniques. The most frequently targeted OS services include WMI, PowerShell, and EMET. These attacks are much more challenging to detect because these system processes are authorized by definition. To defend against these attacks, advanced technologies need to monitor the behaviors of all processes to make sure an approved process hasn’t been hijacked. This requires profiling legitimate behavior of common system processes, and then looking for anomalous activity.
All ‘advanced’ endpoint protection technology includes these techniques, though they may be branded differently. It is all largely the same approach of looking for anomalous behavior, but focused on OS and device innards instead of user-space applications.
Endpoint Bot Detection
Pretty much every modern attack, whether it involves malware or not, involves communicating with a command and control network to download the attack payload and receive instructions. So endpoint network-based detection has evolved to look for command and control patterns, similar to non-endpoint network malware detection.
This capability is important for full protection, because endpoints aren’t always on the corporate network, which you are presumably already scanning for command and control traffic. So recognizing when a device in a coffee shop or hotel is communicating with known malicious sites, can help you detect a compromise before the device reconnects to the corporate network. This requires integration with a threat intelligence source to keep an updated list of known malicious sites.
Dynamic File Testing
Many attacks still involve a compromised file executing code on a device, so network and cloud sandboxes are heavily used to dynamically execute inbound files and ensure they are not malicious. You have a number of options for where to test files, including the perimeter and/or email security gateway. But remote personnel remain a challenge because their network traffic doesn’t run through the network’s corporate defenses.
So you can supplement those corporate controls with the ability to extract and test files on endpoints as well. The file will be checked to see if it has a known bad hash; if not, it can be tested in the corporate sandbox. Some organizations now converting any easily compromised file (meaning Office files) into a sanitized PDF to remove any active code without impacting document appearance. If the original file is needed it can be routed to the recipient after clearing the sandbox.
The first technology you have certainly been hearing a lot about is machine learning. It is used in many contexts aside from endpoint protection, but the advanced endpoint security messaging has become very prominent. We just chuckle – statistical analysis of malware has been a popular technique as long as we can remember. And all of a sudden, math is our savior, to stop all these nasty attacks?
But the math really is better now. Combined with much more detailed understanding of how malware actually compromises devices, more sophisticated static file analysis does help detect attacks. But we have to wonder whether these new techniques are really just next-generation AV signatures.
Ultimately we try to avoid getting wrapped up in vernacular or semantics. If these techniques help detect attacks more accurately at scale, the important thing isn’t whether they look like signatures or not. It’s not like we (or anyone else) believe machine learning is the perfect solution for endpoint protection. It’s just another development in the never-ending arms race of malware protection.
The other enabling technology that warrants mention is threat intelligence. Or security research, as endpoint protection vendors have been calling it for a decade. The reality is that whether you are adding new indicators to an endpoint agent, or updating the list of known malicious sites for command and control detection, each endpoint agent needs to be updated frequently to keep current. Especially devices that don’t sit behind the corporate network’s perimeter defenses.
You wouldn’t necessarily buy threat intelligence as part of an endpoint protection project, but during technology evaluation you should ensure that agents are kept current, and updates don’t put too much strain on either endpoints or the network.
Protecting the Point of Attack
We should address the best place to place protection, because you have a few options. The path of least resistance remains network-based solutions, which can be deployed without any user impact. Of course these options don’t protect device which aren’t behind the corporate perimeter. Nor can network-based solutions provide context for individual user behavior like something running on the device can.
You can run all traffic through a VPN or a cloud-based filtering service to provide some protection for remote devices. Running traffic through either enables yoyu to gather telemetry and enforce corporate usage policies. On the downside, this impacts traffic flow and can be evaded by both savvy users and attackers. But it offers an option for addressing the limitations of filtering traffic through network defenses.
But this research is focused on endpoint protection, so let’s assume that protecting endpoints is important. So do you add yet another agent to your endpoint, or use a plug-in into a common application like a browser to protect against the most common attack vector? If for some reason you cannot replace the existing endpoint agent, looking at a plug-in approach to provide additional protection can certainly help as a stopgap.
But if we haven’t yet made it clear, these advanced endpoint security offerings are neither a long-term alternative, nor meant to run alongside an existing endpoint protection suite. These new offerings represent an evolution of endpoint protection; so either incumbents will add these capabilities to their existing offerings or they won’t survive. And this is not just about prevention – we will discuss endpoint detection and response capabilities in our next post.
We don’t normally call out specifically attacks because they change so frequently. But ransomware is a bit different. The ability to so cleanly and quickly monetize successful attacks has made it the most visible attack strategy. And ransomware is not restricted to just one size or type of company, or device type. We have seen ransomware targeting everyone and everything.
So how can you combine these advanced techniques to prevent a ransomware attack? Fortunately in technical terms ransomware is just another attack, so it can be profiled and blocked using advanced heuristics and exploit profiling. First look for attack patterns as they attempt to compromise the device; ransomware doesn’t look fundamentally different than other attacks.
Next look for clues within the endpoint’s network stack – particularly command and control traffic – because attackers need to deliver their payload to lock down the machine. You can also look for anomalous searching of file shares because ransomware typically targets shared file systems for extra impact.
Additionally, because ransomware encrypts the local file system, you can monitor file I/O for anomalous activity. We also suggest organizations more aggressively monitor their storage networks and arrays for anomalous file activity. This can help shorten the detection window, and stop encryption before too much data is impacted.
And yes, they are out of the scope for this research, but device and data backup are essential for quick restoration of service in case of a ransomware attack.
A Note on ‘Effectiveness’
It’s worth mentioning how to evaluate the effectiveness of these solutions. We refer back to our Advanced Endpoint and Server Protection research a few years ago, as this material hasn’t changed.
As you start evaluating these advanced prevention offerings, don’t be surprised to get a bunch of inconsistent data on the effectiveness of specific approaches. You are also likely to encounter many well-spoken evangelists spouting monumental amounts of hyperbole and religion in favor of their particular approach – whatever it may be – at the expense of all other options. This happens in every security market undergoing rapid innovation, as companies try to establish momentum for their approaches and products.
A lab test favoring one product or approach over another isn’t much consolation when you need to clean up an attack your tools failed to prevent. And those evangelists are nowhere to be found when a security researcher shows how to evade their shiny technology at the latest Black Hat conference. We at Securosis try to float above the hyperbole and propaganda to keep you focused on what’s really important – not claimed 1% effectiveness differences. If products or categories are within a few percent of each other across a variety of tests, we consider that a draw.
But if you look hard enough, you can find value in comparative tests. An outlier warrants investigation and a critical assessment of the test and methodology. Was it skewed toward one category? Was the test commissioned by a vendor or someone else with an agenda? Was real malware, freshly found in the wild, used in the test? All testing methodologies have issues and limitations – don’t base a decision, or even a short list, around a magic chart or a product review/test.
A Risk-Based Approach to Defending Endpoints
Yet, security practitioners have an unfortunate tendency to miss the forest for the trees when discussing advanced endpoint protection. The reality is that each device contains a mixture of data types; some data types present great risk to the organization, and others don’t. You also need to consider that some protection techniques are very disruptive to end users and can be expensive to both procure and manage.
So we advocate a risk-based approach to protecting endpoints. This involves grouping endpoint devices into a handful (or less than a handful) of risk categories. Then determine the most effective means to protect the devices based in each category. For example you might want to implement whitelisting on all kiosks in stores and warehouses. Or you might add an advanced exploit prevention agent to devices used by senior management, Human Resources, and Finance, and anyone else handling especially sensitive or attractive information. Finally you might just use free AV on devices which only have outbound access from common areas, because they don’t have access to anything important on the corporate network.
There are as many permutations as devices on your network. To scale this approach you need to categorize risk tiers effectively. But a one-size-fits-all approach doesn’t work either given the variety of different approaches that can be brought to bear on detecting advanced malware.
As we mentioned above, our next post will cover endpoint detection and response technologies which are increasingly important to defending endpoints.
Posted at Monday 17th October 2016 4:16 pm
(0) Comments •