By Mike Rothman
Many security professionals feel the deck is stacked against them. Adversaries continue to improve their techniques, aided by plentiful malware kits and botnet infrastructures. Continued digitization at pretty much every enterprise means everything of interest is on some system somewhere. Don’t forget the double whammy of mobile and cloud, which democratizes access without geographic boundaries, and takes the one bastion of control, the traditional data center, out of your direct control. Are we having fun yet?
Of course the news isn’t all bad – security has become very high profile. Getting attention and resources can sometimes be a little too easy – life was simpler when we toiled away in obscurity bemoaning that senior management didn’t understand or care about security. That’s clearly not the case today, as you get ready to present the security strategy to the board of directors. Again. And after that’s done you get to meet with the HR team trying to fill your open positions. Again.
In terms of fundamentals of a strong security program, we have always believed in the importance of security monitoring to shorten the window between compromise and detection of compromise. As we posted in our recent SIEM Kung Fu paper:
Security monitoring needs to be a core, fundamental, aspect of every security program.
There are a lot of different concepts of what security monitoring actually is. It certainly starts with log aggregation and SIEM, although many organizations are looking to leverage advanced security analytics (either built into their SIEM or using third-party technology) to provide better and faster detection. But that’s not what we want to tackle in this new series, titled Managed Security Monitoring. It’s not about whether to do security monitoring, it’s a question of the most effective way to monitor resources.
Given the challenges of finding and retaining staff, the increasingly distributed nature of data and systems that need to be monitored, and the rapid march of technology, it’s worth considering whether a managed security monitoring service makes sense for your organization. The fact is that, under the right circumstances, a managed service presents an interesting alternative to racking and stacking another set of SIEM appliances. We will go through drivers, use cases, and deployment architectures for those considering managed services. And we will provide cautions for areas where a service offering might not meet expectations.
As always, our business model depends on forward-looking companies who understand the value of objective research. We’d like to thank IBM Security Systems for agreeing to potentially license this paper once completed. We’ll publish the research using our Totally Transparent Research methodology, which ensures our work is done in an open and accessible manner.
Drivers for Managed Security Monitoring
We have no illusions about the amount of effort required to get a security monitoring platform up and running, or what it takes to keep one current and useful, given the rapid adaptation of attackers and automated attack tools in use today. Many organizations feel stuck in a purgatory of sorts, reacting without sufficient visibility, yet not having time to invest to gain that much-needed visibility into threats. A suboptimal situation, often the initial trigger for discussion of managed services. Let’s be a bit more specific about situations where it’s worth a look at managed security monitoring.
- Lack of internal expertise: Even having people to throw at security monitoring may not be enough. They need to be the right people – with expertise in triaging alerts, validating exploits, closing simple issues, and knowing when to pull the alarm and escalate to the incident response team. Reviewing events, setting up policies, and managing the system, all take skills that come with training and time with the security monitoring product. Clearly this is not a skill set you can just pick up anywhere – finding and keeping talented people is hard – so if you don’t have sufficient expertise internally, that’s a good reason to check out a service-based alternative.
- Scalability of existing technology platform: You might have a decent platform, but perhaps it can’t scale to what you need for real-time analysis, or has limitations in capturing network traffic or other voluminous telemetry. And if your organization is still using a first-generation SIEM with a relational database backend (yes, they are still out there), you face a significant and costly upgrade to scale the system. With a managed service offering, scale is not an issue – any sizable provider is handling billions of events per day, and scalability of the technology isn’t your problem – so long as the provider hits your SLAs.
- Predictable Costs: To be the master of the obvious, the more data you put into a monitoring system, the more storage you’ll need. The more sites you want to monitor and the deeper you want visibility into your network, the more sensors you need. Scaling up a security monitoring environment can become costly. One advantage of managed offerings is predictable costs. You know what you’re monitoring and what it costs. You don’t have variable staff costs, nor do you have out-of-cycle capital expenses to deal with new applications that need monitoring.
- Technology Risk Transference: You have been burned before by vendors promising the world without delivering much of anything. That’s why you are considering alternatives. A managed monitoring service enables you to focus on the functionality you need, instead of trying to determine which product can meet your needs. Ultimately you only need to be concerned with the application and the user experience – all that other stuff is the provider’s problem. Selecting a provider becomes effectively an insurance policy to minimize your technology investment risk. Similarly, if you are worried about your ops team’s ability to keep a broad security monitoring platform up and running, you can transfer operational risk to the provider, who assumes responsibility for uptime and performance – so long as your SLAs are structured properly.
- Geographically dispersed small sites: Managed services also interest organizations needing to support many small locations without a lot of technical expertise. Think retail and other distribution-centric organizations. This presents a good opportunity for a service provider who can monitor remote sites.
- Round the clock monitoring: As security programs scale and mature, some organizations decide to move from an 8-hour/5-day monitoring schedule to a round-the-clock approach. Soon after making that decision, the difficulty of staffing a security operations center (SOC) 24/7 sets in. A service provider can leverage a 24/7 staffing investment to deliver round-the-clock services to many customers.
Of course you can’t outsource thinking or accountability, so ultimately the buck stops with the internal team, but under the right circumstances managed security monitoring services can address skills and capabilities gaps.
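The round-the-clock staffing point deserves a quick back-of-the-envelope illustration. The numbers below are assumptions for the sake of the sketch (coverage hours in a year, and roughly how many productive hours one analyst delivers after leave, training, and turnover), not sourced figures:

```shell
# Rough staffing math for keeping one SOC seat filled 24/7.
# Both figures are illustrative assumptions, not benchmarks.
hours_per_year=8760      # 24 x 365 hours of coverage to fill
analyst_hours=1880       # ~productive hours one analyst delivers per year
# Round up to whole analysts needed for a single around-the-clock seat.
needed=$(( (hours_per_year + analyst_hours - 1) / analyst_hours ))
echo "analysts per 24/7 seat: $needed"
```

Under these assumptions one chair takes about five full-time analysts – before you account for multiple seats, escalation tiers, or management. That multiplier is exactly what a provider amortizes across its customer base.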
Favorable Use Cases
The technology platform used by the provider may be the equal of an in-house solution, as many providers use commercial monitoring platforms as the basis for their managed services. This is a place for significant diligence during procurement, as we will discuss in our next post. As mentioned above, there are a few use cases where managed security monitoring makes a lot of sense, including:
- Device Monitoring/Alerting: This is the scaling and skills issue. If you have a ton of network and security devices, but you don’t have the technology or people to properly monitor them, managed security monitoring can help. These services are generally architected to aggregate data on your site and ship it to the service provider for analysis and alerting, though a variety of different options are emerging for where the platform runs and who owns it. Central to this use case is a correlation system to identify issues, a means to find new attacks (typically via a threat intelligence capability) and a bunch of analysts who can triage and validate issues quickly, and then provide an actionable alert.
- Advanced Detection: With the increasing sophistication of attackers, it can be hard for an organization’s security team to keep pace. A service provider has access to threat intelligence, presumably multiple clients across which to watch for emerging attacks, and the ability to amortize advanced security analytics across customers. Additionally, specialized (and expensive) malware researchers can be shared among many customers, making it more feasible for a service provider to employ those resources than for most individual organizations.
- Compliance Reporting: Another no-brainer for a managed security monitoring alternative is basic log aggregation and reporting – typically driven by a compliance requirement. This isn’t a very complicated use case, and it fits service offerings well. It also gets you out of the business of managing storage and updating reports when a requirement/mandate changes. The provider should take care of all that for you.
- CapEx vs. OpEx: As much as it may hurt a security purist, buying decisions come down to economics. Depending on your funding model and your organization’s attitude toward capital expenses, leasing a service may be a better option than buying outright. Of course there are other ways to turn a capital purchase into an operational expense, and we’re sure your CFO will have plenty of ideas on that front, but buying a service can be a simple option for avoiding capital expenditure. Obviously, given the long and involved process to select a new security monitoring platform, you must make sure the managed service meets your needs before economic considerations come into play – especially if there’s a risk of Accounting’s preferences driving you to spend big on an unsuitable product. No OpEx vs. CapEx tradeoff can make a poorly matched service offering meet your requirements.
There are other offerings and situations where managed security monitoring makes sense, which have nothing to do with the nice clean buckets above. We have seen implementations of all shapes and sizes, and we need to avoid overgeneralizing. But the majority of service implementations fit these general use cases.
Unfavorable Use Cases
Of course there are also situations where a monitoring service may not be a good fit. That doesn’t mean you can’t use a service anyway – extenuating circumstances, typically a staffing and skills gap, may still push you in that direction. But generally these situations don’t make for the best fit for a service:
- Dark Networks: Due to security requirements, some networks are dark, meaning no external access is available. These are typically highly sensitive military and/or regulated environments. Clearly this is problematic for a security monitoring service because the provider cannot access the customer network. To address skills gaps you’d instead consider a dedicated onsite resource and either buying a security monitoring platform yourself or leasing it from the provider.
- Highly Sensitive IP: On networks where the intellectual property is particularly valuable, the idea of providing access to external parties is usually a non-starter. Again, this situation would call for dedicated on-site resources helping to run your on-premise security monitoring platform.
- Large Volumes of Data: If your organization is very large and generates a ton of logs and other telemetry for security monitoring – including network forensics and packet analytics – this can challenge a service offering that requires data to be moved to a cloud-based service. In this case an on-premise monitoring platform will likely be the best solution. Note the new hybrid offerings, which capture data and perform security analytics on-premise using resources in a shared SOC. We’ll discuss these hybrid offerings in our next post.
As with the favorable use cases, the unfavorable use cases are strong indicators but not absolute. It really depends on the specific requirements of your situation, your ability to invest in technology, and the availability of skilled resources.
These generalizations should give you a starting point to consider a managed security monitoring service. Our next post will get into specifics of selection criteria, service levels, and deployment models.
Posted at Monday 27th June 2016 2:00 pm
Quick note: I basically wrote an entire technical post for Tool of the Week, so feel free to skip down if that’s why you’re reading.
Ah, summer. As someone who works at home and has children, I’m learning the pains of summer break. Sure, it’s a wonderful time without homework fights and after-school activities, but it also means all 5 of us are in the house nearly every day. It’s a bit distracting. I mean, do you have any idea how to tell a 3-year-old you cannot ditch work to play Disney Infinity on the Xbox?
Me neither, which explains my productivity slowdown.
I’ve actually been pretty busy at ‘real work’, mostly building content for our new Advanced Cloud Security course (it’s sold out, but we still have room in our Hands-On class). Plus a bunch of recent cloud security assessments for various clients. I have been seeing some interesting consistencies, and will try to write those up after I get these other projects knocked off. People are definitely getting a better handle on the cloud, but they still tend to make similar mistakes.
With that, let’s jump right in…
Top Posts for the Week
Tool of the Week
I’m going to detour a bit and focus on something all you admin types are very familiar with: rsyslog. Yes, this is the default system logger for a big chunk of the Linux world, something most of us don’t think that much about. But as I build out a cloud logging infrastructure I found I needed to dig into it to make some adjustments, so here is a trick to insert critical Amazon metadata into your logs (usable on other platforms, but I can only show so many examples).
syslog-compatible tools generate standard log files and allow you to ship them off to a remote collector. That’s the core of a lot of performance and security monitoring. By default log lines look something like this:
Jun 24 00:21:27 ip-172-31-40-72 sudo: ec2-user : TTY=pts/0 ; PWD=/var/log ; USER=root ; COMMAND=/bin/cat secure
That’s the line outputting the security log from a Linux instance. See a problem?
This log entry includes the host name (internal IP address) of the instance, but in the cloud a host name or IP address isn’t nearly as canonical as in traditional infrastructure. Both can be quite ephemeral, especially if you use auto scale groups and the like. Ideally you capture the instance ID or equivalent on other platforms, and perhaps also some other metadata such as the internal or external IP address currently associated with the instance. Fortunately it isn’t hard to fix this up.
The first step is to capture the metadata you want. In AWS you can query the instance metadata service to get it all, or request just the instance ID. Then you have a couple options. One is to change the host name to be the instance ID. Another is to append it to entries by changing the rsyslog configuration (/etc/rsyslog.conf on CentOS systems), as below, to add an %INSTANCEID% environment variable to the hostname (yes, this means you need to set INSTANCEID as an environment variable, and I haven’t tested this because I need to post the Summary before I finish, so you might need a little more text manipulation to make it work… but this should be close):
string="<%PRI%>%TIMESTAMP:::date-rfc3339% %INSTANCEID%-%HOSTNAME% %syslogtag:1:32%%msg:::sp-if-no-1st-sp%%msg%"
There are obviously a ton of ways you could slice this, and you need to add it to your server build configurations to make it work (using Ansible/Chef/Puppet/packer/whatever). But the key is to capture and embed the instance ID and whatever other metadata you need. If you don’t care about strict syslog compatibility, you have more options. The nice thing about this approach is that it will capture all messages from all the system sources you normally log, and you don’t need to modify individual message formats.
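To make the metadata-capture step concrete, here is a minimal shell sketch. The 169.254.169.254 address is AWS’s standard link-local instance metadata endpoint; the placeholder fallback value is purely illustrative, there so the snippet degrades gracefully outside EC2:

```shell
# Fetch the EC2 instance ID from the instance metadata service.
# 169.254.169.254 is the link-local metadata endpoint reachable
# from inside any EC2 instance; no credentials are required.
INSTANCEID=$(curl -s --max-time 2 http://169.254.169.254/latest/meta-data/instance-id 2>/dev/null)
case "$INSTANCEID" in
  i-*) : ;;                       # looks like a real instance ID
  *)   INSTANCEID="i-0unknown" ;; # not on EC2, or no metadata access
esac
export INSTANCEID
echo "instance id: $INSTANCEID"
```

You would run something like this at boot (cloud-init is a natural place) so INSTANCEID is set in the environment before rsyslog starts.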
If you use something like the native Amazon/Azure/Google instance logging tools… you don’t need to bother with any of this. Those tools tend to capture the relevant metadata for you (e.g., using Amazon’s CloudWatch Logs agent, Azure’s Log Analyzer, or Google’s StackDriver). Check the documentation to make sure you configure them correctly. But many clients want to leverage existing log management, so this is one way to get the essential data.
Securosis Blog Posts this Week
Other Securosis News and Quotes
Another quiet week…
Training and Events
- We are running two classes at Black Hat USA. Early bird pricing ends in a month – just a warning:
Posted at Friday 24th June 2016 5:23 am
By Mike Rothman
Visible devices are only some of the network-connected devices in your environment. There are hundreds, quite possibly thousands, of other devices you don’t know about on your network. You don’t scan them periodically, and you have no idea of their security posture. Each one can be attacked, and might provide an adversary with an opportunity to gain a presence in your environment. Your attack surface is much larger than you thought. In our Shining a Light on Shadow Devices paper, we discuss how attacks on these devices can become an issue on your network, along with some tactics to provide visibility, and then control, over all these network-connected devices.
We would like to thank ForeScout Technologies for licensing the content in this paper. Our unique Totally Transparent Research model enables us to think objectively about future attack vectors and speculate a bit on the impact to your organization, without paywalls or other such gates restricting access to research you may need.
You can get the paper from the landing page in our research library.
Posted at Wednesday 15th June 2016 8:16 pm
By Adrian Lane
Before we jump into today’s post, we want to thank Immunio for expressing interest in licensing this content. This type of support enables us to bring quality research to you, free of charge. If you are interested in licensing this Securosis research as well, please let us know. And we want to thank all of you who have been commenting throughout this series – we have received many good comments and questions. We have in fact edited most of the posts to integrate your feedback, and added new sections to address your questions. This research is certainly better for it! And it’s genuinely helpful that the community at large can engage in an open discussion, so thanks again to all of you who have participated.
We will close out this series by directing your attention to several key areas for buyers to evaluate, in order to assess suitability for your needs. With new technologies it is not always clear where the ‘gotchas’ are. We find many security technologies meet basic security goals, but after they have been on-premise for some time, you discover management or scalability nightmares. To help you avoid some of these pitfalls, we offer the following outline of evaluation criteria. The product you choose should provide application protection, but it should also be flexible enough to work in your environment. And not just during Proof of Concept (PoC) – every day.
- Language Coverage: Your evaluation should ensure that the RASP platforms you are considering all cover the programming languages and platforms you use. Most enterprises we speak with develop applications on multiple platforms, so ensure that there is appropriate coverage for all your applications – not just the ones you focus on during the evaluation process.
- Blocking: Blocking is a key feature. Sure, some of you will use RASP for monitoring and instrumentation – at least in the short term – but blocking is a huge part of RASP’s value. Without blocking there is no protection – even more to the point, get blocking wrong and you break applications. Evaluating how well a RASP product blocks is essential. The goal here is twofold: make sure the RASP platform is detecting the attacks, and then determine whether its blocking action negatively affects the application. We recommend penetration testing during the PoC, both to verify that common attack vectors are handled, and to gauge RASP behavior when attacks are discovered. Some RASPs simply block the request and return an error message to the user. In some cases RASP can alter a request to make it benign, then proceed as normal. Some products alter user sessions and redirect users to login again, or jump through additional hoops before proceeding. Most RASP products provide customers a set of options for how they should respond to different types of attacks. Most vendors consider attack detection techniques part of their “secret sauce”, so we are unable to offer insight into the differences. But just as important is how well application continuity is preserved when responding to threats, which you can monitor directly during evaluation.
- Policy Coverage: It’s not uncommon for one or more members of a development team to be proficient with application security. That said, it’s unreasonable to expect developers to understand the nuances of new attacks and the details behind every CVE. Vulnerability research, methods of detection, and appropriate methods to block attacks are large parts of the value each RASP vendor provides. Your vendor spends days – if not weeks – developing each policy embedded into their tool. During evaluation, it’s important to ensure that critical vulnerabilities are addressed. But it is arguably more important to determine how – and how often – vendors update policies, and verify they include ongoing coverage. A RASP product cannot be better than its policies, so ongoing support is critical as new threats are discovered.
- Policy Management: Two facets of policy management come up most often during our discussions. The first is identification of which protections map to specific threats. Security, risk, and compliance teams all ask, “Are we protected against XYZ threat?” You will need to show that you are. Evaluate policy lookup and reporting. The other is tuning how to respond to threats. As we mentioned above under ‘Blocking’, most vendors allow you to tune responses either by groups of issues, or on a threat-by-threat basis. Evaluate how easy this is to use, and whether you have sufficient options to tailor responses.
- Performance: Being embedded into applications enables RASP to detect threats at different locations within your app, with context around the operation being performed. This context is passed, along with the user request, to a central enforcement point for analysis. The details behind detection vary widely between vendors, so performance varies as well. Each user request may generate dozens of checks, possibly including multiple external references. This latency can easily impact user experience, so sample how long analysis takes. Each code path will apply a different set of rules, so you will need to test several different paths, measuring both with and without RASP. You should do this under load to ensure that detection facilities do not bottleneck application performance. And you’ll want to understand what happens when some portion of RASP fails, and how it responds – does it “fail open”?
- Scalability: Most web applications scale by leveraging multiple application instances, with user requests distributed via a load balancer. As RASP is typically built into the application, it scales right along with it, without need for additional changes. But if RASP leverages external threat intelligence, you will want to verify this does not hamper scalability. For RASP platforms where the point of analysis – as opposed to the point of interception – is outside your application, you need to verify how the analysis component scales. For RASP products that work as a cloud service using non-deterministic code inspection, evaluate how their services scale.
- API Compatibility: Most interest in RASP is prompted by a desire to integrate into application development processes, automating security deployment alongside application code, so APIs are a central feature. Ensure the RASP products you consider are compatible with Jenkins, Ansible, Chef, Puppet, or whatever automated build tools you employ. On the back end, make sure RASP feeds information back into your systems for defect tracking, logging, and Security Information and Event Management (SIEM). This data is typically available in JSON, syslog, and other formats, but ensure each product provides what you need.
That concludes our series on RASP. As always, we encourage comments, questions and critique, so please let us know what’s on your mind.
Posted at Monday 13th June 2016 12:00 pm
By Mike Rothman
As long as I have been in security and following the markets, I have observed that no one says security is unimportant. Not out loud, anyway. But their actions usually show a different view. Maybe there is a little more funding. Maybe somewhat better visibility at the board level. But mostly security gets a lot of lip service.
In other words, security doesn’t matter. Until it does.
The international interbank payment system called SWIFT has successfully been hit multiple times by hackers, and a few other attempts have been foiled. Now they are going to start turning the screws on member banks, because SWIFT has finally realized they can be very secure but still get pwned. It doesn’t help when the New York Federal Reserve gets caught up in a ruse due to lax security at a bank in Bangladesh.
So now the lip service is becoming threats. That member banks will have their access to SWIFT revoked if they don’t maintain a sufficient security posture. Ah, more words. Will this be like the words uttered every time someone asks if security is important? Or will there be actual action behind them?
That action needs to include specific guidance on what security actually looks like. This is especially important for banks in emerging countries, which may not have a good idea of where to start. And yes, those organizations are out there. The action also needs to involve some level of third-party assessment. Self-assessment doesn’t cut it.
I think SWIFT can take a page from the Payment Card Industry. The initial PCI-DSS, and the resulting work to get laggards over a (low) security bar, did help. It’s not an ongoing sustainable answer, because at some point the assessments became a joke, and the controls required by the standard have predictably failed to keep pace with attacks.
But security at a lot of these emerging banks is a dumpster fire. And the folks who work with them realize where the weakest links are. But actions speak much louder than words, so watch for actions.
Photo credit: “Boots” originally uploaded by Rob Pongsajapan
Posted at Monday 13th June 2016 6:00 am
By Adrian Lane
A phone call about Activity Monitoring of administrative actions on mainframes, followed by a call on security architectures for new applications in AWS. A call on SAP vulnerability scans, followed by a call on Runtime Application Self-Protection. A call on protecting relational databases against SQL injection, followed by a discussion of relevant values to key security event data for a big data analytics project. Consulting with a firm which releases code every 12 months, and discussing release management with a firm that is moving to two releases a day in a continuous deployment model. This is what my call logs look like.
If you want to see how disruptive technology is changing security, you can just look at my calendar. On any given day I am working at both extremes in security. On one hand we have the old and well-worn security problems; familiar, comfortable and boring. On the other hand we have new security problems, largely driven by cloud and mobile technologies, and the corresponding side-effects – such as hybrid architectures, distributed identity management, mobile device management, data security for uncontrolled environments, and DevOps. Answers are not rote, problems do not always have well-formed solutions, and crafting responses takes a lot of work. Worse, the answer I gave yesterday may be wrong tomorrow, if the pace of innovation invalidates my answer. This is our new reality.
Some days it makes me dizzy, but I’ve embraced the new, if for no other reason than to avoid being run over by it. It’s challenging as hell, but it’s not boring.
On to this week’s summary:
If you want to subscribe directly to the Friday Summary only list, just click here.
Top Posts for the Week
Tool of the Week
I decided to take some time to learn about tools more common to clouds other than AWS. I was told Kubernetes was the GCP open source version of Docker, so I thought that would be a good place to start. After I spent some time playing with it, I realized what I was initially told was totally wrong! Kubernetes is called a “container manager”, but it’s really focused on setting up services. Docker focuses on addressing app dependencies and packaging; Kubernetes on app orchestration. And it runs anywhere you want – not just GCP and GCE, but in other clouds or on-premise. If you want to compare Kubernetes to something in the Docker universe, it’s closest to Docker Swarm, which tackles some of the management and scalability issues.
Kubernetes has three basic parts: controllers that handle things like replication and pod behaviors; a simple naming system – essentially using key-value pairs – to identify pods; and a services directory for discovery, routing, and load balancing. A pod can be one or more Docker containers, or a standalone application. These three primitives make it pretty easy to stand up code, direct application requests, manage clusters of services, and provide basic load balancing. It’s open source and works across different clouds, so your application should work the same on GCP, Azure, or AWS. It’s not super easy to set up, but it’s not a nightmare either. And it’s incredibly flexible – once set up, you can easily create pods for different services, with entirely different characteristics.
A word of caution: if you’re heavily invested in Docker, you might instead prefer Swarm. Early versions of Kubernetes seemed to have Docker containers in mind, but the current version does not integrate with native Docker tools and APIs, so you have to duct tape some stuff together to get Docker-compliant containers. Swarm is compliant with Docker’s APIs and works seamlessly. But don’t be swayed by studies that compare container startup times as a main measure of performance; that is one of the least interesting metrics for comparing container management and orchestration tools. Operating performance, ease of use, and flexibility are all far more important. If you’re not already a Docker shop, check out Kubernetes – its design is well-thought-out and purpose-built to tackle micro-service deployment. And I have not yet had a chance to use Google’s Container Engine, but it is supposed to make setup easier, with a number of supporting services.
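To make the three primitives concrete, here is a minimal manifest sketch. All names and the nginx image are illustrative, and API versions vary between Kubernetes releases, so treat this as a shape rather than copy-paste configuration:

```yaml
# A service (discovery + load balancing) in front of a replicated pod set.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # key-value label routing requests to matching pods
  ports:
  - port: 80
---
# A controller keeping three identical pods running.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: web      # the label the service selects on
    spec:
      containers:
      - name: web
        image: nginx
```

Applying a file like this (kubectl apply -f web.yaml) stands up the pods and wires up discovery and basic load balancing in one step.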
Securosis Blog Posts this Week
Other Securosis News and Quotes
Training and Events
- We are running two classes at Black Hat USA:
Posted at Friday 10th June 2016 4:48 am
By Mike Rothman
Building Resilient Cloud Network Architectures builds on our Pragmatic Security Cloud and Hybrid Networks research, focusing on cloud-native network architectures that improve both security and availability. The key is that cloud computing provides architectural options which are either impossible or economically infeasible in traditional data centers, enabling greater protection and better availability.
We would like to thank Resilient Systems, an IBM Company, for licensing the content in this paper. We built the paper using our Totally Transparent Research model, leveraging what we’ve learned building cloud applications over the past 4 years.
You can get the paper from the landing page in our research library.
Posted at Thursday 9th June 2016 8:25 pm
This is the third in a three-part series on evolving encryption key management best practices. The first post is available here. This research is also posted at GitHub for public review and feedback. My thanks to Hewlett Packard Enterprise for licensing this research, in accordance with our strict Totally Transparent Research policy, which enables us to release our independent and objective research for free.
Now that we’ve discussed best practices, it’s time to cover common use cases. Well, mostly common – one of our goals for this research is to highlight emerging practices, so a couple of our use cases cover newer data-at-rest key management scenarios, while the rest are more traditional options.
Traditional Data Center Storage
It feels a bit weird to use the word ‘traditional’ to describe a data center, but people give us strange looks when we call the most widely deployed storage technologies ‘legacy’. We’d say “old school”, but that sounds a bit too retro. Perhaps we should just say “big storage stuff that doesn’t involve the cloud or other weirdness”.
We typically see three major types of data storage encrypted at rest in traditional data centers: SAN/NAS, backup tapes, and databases. We also occasionally see file servers encrypted, but they are in the minority. Each of these is handled slightly differently, but normally one of three ‘meta-architectures’ is used:
- Silos: Some storage tools include their own encryption capabilities, managed within the silo of the application/storage stack. For example a backup tape system with built-in encryption. The keys are managed by the tool within its own stack. In this case an external key manager isn’t used, which can lead to a risk of application dependency and key loss, unless it’s a very well-designed product.
- Centralized key management: Rather than managing keys locally, a dedicated central key management tool is used. Many organizations start with silos, and later integrate them with central key management for advantages such as improved separation of duties, security, auditability, and portability. Increasing support for the KMIP and PKCS #11 standards enables major products to leverage remote key management capabilities, and exchange keys.
- Distributed key management: This is very common when multiple data centers are either actively sharing information or available for disaster recovery (hot standby). You could route everything through a single key manager, but this single point of failure would be a recipe for disaster. Enterprise-class key management products can synchronize keys between multiple key managers. Remote storage tools should connect to the local key manager to avoid WAN dependency and latency. The biggest issue with this design is typically ensuring the different locations synchronize quickly enough, which tends to be more of an issue for distributed applications balanced across locations than for hot standby sites, where data changes don’t occur on both sides simultaneously. Another major concern is ensuring you can centrally manage the entire distributed deployment, rather than needing to log into each site separately.
Each of those meta-architectures can manage keys for all of the storage options we see in use, assuming the tools are compatible, even using different products. The encryption engine need not come from the same source as the key manager, so long as they are able to communicate.
That’s the essential requirement: the key manager and encryption engines need to speak the same language, over a network connection with acceptable performance. This often dictates the physical and logical location of the key manager, and may even require additional key manager deployments within a single data center. But you should never run just a single key manager – you need more than one for availability, whether in a cluster or using a hot standby.
As we mentioned under best practices, some tools support distributing only needed keys to each ‘local’ key manager, which can strike a good balance between performance and security.
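That "only needed keys" idea amounts to a per-site filter on the central key inventory. A toy Python sketch, with key IDs, sites, and the sync interface all hypothetical (no vendor's actual synchronization protocol):

```python
# Toy sketch of selective key distribution: the central key manager
# synchronizes to each local key manager only the keys that site's
# applications and devices actually need. All names are illustrative.

central_keys = {
    "backup-tape-01": "key-material-1",
    "san-volume-07":  "key-material-2",
    "app-orders-db":  "key-material-3",
}

site_requirements = {
    "dr-site":   {"backup-tape-01", "san-volume-07"},
    "cloud-vpc": {"app-orders-db"},
}

def keys_for_site(site):
    """Keys the central manager should push to a given local key manager."""
    needed = site_requirements.get(site, set())
    return {kid: central_keys[kid] for kid in needed if kid in central_keys}
```

A compromised or lost local manager then exposes only that site's keys, which is the security half of the performance/security balance mentioned above.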
Applications
There are as many different ways to encrypt an application as there are developers in the world (just ask them). But again we see most organizations coalescing around a few popular options:
- Custom: Developers program their own encryption (often using common encryption libraries), and design and implement their own key management. These are rarely standards-based, and can become problematic if you later need to add key rotation, auditing, or other security or compliance features.
- Custom with external key management: The encryption itself is, again, programmed in-house, but instead of handling key management itself, the application communicates with a central key manager, usually using an API. Architecturally the key manager needs to be relatively close to the application server to reduce latency, depending on the particulars of how the application is programmed. In this scenario, security depends strongly on how well the application is programmed.
- Key manager software agent or SDK: This is the same architecture, but the application uses a software agent or pre-configured SDK provided with the key manager. This is a great option because it generally avoids common errors in building encryption systems, and should speed up integration, with more features and easier management. Assuming everything works as advertised.
- Key manager based encryption: That’s an awkward way of saying that instead of providing encryption keys to applications, each application provides unencrypted data to the key manager and gets encrypted data in return, and vice-versa.
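That last option is easiest to see in code. Here is a deliberately tiny Python sketch of the pattern — the application hands data to the key manager and never touches the key. The XOR transform stands in for real encryption purely to keep the example self-contained; it is not how any real key manager encrypts, and the class and method names are made up:

```python
import secrets

# Toy sketch of 'key manager based encryption': the application sends
# plaintext to the key manager and gets ciphertext back (and vice-versa),
# so keys never leave the manager. XOR is a stand-in for real encryption;
# do not use this for actual data protection.

class ToyKeyManager:
    def __init__(self):
        self._keys = {}  # key_id -> key material, held only here

    def encrypt(self, key_id, plaintext: bytes) -> bytes:
        key = self._keys.setdefault(key_id, secrets.token_bytes(32))
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

    def decrypt(self, key_id, ciphertext: bytes) -> bytes:
        # XOR is symmetric, so the same operation reverses it.
        return self.encrypt(key_id, ciphertext)

km = ToyKeyManager()
ct = km.encrypt("orders-db", b"customer record")
```

The trade-off is that every crypto operation becomes a round trip to the key manager, which is why this pattern suits lower-volume, higher-sensitivity data.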
We deliberately skipped file and database encryption, because they are variants of our “traditional data center storage” category, but we do see both integrated into different application architectures.
Based on our client work (in other words, a lot of anecdotes), application encryption seems to be the fastest growing option. It’s also agnostic to your data center architecture, assuming the application has adequate access to the key manager. It doesn’t really care whether the key manager is in the cloud, on-premise, or a hybrid.
Hybrid Cloud
Speaking of hybrid cloud: after application encryption (usually in cloud deployments), this is where we see the most questions. There are two main use cases:
- Extending existing key management to the cloud: Many organizations already have a key manager they are happy with. As they move into the cloud they may either want to maintain consistency by using the same product, or need to support a migrating application without having to gut their key management to build something new. One approach is to always call back over the network to the on-premise key manager. This reduces architectural changes (and perhaps additional licensing), but often runs into latency and performance issues, even with a direct network connection. Alternatively you can deploy a virtual appliance version of your key manager as a ‘bastion’ host, and synchronize keys so assets in the cloud connect to the distributed virtual server for better performance.
- Building a root of trust for cloud deployments: Even if you are fully comfortable deploying your key manager in the cloud, you may still want an on-premise key manager to retain backups of keys or support interoperability across cloud providers.
Generally you will want to run a virtual version of your key manager within the cloud to satisfy performance requirements, even though you could route all requests back to your data center. It’s still essential to synchronize keys, backups, and even logs back on-premise or to multiple, distributed cloud-based key managers, because no single instance or virtual machine can provide sufficient reliability.
Bring Your Own Key
This is a very new option with some cloud providers who allow you to use an encryption service or product within their cloud, while you retain ownership of your keys. For example you might provide your own file encryption key to your cloud provider, who then uses it to encrypt your data, instead of using a key they manage.
The name of the game here is ‘proprietary’. Each cloud provider offers different ways of supporting customer-managed keys. You nearly always need to meet stringent network and location requirements to host your key manager yourself, or you need to use your cloud provider’s key management service, configured so you can manage your keys yourself.
Posted at Wednesday 8th June 2016 10:42 pm
By Mike Rothman
Like many of you, I spend a lot of time sitting on my butt banging away at my keyboard. I’m lucky that the nature of my work allows me to switch locations frequently, and I can choose to have a decent view of the world at any given time. Whether it’s looking at a wide assortment of people in the various Starbucks I frequent, my home office overlooking the courtyard, or pretty much any place I can open my computer on my frequent business travels. Others get to spend all day in their comfy (or not so comfy) cubicles, and maybe stroll to the cafeteria once a day.
I have long thought that spending the day behind a desk isn’t the most effective way to do things. Especially for security folks, who need to be building relationships with other groups in the organization and proselytizing the security mindset. But if you are reading this, your job likely involves a large dose of office work. Even if you are running from meeting to meeting, experiencing the best conference rooms, we spend our days inside breathing recycled air under the glare of fluorescent lights.
Every time I have the opportunity to explore nature a bit, I remember how cool it is. Over the long Memorial Day weekend, we took a short trip up to North Georgia for some short hikes, and checked out some cool waterfalls. The rustic hotel where we stayed didn’t have cell service (thanks AT&T), but that turned out to be great. Except when Mom got concerned because she got a message that my number was out of service. But through the magic of messaging over WiFi, I was able to assure her everything was OK. I had to exercise my rusty map skills, because evidently the navigation app doesn’t work when you have no cell service. Who knew?
It was really cool to feel the stress of my day-to-day activities and responsibilities just fade away once we got into the mountains. We wondered where the water comes from to make the streams and waterfalls. We took some time to speculate about how long it took the water to cut through the rocks, and we were astounded by the beauty of it all. We explored cute towns where things just run at a different pace. It really put a lot of stuff into context for me. I (like most of you) want it done yesterday, whatever we are talking about.
Being back in nature for a while reminded me there is no rush. The waterfalls and rivers were there long before I got here. And they’ll be there long after I’m gone. In the meantime I can certainly make a much greater effort to take some time during the day and get outside. Even though I live in a suburban area, I can find some green space. I can consciously remember that I’m just a small cog in a very large ecosystem. And I need to remember that the waterfall doesn’t care whether I get through everything on my To Do list. It just flows, as should I.
Photo credit: “Panther Falls - Chattahoochee National Forest” - Mike Rothman May 28, 2016
Security is changing. So is Securosis. Check out Rich’s post on how we are evolving our business.
We’ve published this year’s Securosis Guide to the RSA Conference. It’s our take on the key themes of this year’s conference (which is really a proxy for the industry), as well as deep dives on cloud security, threat protection, and data security. And there is a ton of meme goodness… Check out the blog post or download the guide directly (PDF).
The fine folks at the RSA Conference posted the talk Jennifer Minella and I did on mindfulness at the 2014 conference. You can check it out on YouTube. Take an hour. Your emails, alerts, and Twitter timeline will be there when you get back.
Have you checked out our video podcast? Rich, Adrian, and Mike get into a Google Hangout and… hang out. We talk a bit about security as well. We try to keep these to 15 minutes or less, and usually fail.
We are back at work on a variety of blog series, so here is a list of the research currently underway. Remember you can get our Heavy Feed via RSS, with our content in all its unabridged glory. And you can get all our research papers too.
Evolving Encryption Key Management Best Practices
Incident Response in the Cloud Age
Understanding and Selecting RASP
Maximizing WAF Value
Recently Published Papers
Incite 4 U
Healthcare endpoints are sick: Not that we didn’t already know, given all the recent breach notifications from healthcare organizations, but they are having a tough time securing their endpoints. The folks at Duo provide some perspective on why. It seems those endpoints log into twice as many apps, and a large proportion are based on leaky technology like Flash and Java. Even better, over 20% use unsupported (meaning unpatched) versions of Internet Explorer. LOL. What could possibly go wrong? I know it’s hard, and I don’t mean to beat up on our fine healthcare readers. We know there are funding issues, the endpoints are used by multiple people, and they are in open environments where almost anyone can go up and mess around with them. And don’t get me started on the lack of product security in too many medical systems and products. But all the same, it’s not like they have access to important information or anything. Wait… Oh, they do. Sigh. – MR
Insecure by default: Scott Schober does a nice job outlining Google’s current thinking on data encryption and the security of users’ personal data. Essentially for the new class of Google’s products, the default is to disable end-to-end encryption. You do have the option of turning it on, but Google still manages the encryption keys (unlike Apple). But their current advertising business model, and the application of machine learning to aid users beyond what’s provided today, pretty much dictate Google’s need to collect and track personally identifiable information. Whether that is good or bad is in the eye of the beholder, but realize that when you plunk a Google Home device into your home, it’s always listening and will capture and analyze everything. We now understand that at the very least the NSA siphons off all content sent to the Google cloud, so we recommend enabling end-to-end encryption, which forces intelligence and law enforcement to crack the encryption or get a warrant to view personal information. Even though this removes useful capabilities. – AL
Moby CEO: It looks like attackers are far better at catching whales than old Ahab. In what could be this year’s CEO cautionary tale (after the Target incident a few years back), an Austrian CEO got the ax because he got whaled to the tune of $56MM. Yes, million (US dollars, apparently). Of course if a finance staffer is requested to transfer millions in [$CURRENCY], there should be some means of verifying the request. It is not clear where the internal controls failed in this case. All the same, you have to figure that CEO will have “confirm internal financial controls” at the top of his list at his next gig. If there is one. – MR
Tagged and tracked: It’s fascinating to watch the number of ways users’ online activity can be tracked, with just about every conceivable browser plug-in and feature minable for user identity and activity. A recent study from Princeton University called The Long Tail of Online Tracking outlines the who, what, and how of tracking software. It’s no surprise that Google, Facebook, and Twitter are tracking users on most sites. What is surprising is that many sites won’t load under the HTTPS protocol, and degenerate to HTTP to ensure content sharing with third parties. As is the extent to which tracking firms go to identify your devices – using AudioContext, browser configuration, browser extensions, and just about everything else they can access to build a number of digital fingerprints to identify people. If you’re interested in the science behind this, that post links to a variety of research, as well as the Technical Analysis of client identification mechanisms from the Google Chromium Security team. And they should know how to identify users (doh!). – AL
Why build it once when you can build it 6 times? I still love that quote from the movie Contact. “Why build it once, when you can build it twice for twice the price?” Good thing they did when the first machine was bombed. It seems DARPA takes the same approach – they are evidently underwriting 6 different research shops to design a next generation DDoS defense. It’s not clear (from that article, anyway) whether the groups were tasked with different aspects of a larger solution. DDoS is a problem. But given the other serious problems facing IT organizations, is it the most serious? It doesn’t seem like it to me. But all the same, if these research shops make some progress, that’s a good thing and it’s your tax dollars at work (if you pay taxes in the US, anyway). – MR
Posted at Wednesday 8th June 2016 11:00 am
By Mike Rothman
The old business rule is: when something works, do more of it. By that measure ransomware is clearly working. One indication is the number of new domains popping up which are associated with ransomware attacks. According to an Infoblox research report (and they provide DNS services, so they should know), there was a 35x increase in ransomware domains in Q1.
You have also seen the reports of businesses getting popped when an unsuspecting employee falls prey to a ransomware attack; the ransomware is smart enough to find a file share and encrypt all those files too. And even when an organization pays, the fraudster is unlikely to just give them the key and go away.
This is resulting in real losses to organizations – the FBI says organizations lost over $200 million in Q1 2016. Even if that number is inflated, it’s a real business, so you will see a lot more of it. The attackers follow Mr. Market’s lead, and clearly the ‘market’ loves ransomware right now.
So what can you do? Besides continue to train employees not to click stuff? An article at NetworkWorld claims to have the answer for how to deal with ransomware. They mention strategies for trying to recover faster via “regular and consistent backups along with tested and verified restores.” This is pretty important – just be aware that you may be backing up encrypted files, so make sure you have backups from far enough back that you can recover the files before the attack. This is obvious in retrospect, but backup/recovery is a good practice regardless of whether you are trying to deal with malware, ransomware, or hardware failure that puts data at risk.
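The "far enough back" point is easy to get wrong in the heat of an incident. A toy Python sketch of the selection logic — hypothetical timestamps, and obviously a real restore also needs to verify the backup contents aren't already encrypted:

```python
from datetime import datetime

# Toy sketch: pick the newest backup taken *before* the suspected
# infection time, since anything after it may contain encrypted files.

def last_clean_backup(backup_times, infection_time):
    """Return the newest backup timestamp before the infection, or None."""
    clean = [t for t in backup_times if t < infection_time]
    return max(clean) if clean else None

backups = [datetime(2016, 6, d) for d in (1, 3, 5, 7)]
restore_point = last_clean_backup(backups, datetime(2016, 6, 6))
```

This is also why retention depth matters: if ransomware sat quietly for longer than your retention window, every backup you still hold may be tainted.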
Their other suggested defense is to prevent the infection. The article’s prescribed approach is application whitelisting (AWL). We are fans of AWL in specific use cases – here the ransomware wouldn’t be allowed to run on devices, because it’s not authorized. Of course the deployment issues with AWL, given how it can impact user experience, are well known. Though we do find whitelisting appropriate for devices that don’t change frequently or which hold particularly valuable information, so long as you can deal with the user resistance.
They don’t mention other endpoint protection solutions, such as isolation on endpoint devices. We have discussed the various advanced endpoint defense strategies, and will be updating that research over the next couple of months. Adding to the confusion, every endpoint defense vendor seems to be shipping a ‘ransomware’ solution… which is really just their old stuff, rebranded.
So what’s the bottom line? If you have an employee who falls prey to ransomware, you are going to lose data. The question is: How much? With advanced prevention technologies deployed, you may stop some of the attacks. With a solid backup strategy, you may minimize the amount of data you lose. But you won’t escape unscathed.
Posted at Tuesday 7th June 2016 3:41 pm
By Mike Rothman
In Building a Vendor (IT) Risk Management Program, we explain why you can no longer ignore the risk presented by third-party vendors and other business partners, including managing an expanded attack surface and new regulations demanding effective management of vendor risk. We then offer ideas for how to build a structured and systematic program to assess vendor (IT) risk, and take action when necessary.
We would like to thank BitSight Technologies for licensing the content in this paper. Our unique Totally Transparent Research model allows us to perform objective and useful research without requiring paywalls or other such nonsense, which make it hard for the people who need our research to get it. A day doesn’t go by where we aren’t thankful to all the companies who license our research.
You can get the paper from the landing page in our research library.
Posted at Monday 6th June 2016 12:30 pm
This is the second in a four-part series on evolving encryption key management best practices. The first post is available here. This research is also posted at GitHub for public review and feedback. My thanks to Hewlett Packard Enterprise for licensing this research, in accordance with our strict Totally Transparent Research policy, which enables us to release our independent and objective research for free.
If there is one thread tying together all the current trends influencing data centers and how we build applications, it’s distribution. We have greater demand for encryption in more locations in our application stacks – which now span physical environments, virtual environments, and increasing barriers even within our traditional environments.
Some of the best practices we will highlight have long been familiar to anyone responsible for enterprise encryption. Separation of duties, key rotation, and meeting compliance requirements have been on the checklist for a long time. Others are familiar, but have new importance thanks to changes occurring in data centers. Providing key management as a service, and dispersing and integrating into required architectures, aren’t technically new, but they are in much greater demand than before. Then there are the practices which wouldn’t have made the list in the past, such as supporting APIs and distributed architectures (potentially spanning physical and virtual appliances).
As you will see, the name of the game is consolidation for consistency and control; simultaneous with distribution to support diverse encryption needs, architectures, and project requirements.
But before we jump into recommendations, keep our focus in mind. This research is for enterprise data centers, including virtualization and cloud computing. There are plenty of other encryption use cases out there which don’t necessarily require everything we discuss, although you can likely still pick up a few good ideas.
Build a key management service
Supporting multiple projects with different needs can easily result in a bunch of key management silos using different tools and technologies, which become difficult to support. One for application data, another for databases, another for backup tapes, another for SANs, and possibly even multiple deployments for the same functions, as individual teams pick and choose their own preferred technologies. This is especially true in the project-based agile world of the cloud, microservices, and containers. There’s nothing inherently wrong with these silos, assuming they are all properly managed, but that is unfortunately rare. And overlapping technologies often increase costs.
Overall we tend to recommend building centralized security services to support the organization, and this definitely applies to encryption. Let a smaller team of security and product pros manage what they are best at and support everyone else, rather than merely issuing policy requirements that slow down projects or drive them underground.
For this to work the central service needs to be agile and responsive, ideally with internal Service Level Agreements to keep everyone accountable. Projects request encryption support; the team managing the central service determines the best way to integrate, and to meet security and compliance requirements; then they provide access and technical support to make it happen.
This enables you to consolidate and better manage key management tools, while maintaining security and compliance requirements such as audit and separation of duties. Whatever tool(s) you select clearly need to support your various distributed requirements. The last thing you want to do is centralize but establish processes, tools, and requirements that interfere with projects meeting their own goals.
And don’t focus so exclusively on new projects and technologies that you forget about what’s already in place. Our advice isn’t merely for projects based on microservices, containers, and the cloud – it applies equally to backup tapes and SAN encryption.
Centralize but disperse, and support distributed needs
Once you establish a centralized service you need to support distributed access. There are two primary approaches, but we only recommend one for most organizations:
- Allow access from anywhere. In this model you position the key manager in a location accessible from wherever it might be needed. Typically organizations select this option when they want to only maintain a single key manager (or cluster). It was common in traditional data centers, but isn’t well-suited for the kinds of situations we increasingly see today.
- Distributed architecture. In this model you maintain a core “root of trust” key manager (which can, again, be a cluster), but then you position distributed key managers which tie back to the central service. These can be a mix of physical and virtual appliances or servers. Typically they only hold the keys for the local application, device, etc. that needs them (especially when using virtual appliances or software on a shared service). Rather than connecting back to complete every key operation, the local key manager handles those while synchronizing keys and configuration back to the central root of trust.
Why distribute key managers which still need a connection back home? Because they enable you to support greater local administrative control and meet local performance requirements. This architecture also keeps applications and services up and running in case of a network outage or other problem accessing the central service. This model provides an excellent balance between security and performance.
For example you could support a virtual appliance in a cloud project, physical appliances in backup data centers, and backup keys used within your cloud provider with their built-in encryption service.
This way you can also support different technologies for distributed projects. The local key manager doesn’t necessarily need to be the exact same product as the central one, so long as they can communicate and both meet your security and compliance requirements. We have seen architectures where the central service is a cluster of Hardware Security Modules (appliances with key management features) supporting a distributed set of HSMs, virtual appliances, and even custom software.
The biggest potential obstacle is providing safe, secure access back to the core. Architecturally you can usually manage this with some bastion systems to support key exchange, without opening the core to the Internet. There may still be use cases where you cannot tie everything together, but that should be your last option.
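The availability benefit of the distributed model is easy to demonstrate in miniature. This Python toy shows a local key manager that keeps answering requests for keys it already holds when the central root of trust goes dark — class names and the cache-on-first-use behavior are illustrative (a real product would pre-synchronize rather than lazily fetch):

```python
# Toy sketch of the distributed model: a local key manager answers
# requests from its own store, and keeps working for those keys even
# if the central root of trust is unreachable. Names are hypothetical.

class CentralKeyManager:
    def __init__(self):
        self._keys = {"app-1": "key-a", "app-2": "key-b"}
        self.online = True

    def fetch(self, key_id):
        if not self.online:
            raise ConnectionError("central key manager unreachable")
        return self._keys[key_id]

class LocalKeyManager:
    def __init__(self, central):
        self._central = central
        self._local = {}  # only the keys this site has needed so far

    def get_key(self, key_id):
        if key_id not in self._local:
            self._local[key_id] = self._central.fetch(key_id)
        return self._local[key_id]

central = CentralKeyManager()
local = LocalKeyManager(central)
local.get_key("app-1")   # synchronized while the network is up
central.online = False   # simulate a WAN outage
```

After the outage, "app-1" keeps working locally while a request for the never-synchronized "app-2" fails — exactly the behavior you want to reason about when placing key managers.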
Be flexible: use the right tool for the right job
Building on our previous recommendation, you don’t need to force every project to use a single tool. One of the great things about key management is that modern systems support a number of standards for intercommunication. And when you get down to it, an encryption key is merely a chunk of text – not even a very large one.
With encryption systems, keys and the encryption engine don’t need to be the same product. Even your remote key manager doesn’t need to be the same as the central service if you need something different for that particular project.
We have seen large encryption projects fail because they tried to shoehorn everything into a single monolithic stack. You can increase your chances for success by allowing some flexibility in remote tools, so long as they meet your security requirements. This is especially true for the encryption engines that perform actual crypto operations.
Provide APIs, SDKs, and toolkits
Even off-the-shelf encryption engines sometimes ship with less than ideal defaults, and can easily be used incorrectly. Building a key management service isn’t merely creating a central key manager – you also need to provide hooks to support projects, along with processes and guidance to ensure they are able to get up and running quickly and securely.
- Application Programming Interfaces: Most key management tools already support APIs, and this should be a selection requirement. Make sure you support RESTful APIs, which are particularly ubiquitous in the cloud and containers. SOAP APIs are considered burdensome these days.
- Software Development Kits: SDKs are pre-built code modules that allow rapid integration into custom applications. Provide SDKs for common programming languages compatible with your key management service/products. If possible you can even pre-configure them to meet your encryption requirements and integrate with your service.
- Toolkits: A toolkit includes all the technical pieces a team needs to get started. It can include SDKs, preconfigured software agents, configuration files, and anything else a project might need to integrate encryption into anything from a new application to an old tape backup system.
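The "pre-configured SDK" idea often amounts to a thin wrapper the security team publishes so project teams never choose crypto parameters themselves. A hypothetical Python sketch — the endpoint, defaults, and audit flag are all invented for illustration, not any product's interface:

```python
# Toy sketch of a pre-configured 'SDK': the security team bakes approved
# defaults into a wrapper so project teams get safe settings without
# choosing them, and any deviation is flagged for review. All names and
# values here are hypothetical.

APPROVED_DEFAULTS = {
    "endpoint": "https://keymanager.internal.example.com",
    "algorithm": "AES-256-GCM",
    "key_bits": 256,
}

def new_key_request(application, overrides=None):
    """Build a key request with approved defaults; overrides are flagged."""
    request = dict(APPROVED_DEFAULTS, application=application)
    if overrides:
        request.update(overrides)
        request["nonstandard"] = sorted(overrides)  # surface for review
    return request
```

This is also where the central team gets visibility: any project passing overrides shows up in the review queue instead of quietly shipping weak settings.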
Provide templates and recommendations, not just standards and requirements
All too often security sends out requirements, but fails to provide specific instructions for meeting those requirements. One of the advantages of standardization around a smaller set of tools is that you can provide detailed recommendations, instructions, and templates to satisfy requirements.
The more detail you can provide the better. We recommend literally creating instructional documents for how to use all approved tools, likely with screenshots, to meet encryption needs and integrate with your key management service. Make them easily available, perhaps through code repositories to better support application developers. On the operations side, include them not only for programming and APIs, but for software agents and integration into supported storage repositories and backup systems.
If a project comes up which doesn’t fit any existing toolkit or recommendations, build them with that project team and add the new guidance to your central repository. This dramatically speeds up encryption initiatives for existing and new platforms.
Meet core security requirements
So far we have focused on newer requirements to meet evolving data center architectures, the impact of the cloud, and new application design patterns; but all the old key management practices still apply:
- Enforce separation of duties: Implement multiple levels of administrators. Ideally require dual authorities for operations directly impacting key security and other major administrative functions.
- Support key rotation: Ideally key rotation shouldn’t create downtime. This typically requires both support in the key manager and configuration within encryption engines and agents.
- Enable usage logs for audit, including purpose codes: Logs may be required for compliance, but are also key for security. Purpose codes tell you why a key was requested, not just who requested it or when.
- Support standards: Whatever you use for key management must support both major encryption standards and key exchange/management standards. Don’t rely on fully proprietary systems that will overly limit your choices.
- Understand the role of FIPS and its different flavors, and ensure you meet your requirements: FIPS 140-2 is the most commonly accepted standard for cryptographic modules and systems. Many products advertise FIPS compliance (which is often a requirement for other compliance, such as PCI). But FIPS is a graded standard with different levels ranging from a software module, to plugin cards, to a fully tamper-resistant dedicated appliance. Understand your FIPS requirements, and if you evaluate a “FIPS certified” appliance, don’t assume the entire appliance is certified – the certification might cover only the software, not the whole system. You may not always need the highest level of assurance, but start by understanding your requirements, and then ensure your tool actually meets them.
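To make the rotation requirement concrete, here is a hedged Python sketch of versioned keys: each ciphertext records the key version that produced it, so old data stays readable after rotation while new writes pick up the fresh key – no downtime and no forced bulk re-encryption. The class and its toy XOR cipher are illustrative only:

```python
import secrets

class RotatingKeyStore:
    """Sketch of downtime-free rotation using key versions. Real key
    managers implement the same idea with proper ciphers and HSM-backed
    key storage."""

    def __init__(self):
        self.versions = {}
        self.current = 0
        self.rotate()

    def rotate(self):
        # New version becomes active; old versions stay available for reads.
        self.current += 1
        self.versions[self.current] = secrets.token_bytes(32)

    def encrypt(self, plaintext: bytes):
        key = self.versions[self.current]
        ct = bytes(p ^ key[i % 32] for i, p in enumerate(plaintext))
        return self.current, ct   # the version travels with the ciphertext

    def decrypt(self, version: int, ct: bytes) -> bytes:
        key = self.versions[version]
        return bytes(c ^ key[i % 32] for i, c in enumerate(ct))

store = RotatingKeyStore()
v1, ct1 = store.encrypt(b"old record")
store.rotate()                       # rotate; no re-encryption required
v2, ct2 = store.encrypt(b"new record")
assert store.decrypt(v1, ct1) == b"old record"
assert store.decrypt(v2, ct2) == b"new record"
```

This is why rotation support is needed in both the key manager (tracking versions) and the encryption engines (tagging ciphertext with the version used).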
There are many more technical best practices beyond the scope of this research, but the core advice that might differ from what you have seen in the past is:
- Provide key management as a service to meet diverse encryption needs.
- Be able to support distributed architectures and a range of use cases.
- Be flexible on tool choice, then provide technical components and clear guidance on how to properly use tools and integrate them into your key management program.
- Don’t neglect core security requirements.
In our next section we will start looking at specific use cases, some of which we have already hinted at.
Posted at Friday 3rd June 2016 5:55 pm
By Adrian Lane
Unlike my business partners, who have been logging thousands of air miles, speaking at conferences and with clients around the country, I have been at home. And the mildest spring in Phoenix’s recorded history has been a blessing, as we’re 45 days past the point 100F days typically start. Bike rides. Hiking. Running. That is, when I get a chance to sneak outdoors and enjoy it. With our pivot there is even more writing and research going on than normal, which I wasn’t sure was possible. You will begin to see the results of this work within the next couple weeks, and we look forward to putting a fresh face on our business. That launch will coincide with us posting lots more hands-on advice for cloud security and migrations.
And as a heads-up, I will be talking big data security over at SC Magazine on the 20th. I’ll tweet out a link at @AdrianLane next week if you’re interested.
You can subscribe to only the Friday Summary.
Top Posts for the Week
Tool of the Week
“Server-less computing? What do you mean?” Rich and I were discussing cloud deployment options with one of the smartest engineering managers I know, and he was totally unaware of serverless cloud computing architectures. If he was unaware of this capability, lots of other people probably are as well. So this week’s Tool of the Week section will discuss not a single tool, but instead a functional paradigm offered by multiple cloud vendors. What are they? Google’s GCP page best captures the idea: essentially a “lightweight, event-based, asynchronous solution that allows you to create small, single-purpose functions that respond to Cloud events without the need to manage a server or a runtime environment.” What Google does not mention there is that these functions tend to be very fast, and you can run multiple copies in parallel to scale capacity.
It really embodies microservices. You can construct an entire application from these functions. For example, take a stream of data and run it through a series of functions to process it. It could be audio or image files, or real-time event data inspection, transformation, enrichment, comparison… or any combination you can think of. The best part? There is no server. There is no OS to set up. No CPU or disk capacity to specify. No configuration files. No network ports to manage. It’s simply a logical function running out there in the ‘ether’ of your public cloud.
Google calls its version on GCP Cloud Functions. Amazon’s version on AWS is called [Lambda functions](http://docs.aws.amazon.com/lambda/latest/dg/welcome.html). Microsoft calls the version on Azure simply Functions. Check out their API documentation – they all work slightly differently, and some have specific storage requirements to act as endpoints, but the concept is the same. And the pricing for these services is pretty low – with Lambda, for example, the first million requests are free, and Amazon charges 20 cents per million requests after that.
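As a rough illustration of the model, here is a Python sketch of a Lambda-style handler: one small, single-purpose function that receives an event, transforms it, and returns – with no server, OS, or capacity to manage. The event shape mimics a simple HTTP-style trigger and is illustrative only, though the `(event, context)` signature matches what AWS Lambda expects for Python:

```python
import json

def handler(event, context=None):
    """Lambda-style entry point: invoked once per event, scaled by the
    platform, with nothing to provision or patch underneath it."""
    record = json.loads(event["body"])
    # Single-purpose work: enrich the event and hand it to the next stage.
    record["length"] = len(record.get("message", ""))
    return {"statusCode": 200, "body": json.dumps(record)}

resp = handler({"body": json.dumps({"message": "hello"})})
assert resp["statusCode"] == 200
```

Chain a few of these behind a queue or storage trigger and you have the stream-processing pipeline described above, without a single instance to administer.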
This feature is one of the many reasons we tell companies to reconsider their application architectures when moving to cloud services. We’ll post some tidbits on security for these services in the future. For now, check them out!
Securosis Blog Posts this Week
Training and Events
- We are running two classes at Black Hat USA:
Posted at Friday 3rd June 2016 5:21 am
By Mike Rothman
When we do a process-centric research project, it works best to wrap up with a scenario to illuminate the concepts we discuss through the series, and make things a bit more tangible.
In this situation imagine you work for a mid-sized retailer which uses a mixture of in-house technology and SaaS, and has recently moved a key warehousing system to an IaaS provider as part of rebuilding the application for cloud computing. You have a modest security team of 10, which is not enough, but a bit more than many of your peers. Senior management understands why security is important (to a point) and gives you decent leeway, especially regarding the new IaaS application. In fact you were consulted during the IaaS architecture phase and provided some guidance (with some help from your friends at Securosis) on building a resilient cloud network architecture, and how to secure the cloud control plane. You also had an opportunity to integrate some orchestration and automation technology into the new cloud technology stack.
You have your team on fairly high alert, because a number of your competitors have recently been targeted by an organized crime ring, which gained a foothold in their environments and proceeded to steal a ton of information about customers, pricing, and merchandising strategies. This isn’t your first rodeo, so you know that where there is smoke there is usually fire, and you decide to task one of your more talented security admins with a little proactive hunting in your environment. Just to make sure nothing bad is going on.
The admin starts poking around, searching internal security data with some of the more recent malware samples found in the attacks on the other retailers. The samples were provided by your industry’s ISAC (Information Sharing and Analysis Center). The admin got a hit on one of the samples, confirming your concern: you have an active adversary on your network. So now you need to engage your incident response process.
Job 1: Initial Triage
Once you know there is a situation you assemble the response team. There aren’t that many of you, and half the team needs to pay attention to ongoing operational tasks, because taking down systems wouldn’t make you popular with senior management or investors. You also don’t want to jump the gun until you know what you’re dealing with, so you inform the senior team of the situation, but don’t take any systems down. Yet.
The adversary is active on your internal network, so they most likely entered via phishing or another social engineering attack. Searches found indications of the malware on 5 devices, so you take those devices off the network immediately. Not shut down, but put on a separate network with Internet access to avoid tipping off the adversary to their discovery.
Then you check your network forensics tool, looking for indications that data has been leaking. There are a few suspicious file transfers, but luckily you integrated your firewall egress filtering capability with your forensics tool. So once the firewall showed anomalous traffic being sent to known bad sites (via a threat intelligence integration on the firewall), you automatically started capturing network traffic from the devices which triggered the alert. Automation is sure easier than doing everything manually.
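A sketch of that automation in Python: a hypothetical firewall egress hook that checks each flow's destination against a threat intelligence list and kicks off packet capture for any device that trips an alert. The function names and IP addresses are illustrative only:

```python
# Known-bad destinations from a threat intel feed (documentation-range IPs).
THREAT_INTEL = {"198.51.100.7", "203.0.113.9"}

capture_queue = []

def start_capture(src_ip: str):
    # Stand-in for kicking off tcpdump/forensics capture for this host.
    capture_queue.append(src_ip)

def on_egress_flow(src_ip: str, dst_ip: str):
    """Firewall egress hook: if the destination matches threat intel,
    automatically begin full packet capture for the source device."""
    if dst_ip in THREAT_INTEL and src_ip not in capture_queue:
        start_capture(src_ip)

on_egress_flow("10.0.4.21", "198.51.100.7")   # compromised endpoint
on_egress_flow("10.0.4.33", "93.184.216.34")  # benign traffic, ignored
assert capture_queue == ["10.0.4.21"]
```

The payoff shows up later in the scenario: because capture started at first alert, the exact exfiltrated files can be identified after the fact.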
As part of your initial triage you got endpoint telemetry alerting you to issues, and network forensics data for a clue to what’s leaking. This is enough to know you not only have an active adversary, but that more than likely you lost data. So you fire up your case management system to structure your investigation and store all the artifacts of your investigation.
Your team is tasked with specific responsibilities, and sent on their way to get things done. You make the trek to the executive floor to keep senior management updated on the incident.
Check the Cloud
The attack seems to have started on your internal network, but you don’t want to take chances, and you need to make sure the new cloud-based application isn’t at risk. A quick check of the cloud console shows strange activity on one of your instances. A device within the presentation layer of the cloud stack was flagged by your IaaS provider’s monitoring system because there was an unauthorized change on that specific instance. It looks like the time you spent setting up that configuration monitoring service was well spent.
Security was involved in architecting the cloud stack, so you are in good shape. The application was built to be isolated. Even though it appears the presentation layer has been compromised, adversaries shouldn’t be able to reach anything of value. And the clean-up has already happened. Once the IaaS monitoring system threw an alert, that instance was taken offline and put into a special security group accessible only by investigators. A forensic server was spun up, and some additional analysis was performed. Orchestration and automation facilitating incident response again.
The presentation layer has large variances in how much traffic it needs to handle, so it was built using auto-scaling technology and immutable servers. Once the (potentially) compromised instance was removed from the group, another instance with a clean configuration was spun up to share the workload. It’s not clear whether this attack is related to the other incident, so you take the information about the cloud attack, pull it down, and feed it into your case management system. But the reality is that this attack, even if related, doesn’t present a danger at this point, so it’s put to the side while you focus on the internal attack and probable exfiltration.
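The quarantine workflow might be sketched like this in Python – a hypothetical stand-in for the cloud API calls that move a flagged instance into an investigators-only security group, pull it from service, and signal that a clean replacement should be launched from the immutable image:

```python
def quarantine(instance, forensics_group="sg-forensics"):
    """Sketch of the automated response described above: isolate the
    flagged instance for investigators and request a clean replacement.
    Field names and IDs are illustrative, not any real cloud API."""
    instance["security_groups"] = [forensics_group]  # investigators only
    instance["in_service"] = False                   # out of the scaling group
    return {"action": "launch_replacement", "image": instance["base_image"]}

flagged = {"id": "i-0abc", "base_image": "ami-clean-v7",
           "security_groups": ["sg-web"], "in_service": True}
followup = quarantine(flagged)
assert flagged["security_groups"] == ["sg-forensics"]
assert followup["action"] == "launch_replacement"
```

Because the servers are immutable, "clean up" is just launching another instance from the known-good image – the forensic copy stays frozen for investigators.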
Building the Timeline
Now that you have completed initial triage, it’s time to dig into the attack and start building a timeline of what happened. You start by looking at the compromised endpoints and network metadata to see what the adversaries did. From examining endpoint telemetry you deduced that Patient Zero was a contractor on the Human Resources (HR) team. This individual was tasked with looking at resumes submitted to the main HR email account, and initial qualification screening for an open position. The resume was a compromised Word file using a pretty old Windows 7 attack. It turns out the contractor was using their own machine, which hadn’t been patched and was vulnerable. You can’t be that irritated with the contractor – it was their job to open those files. The malware rooted the device, connected up to a botnet, and then installed a Remote Access Trojan (RAT) to allow the adversary to take control of the device and start a systematic attack against the rest of your infrastructure.
You ponder how your organization’s BYOD policy enables contractors to use their own machines. The operational process failure was in not inspecting the machine on connection to the network; you didn’t make sure it was patched, or running an authorized configuration. That’s something to scrutinize as part of the post-mortem.
Once the adversary had presence on your network, they proceeded to compromise another 4 devices, ultimately ending up on both the CFO’s and the VP of Merchandising’s devices. Network forensic metadata shows how they moved laterally within the network, taking advantage of weak segmentation between internal networks. There are only so many hours in the day, and the focus had been on making sure the perimeter was strong and monitoring ingress traffic.
Once you know the CFO’s and VP of Merchandising’s devices were compromised, you can clearly see exfiltration in network metadata. A quick comparison of file sizes in data captured once the egress filter triggered shows that they probably got the latest quarterly board report, as well as a package of merchandising comps and plans for an exclusive launch with a very hot new fashion company. It was a bit of a surprise that the adversary didn’t bother encrypting the stolen data, but evidently they bet that a mid-sized retailer wouldn’t have sophisticated DLP or egress content filtering. Maybe they just didn’t care whether anyone found out what was exfiltrated after the fact, or perhaps they were in a hurry and wanted the data more than they wanted to stay undiscovered.
You pat yourself on the back, once, that your mature security program included an egress filter which triggered a full packet capture of outbound traffic from all the compromised devices. So you know exactly what was taken, when, and where it went. That will be useful later, when talking to law enforcement and possibly prosecuting at some point, but right now it’s little consolation.
Cleaning up the Mess
Now that you have an incident timeline, it’s time to clean up and return your environment to a good state. The first step is to clean up the affected machines. Executives are cranky because you decided to reimage their machines, but your adversary worked to maintain persistence on compromised devices in other attacks, so prudence demands you wipe them.
The information on this incident will need to be aggregated, then packaged up for law enforcement and the general counsel, in preparation for the unavoidable public disclosure. You take another note that the case management system proved its worth – tracking incident activity, storing case artifacts, and ensuring proper chain of custody. Given your smaller team, it should help smooth your next incident response as well.
Finally, this incident was discovered by a savvy admin hunting across your networks. So to complete the active part of this investigation, you task the same admin with hunting back through the environment to make sure this attack has been fully eradicated, and no similar attacks are in process. Given the size of your team, it’s a significant choice to devote resources to hunting, but given the results, this is an activity you will need to perform on a monthly cadence.
Closing the Loop
To finalize this incident, you hold a post-mortem with the extended team, including representatives from the general counsel’s office. The threat intelligence being used needs to be revisited and scrutinized, because the adversary connected to a botnet but wasn’t detected. And the rules on your egress filters have been tightened because if the exfiltrated data had been encrypted, your response would have been much more complicated. The post-mortem also provided a great opportunity to reinforce the importance of having security involved in application architecture, given how well the new IaaS application stood up under attack.
Another reminder that sometimes a skilled admin who can follow their instincts is the best defense. Tools in place helped accelerate response and root cause identification, and made remediation more effective. But Incident Response in the Cloud Age involves both people and technology, along with internal and external data, to ensure effective and efficient investigation and successful remediation.
Posted at Thursday 2nd June 2016 12:46 pm
By Adrian Lane
This post will offer examples of how to integrate RASP into a development pipeline. We’ll cover both how RASP fits into the technology stack and the development processes used to deliver applications. We will close with a detailed discussion of how RASP differs from other security technologies, along with its advantages and tradeoffs.
As we mentioned in our introduction, our research into DevOps produced many questions on how RASP worked, and whether it is an effective security technology. The questions came from non-traditional buyers of security products: application developers and product managers. Their teams, by and large, were running Agile development processes. The majority were leveraging automation to provide Continuous Integration – essentially rebuilding and retesting the application repeatedly and automatically as new code was checked in. Some had gone as far as Continuous Deployment (CD) and DevOps. To address this development-centric perspective, we offer the diagram below to illustrate a modern Continuous Deployment / DevOps application build environment. Consider each arrow a script automating some portion of source code control, building, packaging, testing, or deployment of an application.
Security tools that fit this model are actively being sought by development teams. They need granular API access to functions, quick production of test results, and delivery of status back to supporting services.
- Installation: As we mentioned back in the technology overview, RASP products differ in how they embed within applications. They all offer APIs to script configuration and runtime policies, but how and where they fit differs slightly between products. Servlet filters, plugins, and library replacement are performed as the application stack is assembled; these approaches augment an application or application ‘stack’ to perform detection and blocking. Virtualization and JVM replacement approaches augment runtime environments, modifying the subsystems that run your application to handle monitoring and detection. In all these cases, whether on-premise or as a cloud service, the process of installing RASP is pretty much identical to the build or deployment sequence you currently use.
- Rules & Policies: We found the majority of RASP offerings include canned rules to detect or block most known attacks. Typically this blacklist of attack profiles maps closely to the OWASP Top Ten application vulnerability classes. Protection against common variants of standard attacks, such as SQL injection and session mis-management, is included. Once these rules are installed they are immediately enforced. You can enable or disable individual rules as you see fit. Some vendors offer specific packages for critical attacks, mapped to specific CVEs such as Heartbleed. Bundles for specific threats, rather than by generic attack classes, help security and risk teams demonstrate policy compliance, and make it easier to understand which threats have been addressed. But when shopping for RASP technologies you need to evaluate the provided rules carefully. There are many ways to attack a site with SQL injection, and many to detect and block such attacks, so you need to verify the included rules cover most of the known attack variants you are concerned with. You will also want to verify that you can augment or add rules as you see fit – rule management is a challenge for most security products, and RASP is no different.
- Learning the application: Not all RASP technologies can learn how an application behaves, or offer whitelisting of application behaviors. Those that do vary greatly in how they function. Some behave like their WAF cousins, and need time to learn each application – whether by watching normal traffic over time, or by generating their own traffic to ‘crawl’ each application in a non-production environment. Some function similarly to white-box scanners, using application source to learn.
- Coverage capabilities: During our research we found uneven RASP coverage of common platforms. Some started with Java or .Net, and are iterating to cover Python, Ruby, Node.js, and others. Your search for RASP technologies may be strongly influenced by available platform support. We find that more and more, applications are built as collections of microservices across distributed architectures. Application developers mix and match languages, choosing what works best in different scenarios. If your application is built on Java you’ll have no trouble finding RASP technology to meet your needs. But for mixed environments you will need to carefully evaluate each product’s platform coverage.
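To illustrate the learning approach some products take, here is a hypothetical Python sketch: observe traffic during a learning period, build a whitelist of endpoints and the parameters actually used, then flag anything novel once learning ends. Real products learn far richer behavioral profiles; the class and its states are invented for illustration:

```python
from collections import defaultdict

class BehaviorLearner:
    """Sketch of learned whitelisting: normal traffic (or generated
    crawl traffic in a non-production environment) defines the profile,
    and anything outside it is flagged afterward."""

    def __init__(self):
        self.profile = defaultdict(set)
        self.learning = True

    def observe(self, endpoint: str, params):
        if self.learning:
            self.profile[endpoint].update(params)   # build the whitelist
            return "learning"
        known = self.profile.get(endpoint)
        if known is None or not set(params) <= known:
            return "anomaly"    # new endpoint or unexpected parameter
        return "normal"

learner = BehaviorLearner()
learner.observe("/login", ["user", "password"])
learner.learning = False
assert learner.observe("/login", ["user", "password"]) == "normal"
assert learner.observe("/login", ["user", "debug"]) == "anomaly"
```

The tradeoff named above follows directly: the quality of the whitelist depends entirely on how representative the learning traffic was.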
Development Process Integration
Software development teams leverage many different tools to promote security within their overarching application development and delivery processes. The graphic below illustrates the major phases teams go through. The callouts map the common types of security tests at specific phases within Agile, CI, and DevOps frameworks. Keep in mind that it is still early days for automated deployment and DevOps. Many security tools were built before rapid and automated deployment existed or were well known. Older products are typically too slow, some cannot focus their tests on new code, and others do not offer API support. So orchestration of security tools – basically what works where – is far from settled territory. The time each type of test takes to run, and the type of result it returns, drives where it fits best into the phases below.
RASP is designed to be bundled into applications, so it is part of the application delivery process. RASP offers two distinct approaches to help tackle application security. The first is in the pre-release or pre-deployment phase, while the second is in production. Either way, deployment looks very similar. But usage can vary considerably depending on which is chosen.
- Pre-release testing: This is exactly what it sounds like: RASP is used when the application is fully constructed and going through final tests prior to being launched. Here RASP can be deployed in several ways. It can be deployed to monitor only, using application tests and instrumenting runtime behavior to learn how to protect the application. Alternatively RASP can monitor while security tests are invoked in an attempt to break the application, with RASP performing security analysis and transmitting its results. Development and Testing teams can learn whether RASP detected the tested attacks. Finally, RASP can be deployed in full blocking mode to see whether security tests were detected and blocked, and how they impacted the user experience. This provides an opportunity to change application code or augment the RASP rules before the application goes into production.
- Production testing: Once an application is placed in a production environment, either before actual customers are using it (using Blue-Green deployment) or afterwards, RASP can be configured to block malicious application requests. Regardless of how the RASP tool works (whether via embedded runtime libraries, servlet filters, in-memory execution monitoring, or virtualized code paths), it protects applications by detecting attacks in live runtime behavior. This model essentially provides execution path scanning, monitoring all user requests and parameters. Unlike technologies which block requests at the network or web proxy layer, RASP inspects requests at the application layer, which means it has full access to the application’s inner workings. Working at the API layer provides better visibility to determine whether a request is malicious, and more focused blocking capabilities than external security products.
- Runtime protection: Ultimately RASP is not just for testing, but for full runtime protection and blocking of attacks.
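As an illustration of application-layer inspection, here is a hedged Python sketch of a RASP-style hook: because it runs inside the application, it sees fully parsed request parameters and can either log (monitor mode) or block on a match. The patterns below are a toy blacklist, nowhere near the variant coverage a real product ships:

```python
import re
from urllib.parse import parse_qs

# Tiny blacklist in the spirit of RASP canned rules, for illustration only.
SQLI_PATTERNS = [re.compile(p, re.I) for p in
                 (r"union\s+select", r"'\s*or\s+1=1", r";\s*drop\s+table")]

def inspect(query_string: str, blocking: bool = True) -> str:
    """Inspection hook at the application layer, where parameters are
    already parsed -- unlike a network-layer filter seeing raw bytes."""
    params = parse_qs(query_string)
    for values in params.values():
        for v in values:
            if any(p.search(v) for p in SQLI_PATTERNS):
                return "blocked" if blocking else "logged"
    return "allowed"

assert inspect("user=alice") == "allowed"
assert inspect("user=x' or 1=1 --") == "blocked"
assert inspect("user=x' or 1=1 --", blocking=False) == "logged"
```

The `blocking` flag mirrors the pre-release versus production distinction above: the same hook runs in monitor-only mode during testing, then flips to blocking once the rules are tuned.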
Regardless of where you deploy RASP, you need to test to ensure it is delivering on its promise. We advocate an ongoing testing process to ensure your policies are sound, and that you ultimately block what you need to block. Of course you can use other scanners to probe an application to ensure RASP is working prior to deployment, and other tools (such as Havij and SQLmap) to automate testing, but that’s only half the story. For full confidence that your apps are protected, we still recommend actual humans banging away at your applications. Penetration testing, at least periodically, helps verify your defenses are effective.
To WAF or not to WAF
Why did the market develop this brand-new security technology? Especially when existing technologies – most notably Web Application Firewalls (WAF) – already provided similar functions. Both block attacks on web-facing applications. They are both focused on known attack vectors, and include blacklists of attack patterns. Some optionally offer whitelists of known (approved) application functions. And both can ‘learn’ appropriate application behaviors. In fact most enterprises, especially those which must comply with PCI-DSS, have already bought and deployed WAF. So why spend time and money on a new tool?
WAF management teams speak of the difficulty maintaining ‘positive’ security rules, and penetration testers grouse about how most WAFs are misconfigured, but neither was the primary driver of the search for an alternative which produced RASP. Development teams were looking for something different. Most stated their basic requirement was for something to work within their development pipeline. WAF’s lack of APIs for automatic setup, the time needed to learn application behavior, and most importantly its inability to pinpoint vulnerable code modules, were all cited as reasons WAF failed to satisfy developers. Granted, these requests came from more ‘Agile’ teams, more often building new applications than maintaining existing platforms. Still, we heard consistently that RASP meets a market demand unsatisfied by other application security technologies.
It is important to recognize that these technologies can be complementary, not necessarily competitive. There is absolutely no reason you can’t run RASP alongside your existing WAF. Some organizations continue to use cloud-based WAF as front-line protection, while embedding RASP into applications. Some use WAF to provide “threat intelligence”, DoS protection, and network security, while using RASP to fine-tune application security. Still others double down with overlapping security functions, much the way many organizations use layered anti-spam filters, accepting redundancy for broader coverage or unique benefits from each product. WAF platforms have a good ten-year head start, with broader coverage and very mature platforms, so some firms are loath to throw away WAF until RASP is fully proven.
Tomorrow we will close out this series with a brief buyers guide. We look forward to your comments!
Posted at Tuesday 31st May 2016 3:50 pm