We will wrap up this series with a migration path to monitoring the hybrid cloud. Whether you choose to monitor the cloud services you consume, or go all the way and create your own SOC in the cloud, these steps will get you there. Let’s dive in.
Phase 1: Deploy Collectors
The first phase is to collect and aggregate the data. You need to decide how to deploy event collectors – including agents, ‘edge’ proxies, and reverse proxies – to gather information from cloud resources. Your goal is to gather events as quickly and easily as possible, so start with what you know. That basically means leveraging the capabilities of your current security solution(s) to get these new events into the existing system. The complexity is not around understanding these new data sources – flow data and
syslog output are well understood. The challenge comes in adapting collection methods designed for on-premises services to a cloud model. If an agent or collector works with your cloud provider’s environment, either to consume cloud vendor logs or those created by your own cloud-based servers, you are in luck. If not, you will likely find yourself rerouting traffic to and/or from the cloud through a network proxy to capture events.
Depending on the type of cloud service (such as SaaS or IaaS) you will have various means to access event data (such as logs and API connectivity), as outlined in our solution architectures post. We suggest collecting data directly from the cloud provider whenever possible, because much of that data is unavailable from instances or applications running inside the cloud. Monitoring agents can be deployed in IaaS or private cloud environments, where you control the full stack.
But in other cloud models, particularly PaaS and SaaS, agents are generally not viable. There you need to rely on proxies that can collect data from all types of cloud deployments, provided you can route traffic through their data-gathering choke points. It is decidedly suboptimal to insert choke points in your cloud network, but it may be necessary. Finally, you might instead be able to use remote API calls from an on-premise collector to pull events directly from your cloud provider. Not all cloud providers offer this access, and if they do you will likely need to code something yourself from their API documentation.
Once you understand what is available you can figure out whether your source provides sufficiently granular data. Each cloud provider/vendor API, and each event log, offers a slightly different set of events in a slightly different format. Be prepared to go back to the future – you may need to build a collector based on sample data from your provider, because not all of the cloud vendors/providers offer logs in
syslog or a similarly convenient format. Also look for feed filter options to screen out events you are not interested in – cloud services are excellent at flooding systems with (irrelevant) data.
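A feed filter can be as simple as dropping event types you never act on before they reach your monitoring system. The sketch below assumes a normalized event dictionary with a `type` field; the type names are hypothetical placeholders, since each provider formats its feed differently.

```python
# Minimal feed filter sketch: screen out event types you are not
# interested in before forwarding to the SIEM. The event shape and
# type names are hypothetical -- adapt to your provider's log format.

IGNORED_EVENT_TYPES = {"heartbeat", "billing.usage", "console.read"}

def filter_events(raw_events):
    """Yield only security-relevant events from a noisy cloud feed."""
    for event in raw_events:
        if event.get("type") in IGNORED_EVENT_TYPES:
            continue  # irrelevant to security monitoring; drop it
        yield event

sample = [
    {"type": "heartbeat", "source": "lb-1"},
    {"type": "iam.login_failure", "source": "console", "user": "alice"},
    {"type": "billing.usage", "source": "meter"},
]
kept = list(filter_events(sample))
```

Filtering this early keeps irrelevant cloud chatter from inflating storage and processing costs downstream.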
Our monitoring philosophy hasn’t changed. Collect as much data as possible. Get everything the cloud vendor provides as the basis for security monitoring. Then fill in the deficiencies with agents, proxy filters, and cloud monitoring services as needed. This is a very new capability, so likely you will need to build API interface layers to your cloud service providers.
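An API interface layer of this sort often reduces to a cursor-based polling loop. This is a minimal sketch, not any particular provider's SDK: the page-fetching function and cursor semantics are assumptions, injected as callables so the same logic can wrap whatever log API your provider actually exposes.

```python
class CloudLogCollector:
    """Sketch of an API interface layer that pulls events from a cloud
    provider. The page-fetching function is injected, so the same polling
    logic can wrap any provider's (hypothetical) log API."""

    def __init__(self, fetch_page, forward):
        self.fetch_page = fetch_page  # callable(cursor) -> (events, next_cursor)
        self.forward = forward        # callable(event), e.g. send to the SIEM
        self.cursor = None            # provider-supplied position marker

    def poll_once(self):
        """Pull one page of events and forward each one downstream."""
        events, self.cursor = self.fetch_page(self.cursor)
        for event in events:
            self.forward(event)
        return len(events)

# Simulated two-page feed standing in for real API responses.
pages = {
    None: ([{"id": 1}, {"id": 2}], "cursor-1"),
    "cursor-1": ([{"id": 3}], None),
}
received = []
collector = CloudLogCollector(lambda cursor: pages[cursor], received.append)
collector.poll_once()
collector.poll_once()
```

In production the fetch callable would make an authenticated HTTPS request and the cursor would be persisted, so a restarted collector resumes where it left off rather than re-pulling or skipping events.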
Finally keep in mind that using proxies and/or forcing cloud traffic through appliances at the ‘edge’ of your cloud is likely to require re-architecting both on-premise and cloud networks to funnel traffic in and out of your collection point. This also requires that disconnected devices (phones/tablets and laptops not on the corporate network) be configured to send traffic through the choke points / gateways, and cloud services must be configured to reject any direct access which bypasses these portals. If an inspection point can be bypassed it cannot effectively monitor security.
Now that you have figured out your strategy and deployed basic collectors, it is time to integrate these new data sources into the monitoring environment.
Phase 2: Integrate and Monitor Cloud-based Resources
To integrate these cloud-based event sources into the monitoring solution you need to decide which deployment model will best fit your needs. If you already have an on-premise SOC platform and supporting infrastructure it may make sense to simply feed the events into your existing SIEM, malware detection, or other monitoring systems. But a few considerations might change your decision.
- Capacity: Ensure the existing system can handle your anticipated event volume. SaaS and PaaS environments can be noisy, so expect a significant uptick in event volume, and account for the additional storage and processing overhead.
- Push vs. Pull: Log Management and SIEM systems can collect events as remote systems and agents push them in. The collector receives the events, possibly performs some preprocessing, and forwards the stream to the main aggregation point. But what if you cannot run a remote agent to push the data to you? Most cloud events must be pulled from the cloud service via an active API request. While pull requests are secured over HTTPS/TLS or even VPN connections, this doesn’t happen magically – a program or script must initiate the transfer. Additionally, the program (script) must supply credentials or identity tokens to the cloud service. You need to know whether your current system is capable of initiating the pull request, and whether it can securely manage the remote API service credentials necessary to collect data.
- Data Retention: Cloud services require network access, so you need to plan for when your connection is down – especially given the frequency of DoS attacks and network service outages. Make sure you understand the impact if you cannot collect remote events for a time. If the connection goes down, how long can relevant security data be retained or buffered? You don’t want to lose that data. The good news is that many PaaS and IaaS platforms provide easy mechanisms to archive event feeds to long-term storage, to avoid event data loss, but this also requires setup.
- Aggregation and Correlation: Determine whether you need to aggregate and correlate cloud and on-premise activity. Given the disparity between cloud and internal systems, you may want to reconsider traditional ‘everything in one place’ event aggregation. It may make more sense to store and monitor cloud resources separately from on-premise resources in your data centers. Normal use (and misuse) event patterns differ between cloud and on-premise usage. Enforcement policies may be stricter for cloud resources because that data is “out there”. You will also need to consider how best to correlate event data between the cloud and on-premise systems; cloud access logs typically include different information and may make it difficult to track users, requiring additional correlation against directory services and identity stores to produce more actionable alerts.
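The data retention concern above comes down to buffering: events must survive locally until the connection is restored. Here is a minimal sketch under stated assumptions – a delivery callable that reports success or failure, and a bounded in-memory buffer standing in for whatever durable queue you would actually use.

```python
from collections import deque

class BufferedForwarder:
    """Sketch: buffer events locally while the upstream connection is down,
    then flush once delivery succeeds again. max_events bounds memory use;
    sizing it is part of deciding how much data you can afford to lose."""

    def __init__(self, send, max_events=10000):
        self.send = send                        # callable(event) -> bool (True = delivered)
        self.buffer = deque(maxlen=max_events)  # oldest events drop first when full

    def submit(self, event):
        self.buffer.append(event)
        self.flush()

    def flush(self):
        while self.buffer:
            if not self.send(self.buffer[0]):
                return  # still down; events stay buffered for the next attempt
            self.buffer.popleft()

# Simulate an outage: delivery fails at first, then recovers.
delivered, link_up = [], False

def send(event):
    if link_up:
        delivered.append(event)
        return True
    return False

fwd = BufferedForwarder(send, max_events=100)
fwd.submit({"id": 1})
fwd.submit({"id": 2})  # both buffered while the link is down
link_up = True
fwd.flush()            # connection restored; backlog drains in order
```

A production version would spill to disk or to the provider’s archive storage rather than rely on a bounded in-memory queue, but the buffer-and-drain shape is the same.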
If the conditions being monitored differ enough between on-premise systems and the cloud, it may make more sense to use a separate cloud service to monitor cloud resources. If scalability of your on-premise system is an issue, consider pushing more monitoring and alerting into the cloud to take advantage of easy cloud scaling; this may enable more robust analytics as well.
Phase 3: Policy Development and Testing
The security policies you have today, and the specific conditions that trigger alerts, will need to change for cloud monitoring. Of course there is no simple guide for this, so once again you need to go back to the future: start with which kinds of threats you want to catch, the data you need to detect those conditions, and the alert thresholds that will help your folks respond faster to potential incidents. You can then identify your gaps and incrementally tune your policies to improve both accuracy and actionability.
For example you want to ensure only the right folks can access your cloud services, but may not be able to correlate IP addresses against user identities for those cloud resources. Further complicating matters, you likely don’t want to limit access to cloud services from mobile devices like you do with in-house services. And finally you need to build new policies that address specific cloud use cases – including access to the management plane, issuance of security certificates, and launching new applications. So we recommend taking a fresh look at cloud resources, rather than assuming you will monitor them the same way as on-premise systems.
There is no shortcut – this is work, plain and simple. Verifying that you have the event data necessary to evaluate activity, and adjusting policies to suit cloud use cases, are part of your migration and deployment processes. Understand this is an incremental process, and it will take time both to enumerate all the potential attacks you need to look for in your hybrid environment and to tune the policies for maximum effectiveness.
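Incremental tuning is easier when the threshold lives in data you can adjust rather than in code. A minimal sketch of such a policy, assuming a hypothetical `iam.login_failure` event type and a per-user count threshold:

```python
from collections import Counter

def evaluate_policy(events, event_type="iam.login_failure", threshold=5):
    """Return users whose count of matching events meets the alert
    threshold. Event type and field names are hypothetical examples."""
    failures = Counter(
        e["user"] for e in events if e.get("type") == event_type
    )
    return {user for user, count in failures.items() if count >= threshold}

# Six failures for one user trips the policy; two for another does not.
events = (
    [{"type": "iam.login_failure", "user": "mallory"}] * 6
    + [{"type": "iam.login_failure", "user": "alice"}] * 2
)
alerts = evaluate_policy(events, threshold=5)
```

Raising or lowering `threshold` against recorded event samples is exactly the accuracy-versus-noise tuning loop described above, run before a policy ever goes live.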
Phase 4: Automation and Orchestration
The bad news is that many of the workflows and incident response plans that (more or less) work for internal systems break when you move to the cloud. The way to detect a malware infection may be similar between IaaS and on-premise systems, but the way you conduct incident response is entirely different. So you need to consider how your SOC processes and tools will change. The cloud removes physical aspects of the security job – including taking devices offline, upgrading hardware, and extracting images from the devices. All these functions now happen via a cloud console or API calls. These cloud APIs provide a comprehensive way to orchestrate response and recovery without human intervention, based on a variety of triggers for policy violations.
The good news is that cloud computing enables you to react to events faster and better. First, some types of attacks (including traffic anomalies, DoS, and Bitcoin mining) will be detected by your cloud service provider, and you will get a notification that something is amiss. For a real-life example check out our own cloud faux pas. Other issues are no longer problems in the cloud, so you won’t have to respond to those situations. For example SaaS vendors handle SQL injection attacks and block inappropriate protocols for you. Some cloud vendors even offer ‘add-on’ security services such as configuration change detection, intrusion detection, and application threat analysis so you don’t need to handle these functions yourself.
More importantly, cloud services provide a means to automate responses to events. CloudSOC personnel no longer need to manually respond to every attack – leveraging the cloud APIs lets you automate IT and security functions within your response plan. For example it is trivial to isolate a suspect application or server, spin up a freshly provisioned and patched replacement, move workloads over to the new clean replacement, and then investigate the potentially compromised device and its forensic images at leisure. All this can happen automatically within seconds, if your response processes have been rebuilt to take advantage of these capabilities.
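The isolate-image-replace workflow can be expressed as a short playbook over the cloud API. This is a sketch against a stand-in API object – the method names are hypothetical, though real providers expose equivalents (security group changes, snapshots, instance launches) through their SDKs and REST APIs.

```python
class RecordingAPI:
    """Stand-in for a cloud provider's compute API. Method names are
    hypothetical; real providers expose equivalents through their SDKs."""

    def __init__(self):
        self.calls = []  # record of API calls, in order

    def quarantine(self, instance_id):
        self.calls.append(("quarantine", instance_id))

    def snapshot(self, instance_id):
        self.calls.append(("snapshot", instance_id))
        return f"image-of-{instance_id}"

    def launch_patched_replacement(self):
        self.calls.append(("launch",))
        return "i-replacement"

    def reroute_traffic(self, old_id, new_id):
        self.calls.append(("reroute", old_id, new_id))

def respond_to_compromise(api, instance_id):
    """Automated playbook: isolate, preserve evidence, replace, reroute."""
    api.quarantine(instance_id)                     # cut the suspect host off the network
    image = api.snapshot(instance_id)               # keep a forensic image for later analysis
    replacement = api.launch_patched_replacement()  # clean, patched instance
    api.reroute_traffic(instance_id, replacement)   # shift the workload over
    return image, replacement

api = RecordingAPI()
image, replacement = respond_to_compromise(api, "i-suspect")
```

Wiring a playbook like this to a detection trigger is what collapses response time from hours of manual work to seconds, while the forensic image preserves the evidence for investigation at leisure.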
And that is just scratching the surface of how you can establish a far more sophisticated prevention, detection, and response environment. Of course this requires work on your part – each cloud provider supports different actions than their competition – but the potential for much more sophisticated security than simple alerting and blocking on common attacks, and the possibility of automating so much of response, offer compelling motivation for migration to the cloud.
Phase 5: Migrate SOC Infrastructure to the Cloud
Due to sunk cost in existing infrastructure and economic realities, many of you will use your in-house SOC to monitor cloud services. There is nothing wrong with that – much of your infrastructure will likely run on-premise for the next couple years. For those looking to leave the in-house SOC behind, to leverage the agility and flexibility of the cloud, we offer some advice on migrating your SOC to an IaaS hosted environment.
First get your feet wet by standing up a smaller version of what you run today – basically a mini CloudSOC – which mirrors your in-house platform in an IaaS cloud. Use this mini CloudSOC to monitor cloud-based resources: aggregate and analyze event traffic and data collected from cloud API services, and dedicate a portion of your team to manage it. This approach offers multiple advantages: it is only a minor disruption to current on-premise SOC efforts, and your team gets time to understand the new tools and to focus on tuning your collectors and policies for the cloud. Once you have vetted the new system’s functionality and learned how to tune it, you can scale up your CloudSOC as you redirect more and more collectors to the new environment – perhaps including events and other data from on-premise data center assets.
At some point, if you choose to move all security monitoring to your CloudSOC, you will have processes and infrastructure in place to support your move. A wholesale shift of on-premise security monitoring to an IaaS CloudSOC is not for the faint of heart, so supplementing in-house systems with a mini CloudSOC dedicated to monitoring cloud services offers a good path for scaling up gradually. The fact is that you will be running two systems for a while in any case; by moving forward cautiously you can make informed decisions about how quickly to proceed.
Yet another option is to engage a third party for migration and management assistance. Third parties can fill the gaps when your SOC team is stretched thin or lacks needed cloud skills. Third-party providers can help with migration or even take over CloudSOC operations entirely, depending on what you need. Monitoring cloud activity is complicated by the cloud’s tendency to break assumptions baked into many monitoring tools, and by the lack of in-house expertise needed to fully capitalize on the cloud’s advantages. This transition is not easy, so either short-term or long-term assistance can be valuable in getting where you need to be.