Logging and monitoring for cloud infrastructure has become the top topic we are asked about lately. Even general conversations about moving applications to the cloud always seem to end with clients asking how to ‘do’ logging and monitoring of cloud infrastructure. Logs are key to security and compliance, and moving into cloud services – where you do not actually control the infrastructure – makes logs even more important for operations, risk, and security teams. But these questions make perfect sense – logging in and across cloud infrastructure is complicated, offering technical challenges and huge potential cost overruns if implemented poorly.
The road to cloud is littered with the charred remains of many who have attempted to create multi-cloud logging for their respective employers. But cloud services are very different – structurally and operationally – than on-premise systems. The data is different; you do not necessarily have the same event sources, and the data is often different or incomplete, so existing reports and analytics may not work the same. Cloud services are ephemeral so you can’t count on a server “being there” when you go looking for it, and IP addresses are unreliable identifiers. Networks may appear to behave the same, but they are software defined, so you cannot tap into them the same way as on-premise, nor make sense of the packets even if you could. How you detect and respond to attacks differs, leveraging automation to be as agile as your infrastructure. Some logs capture every API call; while their granularity of information is great, the volume of information is substantial. And finally, the skills gap of people who understand cloud is absent at many companies, so they ‘lift and shift’ what they do today into their cloud service, and are then forced to refactor the deployment in the future.
One aspect that surprised all of us here at Securosis is the adoption of multi-cloud; we do not simply mean some Software as a Service (SaaS) along with a single Infrastructure as a Service (IaaS) provider – instead firms are choosing multiple IaaS vendors and deploying different applications to each. Sometimes this is a “best of breed” approach, but far more often the selection of multiple vendors is driven by fear of getting locked in with a single vendor. This makes logging and monitoring even more difficult, as collection across IaaS providers and on-premise all vary in capabilities, events, and integration points.
Further complicating the matter is the fact that existing Security Information and Event Management (SIEM) vendors, as well as some security analytics vendors, are behind the cloud adoption curve. Some because their cloud deployment models are no different than what they offer for on-premise, making integration with cloud services awkward. Some because their solutions rely on traditional network approaches which don’t work with software defined networks. Still others employ pricing models which, when hooked into highly verbose cloud log sources, cost customers small fortunes. We will demonstrate some of these pricing models later in this paper.
Here are some common questions:
- What data or logs do I need? Server/network/container/app/API/storage/etc.?
- How do I get them turned on? How do I move them off the sources?
- How do I get data back to my SIEM? Can my existing SIEM handle these logs, in terms of both different schema and volume & rate?
- Should I use log aggregators and send everything back to my analytics platform? At what point during my transition to cloud does this change?
- How do I capture packets and where do I put them?
These questions, and many others, are telling because they come from trying to fit cloud events into existing/on-premise tools and processes. It’s not that they are wrong, but they highlight an effort to map new data into old and familiar systems. Instead you need to rethink your logging and monitoring approach.
The questions firms should be asking include:
- What should my logging architecture look like now and how should it change?
- How do I handle multiple accounts across multiple providers?
- What cloud native sources should I leverage?
- How do I keep my costs manageable? Storage can be incredibly cheap and plentiful in the cloud, but what is the pricing model for various services which ingest and analyze the data I’m sending them?
- What should I send to my existing data analytics tools? My SIEM?
- How do I adjust what I monitor for cloud security?
- Batch or real-time streams? Or both?
- How do I adjust analytics for cloud?
You need to take a fresh look at logging and monitoring, and adapt both IT and security workflows to fit cloud services – especially if you’re transitioning to cloud from an on-premise environment and will be running a hybrid environment during the transition… which may be several years from initial project kick-off.
Today we launch a new series on Building a Multi-cloud Logging Strategy. Over the next few weeks, Gal Shpantzer and I (Adrian Lane) will dig into the following topics to discuss what we see when helping firms migrate to cloud. And there is a lot to cover.
Our tentative outline is as follows:
- Barriers to Success: This post will discuss some reasons traditional approaches do not work, and areas where you might lack visibility.
- Cloud Logging Architectures: We discuss anti-patterns and more productive approaches to logging. We will offer recommendations on reference architectures to help with multi-cloud, as well as centralized management.
- Native Logging Features: We’ll discuss what sorts of logs you can expect to receive from the various types of cloud services, what you may not receive in a shared responsibility service, the different data sources firms have come to expect, and how to get them. We will also provide practical notes on logging in GCP, Azure, and AWS. We will help you navigate their native offerings, as well as the capabilities of PaaS/SaaS vendors.
- BYO Logging: Where and how to fill gaps with third-party tools, or building them into applications and service you deploy in the cloud.
- Cloud or On-premise Management? We will discuss tradeoffs between moving log management into the cloud, keeping these activities on-premise, and using a hybrid model. This includes risks and benefits of connecting cloud workloads to on-premise networks.
- Security Analytics: More and more firms are augmenting or replacing traditional SIEM with security analytics, data lakes, and ML/AI; so we will discuss some of these approaches along with various data sources for threat detection, compliance, and governance. This is the goal behind 1-5 above: facilitating the ability to collect, transform, and store – to gain real-time and historical insight.
As always, questions, comments, disagreements, and other constructive input are encouraged.
Comments