SOC 2025: Making Sense of Security Data
Intelligence comes from data. And there is no lack of security data, that's for sure. Everything generates data: servers, endpoints, networks, applications, databases, SaaS services, clouds, containers, and anything else that does anything in your technology environment. Just as there is no award for finding every vulnerability, there is no award for collecting all the security data. You want to collect the right data to make sure you can detect an attack before it becomes a breach.

As we consider what the SOC will look like in 2025, given the changing attack surface and the available skills base, we've got to face reality. The sad truth is that TBs of security data sit underutilized in data stores throughout the enterprise. It's not that security analysts don't want to use the data; they lack a consistent process to evaluate what they ingest and then analyze it on an ongoing basis. But let's not put the cart before the proverbial horse. First, let's figure out what data will drive the SOC of the Future.

Security Data Foundation

The foundational sources of your security data haven't changed much over the past decade. You start with the data from your security controls, because 1) the controls are presumably detecting or blocking attacks, and 2) you still have to substantiate the controls in place for your friendly (or not so friendly) auditors. These sources include logs and alerts from your firewalls, IPSs, web proxies, email gateways, DLP systems, identity stores, etc. You may also collect network traffic, including flows and even packets.

What about endpoint telemetry from your EDR or next-gen EPP product? There is renewed interest in endpoint data because remote employees don't always traverse the corporate network, creating a blind spot regarding their activity and security posture. On the downside, endpoint data is plentiful, which creates issues of scale and cost. The same considerations apply to network packets. But let's table that discussion for a couple of sections, since there is more context to cover before deciding whether you need to push all of that data into the security data store.

Use Cases

Once you get the obvious stuff in there, you need to go broader and deeper to provide the data required to evolve the SOC with advanced use cases. That means (selectively) pulling in application and database logs. You probably just had an unpleasant flashback to when you tried that in the past: your RDBMS-based SIEM fell over, and it took three days to generate a report with all that data in there. But hear us out; you don't need all the application logs, just the relevant ones.

Which brings us to the importance of threat models when planning use cases. That's right, old-school threat models. You figure out what is most likely to be attacked in your environment (think high-value information assets) and work backward. How would the attacker compromise the data or the device? What data would you need to detect that attack? Do you have that data? If not, how do you get it? Aggregate and then tune. Wash, rinse, repeat for additional use cases.

We know this doesn't seem like much of an evolution; it's the same stuff we've been doing for over a decade, right? Not exactly, because the analytics you have at your disposal are much improved, which we'll get into later in the series. Those analytics are constrained by the availability of security data, yet you can't capture all of it. So focus on the threat models and use cases that answer the questions you need answered.
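To make that exercise concrete, here is a minimal sketch (in Python, using entirely hypothetical asset names, attack descriptions, and data-source labels rather than any particular product's schema) of working backward from a high-value asset to the telemetry a use case needs, then flagging the collection gaps:

```python
# Hypothetical sketch: map threat-model-driven use cases to the telemetry
# they require, then flag gaps against what the SOC collects today.
from dataclasses import dataclass


@dataclass
class UseCase:
    asset: str                  # high-value asset from the threat model
    attack: str                 # how we expect it to be compromised
    required_sources: set[str]  # telemetry needed to detect that attack


# What we currently aggregate (hypothetical inventory)
collected = {"firewall", "email_gateway", "edr", "identity_store"}

use_cases = [
    UseCase("customer_db", "credential abuse via a stolen service account",
            {"identity_store", "db_audit_log", "vpc_flow_logs"}),
    UseCase("payment_app", "web exploit followed by lateral movement",
            {"app_log", "edr", "firewall"}),
]

for uc in use_cases:
    gaps = uc.required_sources - collected
    status = "covered" if not gaps else "missing: " + ", ".join(sorted(gaps))
    print(f"{uc.asset} | {uc.attack} -> {status}")
```

The structure is trivial on purpose; the point is that each use case makes its data requirement, and any collection gap, explicit before you start aggregating and tuning.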
Cloud Sources

Given the cloudification of seemingly everything, we need to mention two (relatively) new sources of security data: your IaaS (infrastructure as a service) providers and your SaaS applications. Given the sensitivity of the data going into the cloud, over the seemingly dead bodies of the security folks who swore they would never let that happen, you're going to need telemetry from these environments to figure out what's happening, whether those environments are at risk, and ultimately to respond to potential issues. You also want to pay attention to the data moving to and from the cloud, because detecting when an adversary can pivot between your environments is critical. Is this radically different from the application and database telemetry discussed above? Not so much in content, but absolutely in location. The question then becomes: what and how much, if any, of the cloud security data do you centralize?

What About External Data?

Nowadays, you don't just use your own data to find attackers. You use other people's data, in other words, threat intelligence, which gives you the ability to look for attacks you haven't seen before. Threat intel isn't new either, and threat intel platforms (TIPs) are being subsumed into broader SOC platforms or evolving to focus more on security operations and analysts. There are still many sources of threat intel, some commercial and some open source. The magic is understanding which sources will be useful to you, which involves curation and evaluating the relevance of the third-party data. As we contemplate the security data that will drive the SOC, effectively leveraging threat intel is a cornerstone of the strategy.

Chilling by the (Security Data) Lake

In the early days of SIEM, there wasn't a choice of where or how you would store your security data. You selected a SIEM, put the data in there, started with the rules and policies provided by the vendor, tuned the rules and added some more, generated the reports from the system, and hopefully found some attacks. As security tooling has evolved, you now have options for how you build your security monitoring environment. Let's start with aggregation, or what's now called a security data lake. The new terminology signals that it's not your grandad's SIEM; rather, it's a place to store significantly more telemetry and make better use of it. It turns out this newfangled data lake doesn't