November 8, 2012 - Securosis

Implementing and Managing Patch and Configuration Management: Configuration Management Operations

Mike Rothman November 8, 2012

The key high-level difference between configuration and patch management is that configuration management offers more opportunity for automation than patch management. Unless you are changing standard builds or reevaluating benchmarks, operations are more of a high-profile monitoring function. You will be alerted to a configuration change, and like any other potential incident you need to investigate and determine the proper remediation as part of a structured response process.

Continuous Monitoring

The first operational decision comes down to frequency of assessment. In a perfect world you would like to continuously assess your devices, to shorten the window between attack-related configuration change and detection of the change. Of course there is a point of diminishing returns, in terms of device resources and network bandwidth devoted to continuous assessment. Don’t forget to take other resource constraints into account, either. Real-time assessment doesn’t help if it takes an analyst a couple days to validate each alert and kick off the investigation process.

Another point to consider is the increasing overlap between real-time configuration assessment and the host intrusion prevention system (HIPS) capabilities built into endpoint protection suites. The HIPS is typically configured to catch configuration changes and usually brings along a more response-oriented process. That’s why we put configuration management in a periodic controls bucket in the Endpoint Security Management Buyer’s Guide. That said, there is a clear role for configuration management technology in dealing with attacks and threats. It’s a question of which technology – active HIPS, passive configuration management, or both – will work best in your environment.

Managing Alerts

Given that many alerts from your configuration management system may indicate attacks, a key component of your operational process is handling these alerts and investigating each potential incident.
We have done a lot of work on documenting incident response fundamentals and more sophisticated network forensics, so check that research out for more detail. For this series, a typical alert management process looks like:

Route alert: The interface of your endpoint security management platform acts as the initial view into the potential issue. Part of the policy definition and implementation process is to set alerts based on conditions that you would want to investigate. Once the alert fires someone then needs to process it. Depending on the size of your organization that might be a help desk technician, someone on the endpoint operations team, or a security team member.

Initial investigation: The main responsibility of the tier 1 responder is to validate the issue. Was it a false positive, perhaps because the change was authorized? If not, was it an innocent mistake that can be remedied with a quick fix or workaround? If not, and this is a real attack, then some kind of escalation is in order, based on your established incident handling process.

Escalation: At this point the next person in the chain will want as much information as possible about the situation. The configuration management system should be able to provide information on the device, the change(s) made, the user’s history, and anything else that relates to the device. The more detail you can provide, the easier it will be to reconstruct what actually happened. If the responder works for the security team, he or she can also dig into other data sources if needed, such as SIEM and firewall logs. At this point a broader initiative with specialized tools kicks in, and it is more than just a configuration management issue.

Close: Once the item is closed, you will likely want to generate a number of reports documenting what happened and the eventual resolution – at least to satisfy compliance requirements. But that shouldn’t be the end of your closing step.
We recommend a more detailed post-mortem meeting to thoroughly understand what happened, what needs to change to avoid similar situations in the future, and to see how processes stood up under fire. Also critically assess the situation in terms of configuration management policies and make any necessary policy changes, as we will discuss later in this post.

Troubleshooting

In terms of troubleshooting, as with patch management, the biggest risk for a configuration change is that it might not be made correctly. The troubleshooting process is similar to the one laid out in Patch Management Operations, so we won’t go through the whole thing. The key is that you need to identify what failed, which typically involves either a server or agent failure. Don’t forget about connectivity issues, which can impact your ability to make configuration changes as well. Once the issue is addressed and the proper configuration changes made, you will want to confirm them.

Keep in mind the need for aggressive discovery of new devices, as the longer a misconfigured device exists on your network, the more likely it is to be exploited. As we discussed in the Endpoint Security Management Buyer’s Guide, whether it’s via periodic active scanning, passive scanning, integration with the CMDB (or another asset repository), or another method, you can’t manage what you don’t know exists. So keep your focus on a timely and accurate ongoing discovery process.

Optimizing the Environment

When you aren’t dealing with an alert or a failure, you will periodically revisit policies and system operations with an eye to optimizing them. That requires some introspection, to critically assess what’s working and what isn’t. How long is it taking to identify configuration changes, and how is resolution time trending? If things move in the wrong direction, try to isolate the circumstances of the failure. Are the problems related to one of these?

Devices or software
Network connectivity or lack thereof
Business units or specific employees

When reviewing policies, trends are your friend. When the system is working fine you can focus on trying to improve operations. Can you move, add, or change components to cut the time required for discovery and assessment? Look for incremental improvements and be sure to plan changes carefully. If you change too much at one time it will be difficult to figure out what worked and
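The review questions above (how resolution time is trending, and whether problems cluster around particular devices, connectivity, or business units) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (closed_week, device_type, hours_to_resolve), the function name, and the sample data are all hypothetical, and splitting the records into an earlier and a later half is a deliberately crude stand-in for whatever reporting your platform provides.

```python
from collections import defaultdict
from statistics import median

def resolution_trend(alerts, group_key):
    """Return, per group, the change in median hours-to-resolve between the
    earlier half and the later half of the records (ordered by closed_week).
    A positive number means resolution is getting slower for that group."""
    ordered = sorted(alerts, key=lambda a: a["closed_week"])
    half = len(ordered) // 2
    early, late = defaultdict(list), defaultdict(list)
    for i, alert in enumerate(ordered):
        bucket = early if i < half else late
        bucket[alert[group_key]].append(alert["hours_to_resolve"])
    # Only compare groups that appear in both halves.
    return {g: median(late[g]) - median(early[g])
            for g in early.keys() & late.keys()}

# Hypothetical closed alerts; field names are assumptions for illustration.
alerts = [
    {"closed_week": 1, "device_type": "laptop", "hours_to_resolve": 4},
    {"closed_week": 1, "device_type": "server", "hours_to_resolve": 6},
    {"closed_week": 2, "device_type": "laptop", "hours_to_resolve": 5},
    {"closed_week": 2, "device_type": "server", "hours_to_resolve": 6},
    {"closed_week": 3, "device_type": "laptop", "hours_to_resolve": 9},
    {"closed_week": 3, "device_type": "server", "hours_to_resolve": 5},
    {"closed_week": 4, "device_type": "laptop", "hours_to_resolve": 11},
    {"closed_week": 4, "device_type": "server", "hours_to_resolve": 7},
]

trend = resolution_trend(alerts, "device_type")
for group, delta in sorted(trend.items()):
    print(f"{group}: median resolution time changed by {delta:+.1f} hours")
```

Passing a different group_key (for example a business-unit field, if your records carry one) gives the same comparison along another of the dimensions listed above.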

Read Post

Building an Early Warning System: Internal Data Collection and Baselining

Mike Rothman November 8, 2012

Now that we have provided the reasons you need to start thinking about an Early Warning System, and a high-level idea of the process involved, it’s time to dig into the different parts of the process. Third-party intelligence, which we’ll discuss in the next post, will tell you what kinds of attacks you are more likely to see, based on what else is happening in the world. But monitoring your own environment and looking for variation from normal activity tell you whether those attacks actually ARE hitting you.

Internal Data Collection

The process starts with collecting data from your internal sources for analysis. Most of you already have data aggregated in a log management environment because compliance has been mandating log management for years. More advanced organizations may have a Security Operations Center (SOC) leveraging a SIEM platform to do more security-oriented correlation and forensics to pinpoint and investigate attacks. Either way, you are likely collecting data which will provide the basis for the internal side of your EWS. Let’s take a quick look at the kinds of data you are likely already collecting and their relevance to the EWS:

Network Security Devices: Your firewalls and IPS devices generate huge logs of what’s blocked, what’s not, and which rules are effective. The EWS will be matching attack patterns and traffic to what is known about other attacks, so recognizing port/protocol/destination combinations, or application identifiers for next-generation firewalls, will be helpful.

Identity: Similarly, information about logins, authentication failures, and other identity-related data is useful for matching against attack profiles received from the third-party threat intelligence providers.

Application/Database Logs: Application-specific logs are generally less relevant, unless they come from standard applications or components likely to be specifically targeted by attackers.
Database transaction logs are generally more useful for identifying unusual database transactions – which might represent bulk data removal, injection attempts, or efforts to bring applications down. Database Activity Monitoring (DAM) logs are useful for determining the patterns of database requests, particularly when monitoring traffic within the database (or on the database server) consumes too many resources.

NetFlow: Another data type commonly used in SIEM environments is NetFlow – which provides information on protocols, sources, and destinations for network traffic as it traverses devices. NetFlow records are similar to firewall logs but far smaller, making them more useful for high-speed networks. Network flows can identify lateral movement by attackers, as well as large file transfers.

Vulnerability Scans: Scans offer an idea of which devices are vulnerable to specific attacks, which is critical for the EWS to help pinpoint which devices would be potential targets for which attacks. You don’t need to worry about Windows exploits against Linux servers, so this information enables you to focus monitoring, investigations, and workarounds on the devices more likely to be successfully attacked.

Configuration Data: The last major security data category is configuration data, which provides information on changes to monitored devices. This is also critical for an EWS, because one of the most important intelligence types identifies specific malware attacks by their signature compromise indications. Matching these indicators against your configuration database enables you to detect successful (and even better, in-progress) attacks on devices in your environment.

After figuring out which data you will collect, you need to decide where to put it. That means selecting a platform for your Early Warning System. You already have a SIEM/Log Management offering, so that’s one possibility.
You also likely have a vulnerability management platform, so that’s another choice. We are not religious about which technology gets the nod, but a few capabilities are essential for an EWS. Let’s not put the cart before the horse, though – we don’t yet have enough context on other aspects of the process to understand which platform(s) might make sense. So we will defer the platform decision until later in this series.

Baseline

Once the data is collected, you need to define normal before the data is useful to the EWS. As we mentioned earlier, ‘normal’ does not necessarily mean secure. If you are anything like almost every other enterprise, you likely have malware and compromised devices on your network already. Sorry to burst your bubble. You need to identify indications of something different. Something that could represent an attack, an outbreak, or an exfiltration attempt. It might be another false positive, or it could represent a new normal to accept, but either way the baseline will need to adapt and evolve. Let’s highlight a simple process for building a baseline:

Pick data source(s): Start by picking a single data source and collect some data. Then determine the ranges you see within the data set. As an example we will use firewall logs. From the log you typically have the type of traffic (ports, protocols, applications, etc.), the destination IP address, the time of day, and whether the packet was blocked. You can pick numerous data sources and do sophisticated data mining, but we will keep it simple for illustration purposes.

Monitor the patterns: Collect traffic for a while, typically a few days to a week, and then start analyzing it. Get your inner statistician on and start calculating means, medians, and frequencies for your data set. In our example you might determine that 15% of your inbound web traffic during lunchtime is SSL destined for your shipping cart application.

Define the initial thresholds: From the initial patterns you can set thresholds, outside which traffic indicates a potential problem. Maybe you set the initial thresholds 2 standard deviations above the mean for a traffic type. You look at the medians and means to figure out which initial threshold makes sense. You don’t need to be precise with the initial threshold, because you don’t yet have enough data or knowledge to know what represents an attack. It just gives you a place to start. Getting back to our firewall example, a spike in outbound SSL traffic to 30% might indicate an exfiltration. Or it could indicate a bunch
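The baseline steps described above (collect a week or so of observations, compute summary statistics, then set an initial threshold 2 standard deviations above the mean) can be sketched as follows. This is a minimal illustration rather than a prescribed implementation: the function names, the choice of a daily "SSL share of traffic" metric, and every number in the sample data are assumptions made up for the example.

```python
from statistics import mean, median, stdev

def build_baseline(samples, k=2.0):
    """Summarize numeric observations and derive an initial upper threshold
    set k sample standard deviations above the mean."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation; needs >= 2 values
    return {
        "mean": mu,
        "median": median(samples),
        "stdev": sigma,
        "threshold": mu + k * sigma,
    }

def exceeds_baseline(value, baseline):
    """True if a new observation falls above the initial threshold."""
    return value > baseline["threshold"]

# Hypothetical week of readings: SSL share (%) of traffic, one per day.
ssl_share = [14.0, 15.5, 16.0, 15.0, 14.5, 15.0, 15.0]
baseline = build_baseline(ssl_share)

print(f"mean={baseline['mean']:.1f}% median={baseline['median']:.1f}% "
      f"threshold={baseline['threshold']:.2f}%")
print("30% flagged:", exceeds_baseline(30.0, baseline))
print("16% flagged:", exceeds_baseline(16.0, baseline))
```

Because the baseline needs to adapt and evolve, one simple option is to rerun build_baseline periodically over a sliding window of recent observations instead of treating the first threshold as permanent.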

Read Post
