Implementing and Managing Patch and Configuration Management: Configuration Management Operations
The key high-level difference between configuration and patch management is that configuration management offers more opportunity for automation than patch management. Unless you are changing standard builds and/or reevaluating benchmarks – then operations are more of a high-profile monitoring function. You will be alerted to a configuration change, and like any other potential incident you need to investigate and determine the proper remediation as part of a structured response process. Continuous Monitoring The first operational decision comes down to frequency of assessment. In a perfect world you would like to continuously assess your devices, to shorten the window between attack-related configuration change and detection of the change. Of course there is a point of diminishing returns, in terms of device resources and network bandwidth devoted to continuous assessment. Don’t forget to take other resource constraints into account, either. Real-time assessment doesn’t help if it takes an analyst a couple days to validate each alert and kick off the investigation process. Another point to consider is the increasing overlap between real-time configuration assessment and the host intrusion prevention system (HIPS) capabilities built into endpoint protection suites. The HIPS is typically configured to catch configuration changes and usually brings along a more response-oriented process. That’s why we put configuration management in a periodic controls bucket in the Endpoint Security Management Buyer’s Guide. That said there is a clear role for configuration management technology in dealing with attacks and threats. It’s a question of which technology – active HIPS, passive configuration management, or both – will work best in your environment. Managing Alerts Given that many alerts from your configuration management system may indicate attacks, a key component of your operational process is handling these alerts and investigating each potential incident. We have done a lot of work on documenting incident response fundamentals and more sophisticated network forensics, so check that research out for more detail. For this series, a typical alert management process looks like: Route alert: The interface of your endpoint security management platform acts as the initial view into the potential issue. Part of the policy definition and implementation process is to set alerts based on conditions that you would want to investigate. Once the alert fires someone then needs to process it. Depending on the size of your organization that might be a help desk technician, someone on the endpoint operations team, or a security team member. Initial investigation: The main responsibility of the tier 1 responder is to validate the issue. Was it a false positive, perhaps because the change was authorized? If not, was it an innocent mistake that can be remedied with a quick fix or workaround? If not, and this is a real attack, then some kind of escalation is in order, based on your established incident handling process. Escalation: At this point the next person in the chain will want as much information as possible about the situation. The configuration management system should be able to provide information on the device, the change(s) made, the user’s history, and anything else that relates to the device. The more detail you can provide, the easier it will be to reconstruct what actually happened. If the responder works for the security team, he or she can also dig into other data sources if needed, such as SIEM and firewall logs. At this point a broader initiative with specialized tools kicks in, and it is more than just a configuration management issue. Close: Once the item is closed, you will likely want to generate a number of reports documenting what happened and the eventual resolution – at least to satisfy compliance requirements. But that shouldn’t be the end of your closing step. We recommend a more detailed post-mortem meeting to thoroughly understand what happened, what needs to change to avoid similar situations in the future, and to see how processes stood up under fire. Also critically assess the situation in terms of configuration management policies and make any necessary policy changes, as we will discuss later in this post. Troubleshooting In terms of troubleshooting, as with patch management, the biggest risk for configuration change is that might not be made correctly. The troubleshooting process is similar to the one laid out in Patch Management Operations, so we won’t go through the whole thing. The key is that you need to identify what failed, which typically involves either a server or agent failure. Don’t forget about connectivity issues, which can impact your ability to make configuration changes as well. Once the issue is addressed and the proper configuration changes made, you will want to confirm them. Keep in mind the need for aggressive discovery of new devices, as the longer a misconfigured device exists on your network, the more likely it is to be exploited. As we discussed in the Endpoint Security Management Buyer’s Guide, whether it’s via periodic active scanning, passive scanning, integration with the CMDB (or another asset repository) or another method, you can’t manage what you don’t know exists. So keep focus on a timely and accurate ongoing discovery process. Optimizing the Environment When you aren’t dealing with an alert or a failure, you will periodically revisit policies and system operations with an eye to optimizing them. That requires some introspection, to critically assess what’s working and what isn’t. How long is it taking to identify configuration changes, and how is resolution time trending? If things move in the wrong direction try to isolate the circumstances of the failure. Are the problems related to one of these? Devices or software Network connectivity or lack thereof Business units or specific employees When reviewing policies trends are your friend. When the system is working fine you can focus on trying to improve operations. Can you move, add, or change components to cut the time required for discovery and assessment? Look for incremental improvements and be sure to plan changes carefully. If you change too much at one time it will be difficult to figure out what worked and