Implementing and Managing Patch and Configuration Management: Defining Policies
So far we have focused on all the preparatory work and technology deployment that needs to happen before you can finally flip the switch and start using an endpoint security management tool in production. With the pieces in place it is now time to configure and deploy policies to prepare for the inevitable patch cycles, and to start monitoring configurations on your key devices. The first major choice is between the Quick Wins and Full Deployment processes – Quick Wins is focused on information gathering and refining priorities & policies – proving the tool’s value and making sure your results from initial testing weren’t misleading. Full Deployment is all about full coverage for all endpoint devices and users. We generally recommend you start with Quick Wins, which produces much more information and treads a bit more lightly, before jumping into Full Deployment. Who knows – you might even realign your priorities. But even after a few Quick Wins, a structured and (somewhat) patient path to Full Deployment makes the most sense.
Before we get deep into staging your deployment, keep in mind that we break things out with extreme granularity, to fit the full range of organizations. Many of you won’t need this much depth, due to organizational size or the nature of your policies and priorities. Don’t get hung up on our multi-step process – many of you won’t need to move this cautiously, and can run through multiple steps quickly.
The key to success is to think incrementally – too often we hear about organizations which can pump out a bunch of agents quickly, so they think they should. Endpoints can be finicky devices, and you should be sure to provide adequate time for testing and burn-in before you go all-in on deployment. So it’s prudent to pick a single device type or group of users, create the appropriate policy, slowly roll out, and tune iteratively until you attain full coverage. We are not opposed to deploying quickly, but we have a keen appreciation for the challenges of fast deployment – especially in managing expectations. Better to under-promise and over-deliver than vice-versa, right?
So here is a reasonable deployment plan:
- Define the policy: This involves setting policies based on the type of device and what you are doing on it – patch or configuration management. We will dig into the specific policy decisions you need to make later in this post. Again, we suggest you start with a single device type – possibly even for a specific group of users – and expand incrementally once the first deployment is complete. This helps reduce management overhead and enables you to tune the policy. In most cases your vendor will provide prebuilt policies and categories to jumpstart your own policy development. It’s entirely appropriate to start with one of those and evaluate its results.
- Deploy to a subset: The next step is to deploy the policy to a limited subset (either device types, groups of users, or both) of your overall coverage goal. This limits the number of deployment failures, and gives you time to adjust and tune the policy. The key is to start small so you don’t get overloaded during the tuning process. It is much easier to grow a small deployment than to deal with overwhelming fallout from a poorly tuned policy.
- Analyze and tune: During analysis and tuning you observe the results and adjust the policy iteratively. Too many deployment or remediation failures, or too many false positives, are the signal that the policy needs another round of adjustment before you go any further.
- Expand scope: Once the policy is tuned you can start thinking about expanding the deployment scope and size. You can add additional devices and groups of users, expand the number of applications being patched, etc. Full deployments should rarely happen as a big bang, so grow it slowly and surely to ensure you don’t risk the perception of deployment success by going too far too fast. Smaller organizations can often move quickly to full deployment, but we strongly suggest starting small – even if it’s only for a day.
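The four steps above amount to a simple gate: expand only when the current stage is healthy, otherwise go back and tune. Here is a minimal sketch of that gate in Python. The `Stage` class, the `next_action` function, and the 5% failure threshold are all our own illustrative assumptions, not something your tool will provide under these names.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One step of an incremental rollout (hypothetical structure)."""
    name: str
    devices: int   # devices targeted in this stage
    failures: int  # deployment/remediation failures observed


def next_action(stage: Stage, max_failure_rate: float = 0.05) -> str:
    """Return 'expand' when the stage looks healthy, otherwise 'tune'.

    The 5% default is an assumed threshold; pick one that fits your
    own tolerance for fallout.
    """
    if stage.devices == 0:
        return "tune"  # nothing deployed yet, so nothing to judge
    rate = stage.failures / stage.devices
    return "expand" if rate <= max_failure_rate else "tune"


pilot = Stage("finance laptops", devices=40, failures=6)
print(pilot.name, "->", next_action(pilot))

tuned = Stage("finance laptops", devices=40, failures=1)
print(tuned.name, "->", next_action(tuned))
```

The point is not the arithmetic but the discipline: the decision to widen scope is tied to an explicit, agreed-upon number rather than to how quickly agents can be pushed out.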
When setting up the policies it makes sense to revisit the processes for both patch and configuration management – as they govern what the tool does, what you and your staff do, and what outcomes you can expect. So let’s touch on each process and the associated policy decisions you need to make.
Patch Management Policies
In a perfect world, the patch management engine would just run and you could get back to World of Warcraft. Alas, the world isn’t perfect and patch management isn’t nearly as automated as we would all prefer. You can automate some aspects of the process (including monitoring for new patches), but ultimately you need to define which patches get applied in what order and build the installation packages. The good news is that once this is done the tools generally do a good job of automating installation, confirmation, and tracking. But there is still significant work to do up front.
Put another way, patch management policies are unique for every patch cycle. Of course you can define consistent aspects of the process (such as maintenance windows and user notifications) for every cycle, but every cycle you need to decide what gets patched and what doesn’t.
1. Discovery and Target Definition
Depending on whether you are rolling out a Quick Wins limited deployment, extending an existing deployment, or going all-in with a big bang full deployment, the first step is to load up the system with the devices to be managed. Besides loading up the assets you need to decide what to do when a new device is found to be out of compliance with policy. Do you force a patch deployment right away? You also need to define the frequency of revisiting the asset list (daily, weekly, monthly, etc.), because new devices need some endpoint security management love as well.
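To make the discovery decisions concrete, here is a rough Python sketch of reconciling the managed asset list against a fresh discovery scan. The `DiscoveryPolicy` fields, the action names, and the device identifiers are hypothetical; real products expose these choices through their own consoles and APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryPolicy:
    """Assumed policy knobs for the discovery step."""
    rescan_days: int = 7                       # how often to revisit the asset list
    on_new_noncompliant: str = "next_window"   # or "patch_now"


def reconcile(managed: set, discovered: set) -> dict:
    """Compare the managed inventory with the latest discovery scan."""
    return {
        "new": sorted(discovered - managed),    # found on the network, not yet managed
        "stale": sorted(managed - discovered),  # managed, but not seen in this scan
    }


policy = DiscoveryPolicy(rescan_days=7, on_new_noncompliant="patch_now")
managed = {"lt-001", "lt-002", "ws-010"}
discovered = {"lt-001", "ws-010", "lt-099"}

result = reconcile(managed, discovered)
for device in result["new"]:
    print(device, "->", policy.on_new_noncompliant)
print("not seen this scan:", result["stale"])
```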
2. Obtain Patches
The next step in patch management is actually finding the patches applicable to your environment. Here you define your information sources (patch management vendors, operating system and application developers, etc.); then you build a process to evaluate what has been patched, and more importantly criteria for what gets patched and when. You will need to make sure the operations team has bought into these criteria because they will need to live with them, and that you have a process for out-of-cycle patches – typically for high-risk 0-day vulnerabilities. You will also test the patches to make sure they don’t cause more harm than good. Again, there isn’t much automation to implement in the patch management tool – it’s more a process to work through each patch cycle.
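Writing the criteria down in an unambiguous form makes it much easier to get operations to sign off on them. The Python sketch below shows one way the criteria might look. The severity scale, the `exploited_in_wild` flag, and the outcome labels are assumptions for illustration; your own criteria will differ.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    name: str
    severity: str            # "critical", "high", "medium", "low" (assumed scale)
    exploited_in_wild: bool  # e.g. a 0-day under active attack
    tested: bool             # has it passed your own testing?


def schedule(patch: Patch) -> str:
    """Decide when a patch gets applied under the assumed criteria."""
    if not patch.tested:
        return "hold for testing"  # never ship a patch you have not tested
    urgent = patch.severity in {"critical", "high"}
    if urgent and patch.exploited_in_wild:
        return "out-of-cycle"      # do not wait for the next window
    if urgent:
        return "next cycle"
    return "defer"


for p in [
    Patch("browser-fix", "critical", exploited_in_wild=True, tested=True),
    Patch("os-rollup", "high", exploited_in_wild=False, tested=True),
    Patch("pdf-reader", "low", exploited_in_wild=False, tested=True),
    Patch("driver-update", "critical", exploited_in_wild=True, tested=False),
]:
    print(p.name, "->", schedule(p))
```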
3. Prepare to Deploy
Next you prepare to deploy the patch, which includes downloading the patch, determining the order of patch installation, building the distribution package, and getting the software to the relays in preparation to hit the switch. In this step you determine such things as:
- The devices that need to be patched and in what priority
- The criticality of the patch, and its timeframe for deployment
- Whether to force a reboot after deployment
- Alerting levels (if a patch fails to deploy, etc.)
- Notification levels (do you tell the user the patch is being installed?)
4. Deploy the Patch
At this point it’s mostly a question of pushing the button and waiting for the magic to happen. The policies in play here are more about whether to roll back in case the patch fails or breaks something, and handling the exceptions when a patch fails to install. We will talk about reporting later in this post, but that comes into play here as well.
5. Confirm Deployment
Finally, you need to confirm the patch was completely installed and is now operational. This involves another scan of the device in question, with some reporting to substantiate that the patch was installed within the agreed-upon window of time.
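The deployment decisions listed earlier (priority, timeframe, reboot, alerting, notification) and the confirmation check fit naturally into one small record per patch. Here is a minimal Python sketch. `PatchPolicy`, its field names, and the `confirm` function are hypothetical and do not correspond to any particular product’s settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class PatchPolicy:
    """Assumed per-patch deployment decisions."""
    priority: int             # 1 = patch these devices first
    deploy_within: timedelta  # agreed-upon window for installation
    force_reboot: bool
    alert_on_failure: bool
    notify_user: bool


def confirm(policy: PatchPolicy, released: datetime,
            verified: Optional[datetime]) -> str:
    """Classify a device after the follow-up scan.

    `verified` is when a rescan saw the patch installed, or None if
    the rescan did not find it.
    """
    if verified is None:
        return "not installed"
    elapsed = verified - released
    return "on time" if elapsed <= policy.deploy_within else "late"


policy = PatchPolicy(priority=1, deploy_within=timedelta(days=3),
                     force_reboot=True, alert_on_failure=True,
                     notify_user=False)
released = datetime(2024, 3, 12, 9, 0)

print(confirm(policy, released, datetime(2024, 3, 14, 9, 0)))
print(confirm(policy, released, datetime(2024, 3, 16, 9, 0)))
print(confirm(policy, released, None))
```

Devices that come back as "late" or "not installed" are the exceptions you will handle by hand, and they are also the raw material for the reports that substantiate the agreed-upon window.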
Configuration Management Policies
Similar to patch policies, configuration management policies are inextricably linked to the process you implemented for configuration management. The good news is that much more of configuration management can be automated – because the general standard configurations for each device type shouldn’t vary much from day to day, or even month to month. If proper care is taken in setting up the baselines, you shouldn’t need to babysit the policies often.
1. Establish configuration baselines and/or benchmarks
As we described in Preparation, the first configuration management decisions involve locking in the configuration baselines you will use for each device type. There are a number of resources for kickstarting your efforts, including the CIS benchmarks and NIST guidance (PDF), and each vendor has their own ideas on how to configure each device type. Regardless of where you get your baselines, you need consensus – a large part of your operational job will be to manage the inevitable exceptions when someone needs to have their device configured differently (because they are special). Always keep the tradeoffs in mind. The more inclusive the baseline, the fewer exceptions you will have to deal with, but the more chances you have to weaken your security posture due to greater variability.
And remember that you can try a few different policy constructs when developing baselines. You can set a gold standard for your entire organization, then deal with the folks who need something different. Or you could set different baselines for different constituencies. You know, where executives can do whatever they want at one end of the spectrum while call center desktops get draconian VDI system images. Finally, you could set the baseline based on the device type being managed (PC vs. Mac vs. mobile, etc.). Most likely you will end up mixing and matching between these factors, because one size rarely fits all.
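One way to picture the mix-and-match approach is as layers: an organization-wide gold standard, then overrides per constituency, then overrides per device type. The Python sketch below assumes that layering order. The setting names and values are invented for illustration and are not drawn from any benchmark.

```python
def effective_baseline(org: dict, group: dict, device: dict) -> dict:
    """Layer overrides on top of the gold standard; later layers win."""
    merged = dict(org)
    merged.update(group)
    merged.update(device)
    return merged


def deviations(org: dict, effective: dict) -> list:
    """Settings that differ from, or are absent in, the gold standard."""
    return sorted(k for k, v in effective.items() if org.get(k) != v)


# Hypothetical settings, for illustration only.
gold_standard = {
    "screen_lock_minutes": 10,
    "usb_storage": "blocked",
    "local_admin": False,
}
executives = {"usb_storage": "allowed"}      # constituency override
mac = {"disk_encryption": "FileVault"}       # device-type override

baseline = effective_baseline(gold_standard, executives, mac)
print(baseline)
print("deviates from gold standard:", deviations(gold_standard, baseline))
```

Tracking the deviations explicitly is what keeps the exceptions visible; every entry in that list is variability you agreed to accept.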
2. Discover and define targets
As with the patch management decisions, you need to load the system up with devices to be managed. Besides loading in assets, you need to decide what to do when a device is found out of compliance with your policy. Do you force remediation right away? You also need to define the frequency of revisiting the asset list (daily, weekly, monthly, etc.), to ensure all new devices have their configurations monitored.
3. Assess, alert, and report changes
Once you know what will be managed, you move on to defining the finer points of configuration assessment, such as:
- How often will you evaluate the configuration? Will that vary between devices always on the network and remote devices that connect intermittently?
- Do you need an agent on that specific device type, or can you perform remote assessment? Does that vary for remote devices, or networks constrained by low bandwidth?
- How critical are specific configuration changes? For example, does the appearance of a web server on a device require instant kickoff of an incident response?
- Who needs to be alerted when a device is out of policy compliance? Does that vary by device type or user group?
- Do certain types of configuration changes get pumped directly into the trouble ticketing system for operational remediation?
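The answers to the criticality, alerting, and ticketing questions above boil down to a routing table. Here is a minimal Python sketch of one. The `Change` record, the service names, and the three destinations are assumptions chosen to mirror the questions, not a description of how any specific tool classifies changes.

```python
from dataclasses import dataclass


@dataclass
class Change:
    device: str
    kind: str    # "new_service" or "setting_changed" (assumed categories)
    detail: str  # service name or setting name


# Services whose unexpected appearance is treated as critical (assumed list).
CRITICAL_SERVICES = {"httpd", "nginx", "iis"}


def route(change: Change) -> str:
    """Decide where a detected configuration change goes."""
    if change.kind == "new_service" and change.detail in CRITICAL_SERVICES:
        return "incident_response"  # e.g. a web server appearing on an endpoint
    if change.kind == "setting_changed":
        return "ticket"             # straight into the trouble ticketing system
    return "report_only"            # recorded, but nobody is paged


for c in [
    Change("lt-014", "new_service", "nginx"),
    Change("lt-022", "setting_changed", "screen_lock_minutes"),
    Change("ws-003", "new_service", "print-spooler"),
]:
    print(c.device, c.detail, "->", route(c))
```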
4. Remediate and Confirm
Next you will actually fix the configuration issues found during assessment. There aren’t many policy decisions here, aside from how quickly to confirm each change once it is marked completed in the trouble ticketing system (assuming operations handles the changes). We mentioned the need for integration with the trouble ticket system – you need to close the loop with the ops team, and to ensure changes are both authorized and done correctly.
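Closing the loop can be as simple as comparing what the ticketing system says was fixed against what the next assessment actually sees. The Python sketch below assumes you can export closed tickets and rescan results into plain dictionaries. The ticket IDs, field layout, and `tickets_to_reopen` function are hypothetical.

```python
def tickets_to_reopen(closed_tickets: dict, rescan: dict) -> list:
    """Return IDs of tickets marked complete whose fix did not stick.

    closed_tickets: ticket id -> (device, setting, expected value)
    rescan:         (device, setting) -> value observed in the follow-up scan
    """
    reopen = []
    for ticket_id, (device, setting, expected) in closed_tickets.items():
        # A missing rescan result counts as unconfirmed, so it is reopened too.
        if rescan.get((device, setting)) != expected:
            reopen.append(ticket_id)
    return sorted(reopen)


closed = {
    "T-101": ("lt-014", "usb_storage", "blocked"),
    "T-102": ("lt-022", "screen_lock_minutes", 10),
    "T-103": ("ws-003", "local_admin", False),
}
observed = {
    ("lt-014", "usb_storage"): "blocked",
    ("lt-022", "screen_lock_minutes"): 30,  # ticket closed, setting still wrong
    # ws-003 has not been rescanned yet
}

print(tickets_to_reopen(closed, observed))
```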
Reporting
The driver behind endpoint security management is often compliance. That means you need to produce artifacts documenting what the patch and configuration management systems have done for in-scope devices. You are likely to kill a few trees on reports showing progress, demonstrating value, and communicating with other stakeholders. Here are a few starter ideas for reports:
- Compliance reports are a no-brainer and included in every endpoint security management product. For example, showing you scanned all endpoints or servers every month, installed missing patches, and ensured all devices were in compliance with the baseline standards, will make every security assessment faster.
- User-based reports can highlight which users are model citizens, and which aren’t. You know, those users who don’t patch their devices, ever (until finally you need to reimage) or have ongoing configuration problems from consistent malware infections. You can’t make these users follow policy but you can call them out to management when they don’t. This kind of report can also be useful for identifying likely candidates to revisit security awareness education.
- Application and vendor reports enable you to get a feel for which vendors issue more patches (and cost you more operations budget), which could be interesting data to have when it’s time to pay for maintenance on some of these applications and operating systems.
- Trend reports are extremely valuable for showing the value of the tool and how the teams are performing operationally. Show patch coverage, devices in compliance with the configuration baselines, mean time to patch (especially for out-of-cycle/critical patches), time to address critical configuration problems, etc. Most organizations which generate these reports see the effort spent managing endpoints drop substantially over time, and become more responsive to threats. Never underestimate the political value of a good report showing these trends with colorful graphs.
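Two of the trend metrics mentioned above, patch coverage and mean time to patch, are easy to compute yourself if the canned reports fall short. Here is a small Python sketch, assuming you can export one (release date, install date) pair per device. The record layout and sample dates are invented for illustration.

```python
from datetime import date
from statistics import mean


def patch_coverage(records: list) -> float:
    """Fraction of devices where the patch has been installed."""
    installed = sum(1 for _, done in records if done is not None)
    return installed / len(records) if records else 0.0


def mean_time_to_patch(records: list):
    """Average days from release to install; unpatched devices are skipped."""
    days = [(done - released).days
            for released, done in records if done is not None]
    return mean(days) if days else None


# One (released, installed) pair per device; None means not yet patched.
records = [
    (date(2024, 1, 9), date(2024, 1, 12)),
    (date(2024, 1, 9), date(2024, 1, 19)),
    (date(2024, 1, 9), None),
]

print(f"coverage: {patch_coverage(records):.0%}")
print("mean time to patch (days):", mean_time_to_patch(records))
```

Run the same calculation each cycle and plot the results, and you have the colorful graph.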
Your typical endpoint security management platform will ship with hundreds of canned reports, many with little or no value. So ensure you can customize the reports quickly and easily to show the information you need the way you need to show it.
With policies defined, you are ready to flip the switch and go operational. Again, we recommend a gradual iterative process to work toward full deployment coverage for both patch and configuration management, but you will get there, and then your operational process needs to kick in. We will discuss that topic in the next couple of posts.