Implementing and Managing Patch and Configuration Management: Patch Management Operations
By Mike Rothman
Now that we have gone through all the preparation, deployed the technology, and set up policies, we need to operate our patch management environment. That will be our focus in this post. As we discussed in the Policy Definition post, there isn’t much leverage to be gained from month to month in patch management. You need to do the work of monitoring for new patches, assessing each new patch for deployment, testing the patches prior to deployment, bundling installation packages, and then installing the patches on affected devices. You will be performing each of those activities each month whether you like them or not. We have already delved into those monthly activities within the context of defining policies, so let’s take things a step deeper.
The biggest issue with Patch Management Operations is that a patch may not install properly, for whatever reason. So the first operational task is to ensure the integrity of the process – that the patch was installed and operates properly. As we described in Patch Management Quant in great detail, once the patch is confirmed the tool also needs to clean up any patch residue (temp files, etc.).
In the event the patch doesn’t deploy properly, you go to a cleanup step – which involves identifying the failed deployment, determining the reason for the failure, adjusting the deployment parameters, and eventually reinstalling. For instance, here are three typical reasons for patch failure, each of which can be isolated fairly easily:
- Relay fail: If you have deployed a hierarchical environment to better utilize bandwidth, your relay points (distribution servers) may not be operating properly. It could be a server failure or a network issue. If an entire site or location doesn’t successfully patch, that’s a strong indication of a distribution problem. It’s not brain surgery to diagnose many of these issues.
- Agent fail: Another likely culprit is failure of an endpoint agent to do what it’s supposed to. If installation failures appear more random this might be the culprit. You will need to analyze the devices to make sure there are no conflicts and that the user didn’t turn off or uninstall the agent.
- Policy fail: As unlikely as it is, you (or your ops folks) might have configured the policies incorrectly. This is reasonably common – you need to set up policies each patch cycle, and nobody is perfect.
There are many other reasons a patch might not deploy properly. The point is to address one-off situations as necessary, but also to make sure there isn’t a systemic problem with your process. You will use this kind of troubleshooting analysis and data to move on to the next step of operating your patch environment: to optimize things.
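As a rough illustration, the triage heuristics above (a whole site failing points to a relay, scattered failures point to agents, and anything left over warrants a look at policies) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the `DeployResult` record, its field names, and the 90% site-failure threshold are all hypothetical and do not come from any particular patch management product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DeployResult:
    # Hypothetical record; real tools export their own deployment report formats.
    device: str
    site: str
    agent_responding: bool
    succeeded: bool


def triage(results, site_failure_threshold=0.9):
    """Return {device: suspected_cause} for every failed deployment."""
    by_site = defaultdict(list)
    for r in results:
        by_site[r.site].append(r)

    suspects = {}
    for site_results in by_site.values():
        failed = [r for r in site_results if not r.succeeded]
        if not failed:
            continue
        failure_rate = len(failed) / len(site_results)
        for r in failed:
            if failure_rate >= site_failure_threshold:
                # (Nearly) the whole site failed: suspect the distribution point.
                suspects[r.device] = "relay"
            elif not r.agent_responding:
                # Scattered failure on a silent endpoint: suspect the agent.
                suspects[r.device] = "agent"
            else:
                # Agent is alive and the site is healthy: review policy settings.
                suspects[r.device] = "policy-or-other"
    return suspects


if __name__ == "__main__":
    sample = [
        DeployResult("nyc-01", "nyc", True, False),
        DeployResult("nyc-02", "nyc", True, False),
        DeployResult("lon-01", "lon", False, False),
        DeployResult("lon-02", "lon", True, True),
        DeployResult("lon-03", "lon", True, True),
    ]
    for device, cause in triage(sample).items():
        print(device, cause)
```

The output is only a starting point for investigation. The threshold would need tuning for small sites, where one or two failures can look like a site-wide problem.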
Optimizing the Environment
Just like any other optimization process, this one starts with a critical review of the current operation. What works? What doesn’t? How long does it take you to patch 75% of your managed devices? 90%? 100%? Is that increasing over time, or decreasing? What types of patches are failing (operating systems, apps, servers, endpoints, or something else)? How does device location (remote vs. on-network) affect success rates? Are certain business units more successful than others? During the review, consider adding new policies and groups. But be careful: because patch management requires a largely manual effort each month, defining very rigid policies in pursuit of better automation quickly reaches a point of diminishing returns.
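The coverage questions above (how long to reach 75%, 90%, and 100% of managed devices) are simple to compute if your tool can export a per-device install timestamp. Here is a minimal sketch, assuming a hypothetical export with one timestamp per device and `None` for devices that are still unpatched; the function name and the sample dates are made up for illustration.

```python
import math
from datetime import datetime, timedelta


def time_to_coverage(release, install_times, total_devices,
                     targets=(0.75, 0.90, 1.00)):
    """Map each target fraction to the elapsed time until that share of
    devices was patched, or None if the target has not been reached yet."""
    done = sorted(t for t in install_times if t is not None)
    elapsed = {}
    for target in targets:
        # Number of patched devices required to hit this target.
        # round() guards against float noise before taking the ceiling.
        needed = math.ceil(round(target * total_devices, 9))
        if 0 < needed <= len(done):
            elapsed[target] = done[needed - 1] - release
        else:
            elapsed[target] = None
    return elapsed


if __name__ == "__main__":
    release = datetime(2024, 1, 9)  # hypothetical patch release date
    # Ten managed devices: nine patched on days 1 through 9, one still pending.
    installs = [release + timedelta(days=d) for d in range(1, 10)] + [None]
    for target, delta in time_to_coverage(release, installs, 10).items():
        print(f"{target:.0%}: {delta}")
```

Tracking these numbers for each monthly cycle shows whether time-to-patch is trending up or down. Grouping the same calculation by patch type, location, or business unit would help answer the other review questions.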
If you find the environment reasonably stable, periodic reviews become more about tuning policies than overhauling them. This involves revisiting your deployment and figuring out whether you have the right hierarchy to effectively distribute patches. Do you need more distribution points? Fewer? Are you optimizing bandwidth? Do you need to install agents to achieve more granular management? Or perhaps remove agents, if you can patch without persistent agents on the devices.
You look for incremental improvement here, so changes should be highly planned-out and structured. This enables you to isolate the effect of each change and reevaluate each aspect iteratively. If you change too much at one time it will be difficult to figure out what worked and what didn’t.
Also pay attention to maintenance of your environment. The servers and distribution points need to be backed up and kept current, and the agents updated as needed. Obviously you need to test infrastructure software updates – just like any other patch or update – prior to deployment, but the patching system itself could be an attacker’s target, so you need to keep it up to date as well. We tend to be wary of automatic updating for most enterprise security tools – there are too many examples of bad updates wreaking havoc for us to feel comfortable with it. Any time gained by faster implementation can easily be lost if you take down your environment while you try to back out a busted patch.
Finally, you defined a bunch of reports earlier in the process, to run on an ongoing basis. Obviously you need these artifacts for compliance purposes, but pay attention yourself to the operational data they generate. Feed that information back into the process to continually improve your patch management.