There are a lot of grumpy McAfee customers out there today. Yesterday, little Red issued a faulty DAT file update that mistakenly thought svchost.exe was a bad file and blew it away. This, of course, results in all sorts of badness on Windows XP SP3, causing an endless reboot loop and rendering those machines inoperable.

Guess they forgot the primary imperative, do no harm…

To McAfee’s credit, they did own the issue and made numerous apologies. Personally, I think the apology should have come from DeWalt, the CEO, on the blog. But they aren’t making excuses and are working diligently to fix the problem. But that is little consolation for those folks spending the next few days cleaning up machines and implementing the fix.

Yet, there is lots of coverage out there that will explain the issue, how it happened, and how to fix it from LifeHacker or McAfee. You’ll also get some perspective on how this provided an opportunity to test those incident response chops. What I want to talk about is understanding the risk profile of anti-malware updates, and whether & how your internal processes should change in the face of this problem.

First off, no one is immune to this type of catastrophic failure. It happened to be McAfee this time, but anti-malware products work at the lowest layers of the operating system, and a faulty update can really screw things up. Yes, the AV vendors have mature QA processes, which is why you don’t see this stuff happening much at all. But it can, and likely will again at some point.

Yes, you could decide to ditch McAfee, although I’d imagine they’ll be retooling their QA processes to ensure this type of problem doesn’t recur. But that’s a short-term emotional reaction. The real question revolves around how to deal with anti-malware updates. It’s always been about balancing the speed of detection with the risk of unintended consequences (breaking something). So you basically have three choices for how to deal with anti-malware updates:

  1. Automatic updates – This represents the common status quo. The AV vendor issues a release, you get it and install it with no testing or any other mechanisms on your end. To be clear, a vast majority of end users are in this bucket.
  2. Test first – You can take the update and run it through a battery of tests to see if there is a problem before you deploy. This option is pretty resource intensive, because you tend to get multiple updates per day from the vendor; it also extends the window of vulnerability by the length of your testing and acceptance pipeline.
  3. Wait and listen – The last approach is basically to wait a day or two day before installing updates. You peruse the message boards and other sources to see if there are any known issues. If not, you install. This also extends the window of exposure, but would have avoided the McAfee issue.

There is no right answer. Most organizations opt for the quickest protection possible, which means automatic updates to minimize the window of vulnerability. But it gets back to your organization’s threshold for risk. I don’t think the “test first” option is really viable for an organization. There are too many updates. I do think “wait and listen” can make sense for the vast majority of companies out there.

But how does wait and listen work against a zero-day attack? In this case it still works okay, because you can always do a manual test or take the risk of sending out an update before the waiting period is over. And in reality, the signature updates for a 0-day usually take 8-18 hours anyway. But there is a risk you might get nailed in the time between an update arrives and when you deploy it. In that case, hopefully you’ve managed expectations with the senior team regarding this scenario.

I’d be remiss if I didn’t at least mention the need for layers beyond anti-malware. Especially when deciding whether to install an AV update. There are alternative mitigations (at the perimeter or on the network, for example) for most 0-day attacks, which could lessen the impact and spread of an attack. Those can often be made immediately, and are easier to reverse than an install that touches every desktop.

So it’s unfortunate for McAfee and they’ll be cleaning up the mess (in market perception and customer frustration) for a while. And as I told the AP yesterday, fortunately this kind of issue is very rare. But when these things do happen, it’s a train wreck.