As we discussed in the last couple of posts, any VM platform must be able to scan both the infrastructure and the application layer. But that’s still mostly tactical stuff. Run the scan, get a report, fix stuff (or not), and move on. When we talk about a strategic and evolved vulnerability management platform, the core technology needs to evolve to serve more than merely tactical goals – it must provide a foundation for a number of additional capabilities. Before we jump into the details, let’s reiterate the key requirements. You need to be able to scan/assess:
- Critical Assets: This includes the key elements in your critical data path; it requires both scanning and configuration assessment/policy checking for applications, databases, server and network devices, etc.
- Scale: Scalability requirements are largely in the eye of the beholder. You want to be sure the platform’s deployment architecture will provide timely results without consuming all your network bandwidth.
- Accuracy: You don’t have time to mess around, so you don’t want a report with 1,000 vulnerabilities, 400 of them false positives. There is no way to totally avoid false positives (aside from not scanning at all), so accuracy is a key selection criterion.
Yes, that was pretty obvious. With a mature technology like vulnerability management the question is less about what you need to do and more about how – especially when positioning for evolution and advanced capabilities. So let’s first dig into the foundation of any strategic platform: the data model.
Integrated Data Model
What’s the difference between a tactical scanner and an integrated vulnerability/threat management platform? Data sharing, of course. The platform needs the ability to consume and store more than just scan results. You also need configuration data, third party and internal research on vulnerabilities, research on attack paths, and a bunch of other data types we will discuss in the next post on advanced technology. Flexibility and extensibility are key for the data schema. Don’t get stuck with a rigid schema that won’t allow you to add whatever data you need to most effectively prioritize your efforts – whatever data that turns out to be.
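To make the flexibility point concrete, here is a minimal sketch of what an extensible asset/vulnerability record might look like; the field names and the catch-all "extra" bucket are purely illustrative, not any vendor's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class AssetRecord:
    """One asset in a hypothetical platform data store.

    Core fields cover scan results and configuration state; the 'extra'
    dict leaves room for data types added later (third-party research,
    attack path data, etc.) without a schema migration.
    """
    asset_id: str
    hostname: str
    scan_findings: list[dict[str, Any]] = field(default_factory=list)  # raw scanner output
    config_state: dict[str, Any] = field(default_factory=dict)         # config/policy check results
    tags: set[str] = field(default_factory=set)                        # free-form grouping labels
    extra: dict[str, Any] = field(default_factory=dict)                # anything else you need later

# Example: bolting on a new data source without touching the schema
web01 = AssetRecord(asset_id="a-001", hostname="web01.example.com")
web01.extra["threat_intel"] = {"observed_exploits": ["CVE-2012-0002"]}
```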
Once the data is in the foundation, the next requirement involves analytics. You need to set alerts and thresholds on the data and be able to correlate disparate information sources to glean perspective and help with decision support. We are focused on more effectively prioritizing security team efforts, so your platform needs analytical capabilities to help turn all that data into useful information.
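Building on the hypothetical record above, here is a minimal sketch of the kind of threshold alerting we mean: correlate findings per asset and flag anything that crosses a cutoff. The cutoff value and the CVSS field are illustrative assumptions.

```python
SEVERITY_THRESHOLD = 20.0  # illustrative cutoff; tune to your environment

def assets_over_threshold(assets):
    """Sum finding severity (e.g., CVSS base scores) per asset and alert
    when the aggregate crosses the threshold."""
    alerts = []
    for asset in assets:
        total = sum(f.get("cvss", 0.0) for f in asset.scan_findings)
        if total >= SEVERITY_THRESHOLD:
            alerts.append((asset.hostname, total))
    # Highest aggregate severity first, to support prioritization
    return sorted(alerts, key=lambda a: a[1], reverse=True)
```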
When you start evaluating specific vendor offerings you may get dragged into a religious discussion of storage approaches and technologies. You know – whether a relational backend, an object store, or even a proprietary flat file system provides the performance, flexibility, etc. to serve as the foundation of your platform. Understand that it really is a religious discussion. Your analysis efforts need to focus on the scale and flexibility of whatever data model underlies the platform.
Also pay attention to evolution and migration strategies, especially if you plan to stick with your current vendor as they move to a new platform. This transition is akin to a brain transplant, so make sure the vendor has a clear and well-thought-out path to the new platform and data model. Obviously if your vendor stores their data in the cloud it’s not your problem, but don’t put the cart before the horse. We will discuss cloud versus customer premises later in this post.
Discovery
Once you get to platform capabilities, first you need to find out what’s in your environment. That means a discovery process to find devices on your network and make sure everything is accounted for. You want to avoid the “oh crap” moment, when a bunch of unknown devices show up – and you have no idea what they are, what they have access to, or whether they are steaming piles of malware. Or at least shorten the window between something showing up on your network and the “oh crap” discovery moment.
There are a number of techniques for discovery, including actively scanning your entire address space for devices and profiling what you find. That works well enough and tends to be the main way vulnerability management offerings handle discovery, so active discovery is still table stakes for VM offerings. You need to balance the network impact of active discovery against the need to quickly find new devices. Also make sure you can search your networks completely, which means both your IPv4 space and your emerging IPv6 environment. Oh, you don’t have IPv6? Think again. You’d be surprised how many devices ship with IPv6 active by default, and if you don’t plan to discover that address space as well you’ll miss a significant attack surface. You never want to hold up a network deployment while your VM vendor gets their act together.
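To give a feel for what active discovery actually does (and nothing more), here is a minimal standard-library sketch that sweeps an address range with a TCP connect test; the port list and example network are placeholders, and a real scanner probes far more and far faster.

```python
import ipaddress
import socket

COMMON_PORTS = [22, 80, 443]  # illustrative; real scanners probe far more

def sweep(network: str, timeout: float = 0.5):
    """TCP-connect sweep of an IPv4 or IPv6 network (e.g. '10.0.0.0/28'
    or '2001:db8::/120'). Returns hosts with at least one open port."""
    live = {}
    for ip in ipaddress.ip_network(network).hosts():
        open_ports = []
        for port in COMMON_PORTS:
            family = socket.AF_INET6 if ip.version == 6 else socket.AF_INET
            with socket.socket(family, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                if s.connect_ex((str(ip), port)) == 0:
                    open_ports.append(port)
        if open_ports:
            live[str(ip)] = open_ports
    return live

# print(sweep("192.0.2.0/28"))  # example invocation on a tiny test range
```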
You can supplement active discovery with a passive capability that monitors network traffic and identifies new devices based on network communications. Depending on the sophistication of the passive analysis, devices can be profiled and vulnerabilities can be identified, but the primary goal of passive monitoring is to find new unmanaged devices faster. Once a new device is identified passively, you could then launch an active scan to figure out what it’s doing. Passive discovery is also helpful for devices that use firewalls to block active discovery and vulnerability scanning.
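For a rough sense of how passive discovery works, the sketch below (assuming the third-party scapy package and sniffing privileges) watches ARP traffic and records IP/MAC pairs it has not seen before; a commercial passive sensor does far more profiling than this.

```python
# Requires the third-party 'scapy' package (pip install scapy) and
# usually root privileges to sniff; this is only a rough sketch.
from scapy.all import sniff, ARP

seen = {}  # ip -> mac for devices observed so far

def note_device(pkt):
    """Record any new IP/MAC pairing observed in ARP traffic."""
    if pkt.haslayer(ARP):
        ip, mac = pkt[ARP].psrc, pkt[ARP].hwsrc
        if ip not in seen:
            seen[ip] = mac
            print(f"new device observed passively: {ip} ({mac})")
            # this is the point where you might queue an active scan

sniff(filter="arp", prn=note_device, store=0)
```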
But that’s not all – depending on the breadth of your vulnerability/threat management program you might want to include endpoints and mobile devices in the discovery process. We always want more data, so we favor discovering all the assets in your environment. That said, for determining what’s important in your environment (see the asset management/risk scoring section below), endpoints tend to be less important than databases with protected data, so prioritize the effort you expend on discovery and assessment accordingly.
Finally, another complicating factor for discovery is the cloud. With the ability to spin up and take down instances at will, your platform needs to both track and assess cloud resources, which requires integrating with cloud consoles to make sure your platform knows about new devices and can assess them appropriately. This is an emerging capability, but realistically you’ll see a lot more private and public cloud-based resources in your environment.
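To illustrate what “integrating with cloud consoles” can look like, here is a minimal sketch that uses AWS’s boto3 SDK to enumerate running EC2 instances so they can be fed into discovery; configured credentials and the region are assumptions, and error handling is omitted.

```python
# Requires the third-party 'boto3' package and configured AWS credentials.
import boto3

def running_cloud_instances(region: str = "us-east-1"):
    """Return (instance_id, private_ip) pairs for running EC2 instances,
    suitable for feeding into the platform's discovery/asset registry."""
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instances = []
    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            instances.append((inst["InstanceId"], inst.get("PrivateIpAddress")))
    return instances
```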
Asset Management and Risk Scoring
The key capability of the evolved vulnerability management platform is its ability to help you prioritize efforts, so any calculation of a risk score largely depends on 1) the ‘importance’ of the asset and 2) how ‘exposed’ it is to attack at any given point in time. Evaluating what’s important is really an asset management function. Of course many operations teams run extensive asset management efforts. The VM platform can and should take advantage of any existing resources and integrate with those tools. But many organizations don’t have an existing asset database (scary as that sounds), so the VM platform may need to serve as the authoritative registry of IT assets. Either way, the platform needs to store and/or access asset information.
Once you have the assets defined in the system the next step is to tag, group and/or categorize them. The more flexible the system the better; every organization groups their assets differently so your platform should support the way you categorize assets – not force you to fit your assets into their vendor-defined buckets. Assign an (admittedly subjective) importance to each group or category of assets. We suggest a simple approach, with 3 or 5 levels of importance. Really important means someone would be fired, while some assets are simply unimportant. You don’t need complexity or fine precision, but you at least need to identify devices which hold (or have access to) critical data.
As you evaluate the vulnerability of each asset through the platform’s various tests you can determine a risk score to drive prioritization.
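To make the importance-plus-exposure idea concrete, here is a minimal sketch of one possible risk score; the three-level importance scale, the weights, and the multiplicative formula are illustrative choices, not a prescribed model.

```python
# Illustrative importance weights for a simple 3-level scale
IMPORTANCE_WEIGHT = {"critical": 3.0, "standard": 1.0, "unimportant": 0.1}

def risk_score(importance: str, findings: list[dict]) -> float:
    """Risk = asset importance x current exposure, where exposure is
    approximated here by the highest CVSS score among open findings."""
    exposure = max((f.get("cvss", 0.0) for f in findings), default=0.0)
    return IMPORTANCE_WEIGHT.get(importance, 1.0) * exposure

# Example: a critical asset with a CVSS 9.8 finding outranks an
# unimportant asset with the same finding.
print(risk_score("critical", [{"cvss": 9.8}]))      # ~29.4
print(risk_score("unimportant", [{"cvss": 9.8}]))   # ~0.98
```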
The point here is flexibility. You want to group assets in a way that makes sense for your organization. You want to derive a risk score based on your calculation of risk, not a black box calculation that may or may not be relevant for your organization. And you need the ability to change everything the next time a significant technology or organizational disruption happens, like cloud computing or a big M&A deal.
To Cloud or Not to Cloud
The next aspect of the core technology underlying the evolved vulnerability/threat management platform is the cloud buzzword. If you thought people got religious about data models and engines, ask a cloud vendor about an on-premises solution or vice versa. That’s always fun. At the end of the day, this cloud discussion comes down to two things:
- Scale: You will hear a lot from cloud-based providers about infinite scale and the limitations of customer premise-based offerings. It is true that scalability is the vendor’s problem in a cloud scenario. That offers some advantages, but any solution can scale with a suitable deployment architecture.
- Technology Updates/Change: The other big message you’ll hear from cloud bigots is that cloud platforms handle software updates more quickly and transparently than on your own gear. Again, there is truth to this, but every vulnerability management vendor has been sending new rules and tests to its devices for years, so it’s not like they haven’t figured out software distribution.
These two objections to customer premise-based solutions are really much ado about nothing. The ‘decision’ isn’t really a decision at all – what is ‘cloud’ and what isn’t nowadays is largely a matter of semantics. Let’s get back to your requirements. You need to be able to test your environment from outside – most attackers are outside your perimeter. That works best with a cloud service. But you also need a presence within your perimeter to scan internal devices, especially those on protected networks. So every cloud service must include an on-site component for internal scans.
That on-site component might be a dedicated appliance, a virtual machine, a dynamic instance downloaded to a device inside the network at scan time, or a combination. Ultimately the deployment model is beside the point – choose the model that best fits your operational processes. There is no point in getting religious about deployment models, so the leading platform vendors will offer hybrid approaches to meet your specific needs. If it’s easier to provision the device once and let ops deal with it, then opt for the internal scanning appliance or dedicate a VM to scanning in your virtual data center.
But don’t get caught up in hype. You need an external component to test your environment from the outside and an internal component for testing inside your perimeter.
To Agent or Not to Agent. To Credential or Not to Credential.
You will also hear a lot about agent vs. agentless scanning. This is also mostly hyperbole and semantics. In order to do any kind of granular scan of a device, you need a persistent agent on the device, the ability to download a temporary agent, or full administrator rights (credentials) to the device so you can remotely poll it for the things you are looking for (configurations, patches, logs, etc.). As usual, the answer is all of the above. A temporary agent has the advantage of not requiring you to manage software distribution to devices you worry about. But ultimately the scanning model you choose depends on your access to the device, the type of device, and what kinds of data it has access to.
When thinking about credentialed vs. non-credentialed scans, the answer is also both. Non-credentialed scans give you the external attacker’s view, but of course there are limits to the detail that can be gleaned from a non-credentialed scan. So to gain a full understanding of the security posture of a device you also want a credentialed scan with full access to configurations, patch levels, logs, entitlements, applications, etc.
Keep in mind that you cannot actively scan certain devices. Think brittle control systems which fall over under the onslaught of a vulnerability scan. So it’s probably not in your best interest to scan those devices. Above we mentioned passively discovering assets by monitoring the network. A similar approach can be used to find vulnerabilities on devices you can’t actively scan. Obviously it doesn’t provide the same detail as a credentialed scan, but if the alternative is knocking down the device any data is better than no data.
Security Research
Finally, any vulnerability/threat management platform needs to be driven by research. Things move fast in the attack space and your threat management tools need to stay current. So your vendor needs to make considerable investments in a dedicated team to track the field, observe and analyze new attacks, figure out how to search for those attacks using their tools, ensure the quality of their tests to minimize false positives, and finally get the tests into your hands as quickly as possible. For a more granular view into the process of analyzing attacks and malware check out the Analyze Malware subprocess in Malware Analysis Quant. It provides an idea of what’s involved in profiling malware files and figuring out how to find them in your environment.
To compare research groups, evaluate the sophistication of their analysis. Do you understand how to remediate issues your scanner finds? Can you determine the seriousness of the attack? Do you believe them? Is the data coming only from the vendor, or do they integrate third party data? And most importantly, do they provide coverage for the assets in your environment? You know, the OSes, databases, and critical applications that drive your business.
There is a lot to the evolved vulnerability/threat management platform. But we see these capabilities as table stakes. A lot of innovation is happening in this space, and advanced – and in some cases adjacent – technologies will be the focus of the next post. We will dive into capabilities such as attack path analysis, penetration testing, and benchmarking.
4 Replies to “Vulnerability Management Evolution: Core Technologies”
Mike,
For this article, I would make the same comment as on the previous article, “Scanning the Infrastructure”: accuracy goes along with reliability. Users have no choice but to scan machines in production, so the last thing you want is a scanner that crashes your services.
I would also like to make a global comment concerning the way VM solutions should be able to track the entire lifecycle of a vulnerability. An example: say the same machine, a web server, is scanned twice by two different users. The first scan is performed against the full TCP port list and finds two vulnerabilities, on TCP 22 and TCP 80.
The second scan is a light scan and only looks for HTTP vulnerabilities.
If you compare the two scan results, you might think that the vulnerability on TCP 22, which is not reported in the second scan, has been fixed. That is not a good conclusion, because TCP 22 was not tested by scan 2.
So ideally, you expect the VM solution to tell you whether a vulnerability is ACTIVE or FIXED by doing a consistent comparison of the scan results.
The vulnerability on port TCP 22 should be flagged as FIXED only when port TCP 22 has been scanned and the vulnerability has been tested and NOT FOUND.
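A minimal sketch of that comparison rule (the field names are just illustrative): only flip a finding to FIXED when the port was actually retested and the check came back clean.

```python
def vuln_status(finding, new_scan_ports, new_scan_findings):
    """Mark a previously seen vulnerability FIXED only if the port it
    lives on was covered by the new scan and the check came back clean;
    otherwise keep it ACTIVE."""
    if finding["port"] not in new_scan_ports:
        return "ACTIVE"  # not retested, so no conclusion possible
    still_present = any(
        f["port"] == finding["port"] and f["check_id"] == finding["check_id"]
        for f in new_scan_findings
    )
    return "ACTIVE" if still_present else "FIXED"

# The finding on TCP 22 stays ACTIVE after a web-only scan:
print(vuln_status({"port": 22, "check_id": "ssh-weak-cipher"},
                  new_scan_ports={80, 443}, new_scan_findings=[]))
```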
This resonates with the comments made by @Betsy. Your analytics are only as good as the quality of the input. Good data becomes strategic and will drive your entire security practice.
In addition to this global comment, I would also like to stress that the “discovery” functions should take into consideration the new cloud landscape and the Amazon EC2 use cases, as already pointed out in the comments on your second article.
For the paragraph about “Asset Management and Risk Scoring”, I agree with you that the solution has to offer a high level of customization so the inherent complexity specific to any organization can be reflected in the way users organize their assets in the VM solution. Users should also expect that the huge amount of data gathered during the scans can be leveraged by the solution itself to help in this process. A very basic example concerns grouping assets by operating system. Everybody has already faced this question: how do my Windows machines compare to my Unix machines? The operating system is a data point collected during the scans, so why make the user manually create groups for this? Shouldn’t the VM solution be able to dynamically break down the assets by operating system? And to continue with the example, the “Windows” category includes a lot of different flavors of the operating system (XP, Vista, 7, 2003, 2008, etc.), and it should be expected that the solution provides a way to nest these properties together and provide a hierarchical view of the assets.
By extension, any other data point collected during the scans should also be used to create dynamic rules to organize the assets into categories.
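As a very rough sketch of what such dynamic, data-driven grouping could look like (the field names are illustrative, not any product’s schema):

```python
from collections import defaultdict

def group_by_os(assets):
    """Build a two-level hierarchy (OS family -> OS version) from the
    operating system data already collected during scans, instead of
    asking the user to maintain these groups by hand."""
    tree = defaultdict(lambda: defaultdict(list))
    for asset in assets:
        family = asset.get("os_family", "unknown")    # e.g. "Windows"
        version = asset.get("os_version", "unknown")  # e.g. "2008"
        tree[family][version].append(asset["hostname"])
    return tree

tree = group_by_os([
    {"hostname": "dc01", "os_family": "Windows", "os_version": "2008"},
    {"hostname": "web01", "os_family": "Linux", "os_version": "RHEL 6"},
])
print(dict(tree["Windows"]))  # {'2008': ['dc01']}
```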
In the paragraph “To Cloud or Not to Cloud”: while in theory enterprise software should be able to scale to any size, in practice several factors can reduce that ability, while cloud solutions usually overcome these challenges. For example, a solution originally designed for X number of assets can become difficult to maintain if the number of assets doubles or triples over time as the solution grows and is adopted more widely across the organization. Of course, the solution can be designed from the beginning for the worst-case scenario, but that will cost a lot, while with the right cloud solution you pay for what you use, and you will most likely never hit any sort of limitation that requires a complex migration to a bigger solution.
Also, when it comes to updates, we can separate this into 2 categories:
1. Update of the content/vulnerability signatures: daily updates are certainly a hard prerequisite, and any serious VM solution will offer this without any user intervention.
2. Update of the software: this is far more complex, and by nature enterprise software will require more attention and resources from the user than a cloud solution that can offer automatic, frequent, seamless updates, which also means new features arrive faster.
But choosing between cloud and enterprise software is not only a technological choice. Cloud also means service, with a predictable TCO that includes training and support.
Cloud is also collaborative by nature, and simplifies delegation of tasks, especially in distributed environments. And cloud solutions also provide the unique capability to measure the quality of their technology, which is key to reducing false positives and increasing the overall quality of the vulnerability checks.
Cloud solutions can also be considered as a trusted third party in some situations.
Concerning the discussion about agents, I agree that there are use cases where agents are required. But the agent should fix more issues than it creates. So in my opinion, it is more about the manageability and the footprint of the agent than about arguing for or against agents. At the end of the day, the solution should be flexible enough to offer the user a wide range of options to facilitate the deployment.
Using credentials to do authenticated scans is a highly recommended strategy because it provides far more accurate scan results, but managing those credentials can be challenging, especially when password policies are enforced. So the VM solution should offer the option to fetch the password from a password vault solution already deployed in the network.
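For example, a minimal sketch of pulling scan credentials at scan time, assuming HashiCorp Vault and its hvac Python client (the path and secret layout are purely illustrative):

```python
# Requires the third-party 'hvac' package and a reachable Vault server.
import hvac

def scan_credentials(vault_addr: str, vault_token: str, secret_path: str):
    """Fetch scan credentials at scan time instead of storing them in the
    VM tool, so enforced password rotations stay transparent."""
    client = hvac.Client(url=vault_addr, token=vault_token)
    secret = client.secrets.kv.v2.read_secret_version(path=secret_path)
    data = secret["data"]["data"]
    return data["username"], data["password"]
```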
Last, but not least, VM solutions should provide quality vulnerability information, but also help users prioritize their remediation efforts by providing correlation with exploits, malware, and virtual patches, in addition to providing the recommended vendor solutions and workarounds.
Mike–
Oh, a knife to my heart when you say the manager of managers concept has failed over and over again. I just have to politely disagree. Here are some candidate counterexamples:
The major enterprise system management platforms from IBM, HP, BMC, MSFT, and CA take their data from other ‘specialized’ sources, some of which are agents and some of which are other element managers. I quickly concede that reasonable people (maybe even you) can point to non-MOMish characteristics of these products, so they are not a perfect counterexample. But they sure generate a lot of revenue (if that is a measure of non-failure).
Perhaps a better counterexample, though not from the infrastructure management world, is virtually any BI installation with a data mart that integrates financial, sales, manufacturing, and other data. The BI market is not small and is one of the fastest growing technology market segments. Note that in these installations, the strategy is not to put sales data into the manufacturing system or manufacturing data into the financial system.
Finally, I would say that your suggestion that the vulnerability management system be the integration point is architecturally a MOM approach in which the MOM is the vuln platform. My point is that I don’t think that will ever be the best choice.
Beating this to death a bit more: the functional requirements for delivering analytics are very different from those for managing a specific security line function. Employing a federation of specialists should be (IMHO) the preferred strategy.
I have one last thought. Integration is at the heart of providing good analytics. The two key pieces are strong data modeling to get the buckets defined and then technical interfaces to get the buckets properly populated in the face of inconsistent, overlapping, and dirty data. In my experience integration is much easier with BI tools than it is with current vulnerability management platforms.
Reasonable people can disagree on this. There are no absolutes.
@Betsy, thanks for the comment, and clearly food for thought. Obviously a purpose-built intelligence platform provides some unique capabilities, but it would require a tremendous amount of customer integration. I think it’ll be an iterative process, but the hope would be that the data model/engine will over time look a lot more like a BI platform, to provide the kind of analytics required.
I just don’t think there is a lot of appetite among customers to build a separate, overlay data mart. They’ll need to get there iteratively, leveraging the platforms and tools they’ve got installed. The Manager of Managers concept has failed over and over again.
So yes, I think we agree where the technology needs to go over time. It’s just those pesky details of how you get there that muck things up…
Mike.
Great article. My heart leapt for joy when I read your words that a sound data model is required as a foundation for integrating data across multiple sources. You seem to favor nominating the vulnerability management solution as the integration point. I have concluded from my own experience that there is a better choice: a business intelligence platform. My proposal is to create a formal datamart. Export data to the datamart from any worthwhile data source and use the facilities of the BI tool to perform analytics. The OLAP data model (hypercubes instead of tables) supports hierarchical attributes, which really facilitate drill-down exploration and what-if scenarios. BI tools typically have extensive extract-transform-load (ETL) features, sophisticated quantitative analysis capabilities, and very flexible reporting (interactive dashboards and static reporting that can push stuff to Word, PPT, and Excel). IMHO, the vuln platform vendors will never equal what the BI vendors offer. Just food for thought.