By Adrian Lane
As we wrap up our Security Management 2.0 series, we have completed quite a journey. You have undertaken a disciplined and objective process to determine if it’s worth moving to a new security management platform. Assuming that your decision is to move, now it gets real. You need to implement and migrate your existing environment to the new thing, while maintaining service levels and without opening your organization to any additional risk. Walk in the park, right? Let’s address these migration issues, so hopefully you can learn from some of my pain.
I started work at a previous employer two days after an IT consultancy performed a server migration. Coincidentally, at the same time I was helping a friend at a major bank review his data center migration plans. I’ll tell you that the bank had every phase of the change-over planned down in half-day increments, with backup plans in place following months of migration rehearsals. Let’s just say the IT consultancy had less elaborate plans. Bank employees knew their systems were critical and treated the migration as such – IT consultants, not so much. When I walked into the offices at my new job every server was down, removed from their racks, sitting in a pile by the door. The consultancy was assembling the new hardware – and had been for more than a day. Their plan was to finish the hardware in a day or so; when they finished that, they would install the operating systems. Then when they had the identity management system working, they planned to install the applications and import customer data. Out in the hallway, a few dozen very angry sales people paced the halls, idle, 3 weeks before the close of the quarter. It was a bad day for everyone.
The IT consultancy’s contract was terminated that day. After plugging the old servers back in and dispersing the lynch mob outside the server room, I planned out how to migrate to the new servers without any additional downtime. It was not just for the business’s sake, but to ensure my personal safety as well. While I did not go to the same extremes as my friend’s team at a certain giant, I acknowledged my servers were no less critical to our business, and a seamless migration of services was mandatory.
What can we learn from this somewhat transformative experience? A flash cutover never really is. We recommend you start deploying the new SIEM long before you get rid of the old. At best, you’ll deprecate portions of the older system after newer replacement capabilities are online, but you will likely want the older system as a fallback until the new functions have been vetted and tuned. We have learned the importance of this staging process the hard way. Ignore it at your own peril, keeping in mind that your security management platform sustains several key business functions.
We have broken the migration process into two phases: planning and implementation. Your plan needs to be very clear and specific about when things get installed, how data gets migrated, when you cut over from the old systems to the new, and who performs the work.
The Planning step leverages much of the work you have done up to this point in the process of evaluating replacement options – you just need to tune it for the migration.
- Review: Go back through some of the documents you created earlier in this series. Start with the platform evaluation documents, which will help you understand what the current system provides, as well as the key areas of deficiency to address. These documents become the priority list for the migration effort, and form the foundation of the migration task list. Next, leverage what you learned during the Proof of Concept (PoC). When evaluating your new security management platform provider, you conducted a mini-deployment exercise. Use the findings from that exercise – what worked and what didn’t – to feed subsequent planning and address issues it identified.
- Focus on Incremental Success: What do you install first? Do you work top down or bottom up? Do you keep both systems operational throughout the entire migration, or do you shut down portions of the old as each node migrates? We recommend you use your deployment model as a guide. You can learn more about these models by checking out our Understanding and Selecting a SIEM paper. When using a mesh deployment model, it’s often easiest to make sure a single node/location is fully functional before moving on to the next. With ring architectures, it’s best to get the central SIEM platform operational and then gradually add nodes around it. Hierarchical models are best deployed top-down, with the central server first, followed by regional aggregation nodes in order of criticality, then down to the collector level. The point is to make sure the project is broken up to ensure success happens incrementally, and avoid proceeding down any wrong paths.
- Allocate resources: Who is going to do the work? When are they going to do it? How long will it take to deploy the platform, data collector and/or log management support system(s)? This is also the time to engage professional services and enlist the new vendor’s assistance. The vendor presumably does these implementations all day long, so they should have expertise at estimating these timelines. You may also want to engage them to perform some (or all) of the work in tandem with your staff, at least for the first few locations until you get the process down.
- Define the Timeline: Estimate the time it will take to deploy the servers, install the collectors, and implement your policies. Add some time in for testing and verification. There is likely some ‘guesstimation’ on your part, but you have some reasonable metrics to base your plan on, from the PoC and prior experience with SIEM. You did document the PoC, right? Plan the project commencement date and publish to the team. Solicit feedback and adjust before commencing, because you need shared accountability with the operations team(s) to make sure everyone has a vested interest in project success.
- Prep work: We recommend you do as much work as possible before you begin the migration, including construction of the rules and policies you will rely upon to generate alerts and reports. Specify in advance any policies, reports, user accounts, data filters, backup schedules, data encryption, and related services you can now. You already have a rule base, so leverage it to get going. Of course, you’ll tune things as you go, but why reinvent the wheel? Keep in mind, you will always find something you failed to plan for – often an unexpected problem – that sets your schedule behind. Preparation helps spot missing tasks and makes deployment go faster.
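To make the top-down ordering for a hierarchical deployment concrete, here is a minimal Python sketch. It assumes a made-up inventory: the node names, the three tier labels, and the numeric criticality scale are all hypothetical, and your own deployment model may call for a different ordering entirely.

```python
from dataclasses import dataclass

# Tier order for a hierarchical rollout: central server first, then
# regional aggregators, then collectors. Tier names are illustrative.
TIER_ORDER = {"central": 0, "regional": 1, "collector": 2}


@dataclass(frozen=True)
class Node:
    name: str         # hypothetical node label
    tier: str         # one of the keys in TIER_ORDER
    criticality: int  # assumed scale: higher means more critical


def rollout_order(nodes):
    """Return nodes top-down by tier, most critical first within a tier."""
    return sorted(
        nodes,
        key=lambda n: (TIER_ORDER[n.tier], -n.criticality, n.name),
    )


# Hypothetical inventory, listed in no particular order.
nodes = [
    Node("branch-collector-7", "collector", 2),
    Node("emea-aggregator", "regional", 3),
    Node("core-siem", "central", 5),
    Node("us-aggregator", "regional", 4),
]

for step, node in enumerate(rollout_order(nodes), start=1):
    print(step, node.tier, node.name)
```

Each entry in the resulting list can become one line item in the migration task list, with an owner and a date attached.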
Remember that the migration need not (and in fact generally should not) be an all at once exercise – you have the luxury of doing one piece at a time in the order that best suits your requirements.
- Deploy platform(s): This varies based on the deployment model, as discussed above, but typically you install the main security management platforms first. That includes basic system configuration, identity management and access control integration, and basic network configuration. Once complete, connect to a couple of data sources and other aggregation points to make sure the system is operating correctly.
- Deploy supporting services: Deploy the data collectors and make sure event collection is working correctly. If you are using a flat deployment model, now is the time to configure the platform to collect these events for the first set of deployment tasks. If you are using a log management/SIEM hybrid or regional data aggregators, this is the time to install those additional aggregation points and get them feeding data into the primary SIEM system to confirm proper information flow – at a small scale before ramping up event traffic. For those moving to a new platform to get real-time analysis, make sure the events are being collected properly. You are only concerned with getting data into the platform in a timely fashion right now – tune the system later.
- Install policies & reports: This is where you deploy the rules that comb through events and find anomalies. Hopefully you created as many as possible during the PoC and planning stages. For real-time analysis you need to tune rules to optimize performance. Remember that every rule adds work the system must do for each event, reducing performance and throughput. Look for ways to create rules with fewer comparisons, and balance fine-tuning of rules to specific problems against more generic rules that catch many problems – sometimes you can throw hardware at the problem (with a bigger server) to handle more events, but more efficient policies are always worthwhile.
- Test and verify: Are your reports being generated properly? Are the correct alerts being generated in a timely fashion? Generate copies of the reports and send them to the team for review, and compare against the existing platform (which is still operational, right?). For alerts and forensic analysis, this is a good time to rerun your “Red Team” drill from the PoC to make sure you catch anomalies and confirm the accuracy of your results. Verify you are getting what you need – now is the time to figure out if there are problems with the system, while you still have a chance to find and fix problems and before you start depending on it.
- Stakeholder sign-off: Get it in writing – trust me, this will save aggravation in the future when someone from Ops says: “Hey, where is XYZ that used to be there?” Have the compliance, security, and IT ops teams sign off on completion of the project – they own it now too (remember shared accountability?). Make sure the group is satisfied and/or all issues are documented – if not fully solved – by this point.
- Decommission: Now is the time to retire the older system. You may choose to run the incumbent SIEM for a few months after the new system is in place, just in case. But there are not many reasons to keep the older system around, and plenty of reasons it should be sent packing. Older agents & sensors should be removed, user accounts dedicated to the older platform locked down, and hardware and virtual server real estate reclaimed. Once again, someone will need to be assigned the work with an agreed-on time frame for completion. Trouble-ticketing systems are a handy way to schedule these tasks and provide automated completion reports.
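The test-and-verify step is easier to act on if you compare what both systems flagged during the parallel run. The sketch below is one hedged way to do that in Python. It assumes you can export alerts from each platform as a list of dictionaries, and the `rule` and `source` field names are invented for illustration, not taken from any product's export format.

```python
def compare_alerts(old_alerts, new_alerts):
    """Compare alert sets from the incumbent and the new platform.

    Each alert is reduced to a (rule, source) pair. Both field names are
    assumptions for illustration only.
    """
    old_keys = {(a["rule"], a["source"]) for a in old_alerts}
    new_keys = {(a["rule"], a["source"]) for a in new_alerts}
    return {
        # Flagged by the incumbent but not by the new platform: investigate first.
        "missed_by_new": sorted(old_keys - new_keys),
        # Flagged only by the new platform: new coverage or false positives.
        "only_in_new": sorted(new_keys - old_keys),
        "matched": sorted(old_keys & new_keys),
    }


# Hypothetical exports covering the same time window on both systems.
old_alerts = [
    {"rule": "failed-logins", "source": "vpn-gw"},
    {"rule": "config-change", "source": "db-01"},
]
new_alerts = [
    {"rule": "failed-logins", "source": "vpn-gw"},
    {"rule": "priv-escalation", "source": "app-03"},
]

report = compare_alerts(old_alerts, new_alerts)
for bucket, items in report.items():
    print(bucket, items)
```

Anything in the "missed by new" bucket is worth resolving, or at least documenting, before you ask stakeholders to sign off.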
So with all that we finish the Security Management 2.0: Time for a new SIEM? series. Let’s revisit some of the most critical aspects:
- Requirements rule: Take this opportunity to figure out what you really need – not what the vendor says you can do, or what your users read in a trade rag. Defining your requirements is the linchpin of this entire process, so make sure you do that well.
- Do the work: When a project is perceived as a failure, the inclination is to just change in hopes the new thing will be better. Ultimately that may be the right answer, but don’t embark on such a major project based purely on a blind assumption. Evaluate your platform objectively and assess the challengers skeptically. Everything looks great in a PowerPoint deck, but leverage the PoC to see what will really work in your environment.
- Practice inclusion: Any security management technology needs to be leveraged across the organization – if only for dashboards and reports. So make this process as inclusive as possible. That means getting buy-in from not just the senior team and the money folks, but also from operations and administrators. If you plan to capture data from application and database sources include those teams in the process. You need their help, and want them to feel some shared responsibility for making the project succeed.
- Get Quick Wins: Focus your migration on achieving consistent successes. That means starting slowly on areas you know will work well. Get one thing done correctly before moving on to the next one. Remember, most likely this whole process stems from an issue perceived with the incumbent, so make sure the new tool will work well, and that means finishing what you start.
As always, we are interested in your feedback. Let us know via the comments what makes sense and what doesn’t.
Posted at Monday 19th September 2011 10:00 am
By Mike Rothman
You have made your decision and recommended it up the food chain, so now the fun part begins. Well, fun for some folks, anyway. For this post we’ll assume you have decided to move to a new platform. We understand some people decide not to move, but use the question of switching as a negotiating tactic. But it bears repeating that it is no bad thing to stay with your existing platform, so long as you have done the work to determine it can meet your requirements. We’re writing this paper for the people who keep telling us how unhappy they are, and how their evolving requirements have not been met. So after asking all the right questions, if the best answer is to stay put, that’s a less disruptive path anyway.
For now, though, let’s just assume the current platform is not going to get there. Now the job is to get the best price for the new offering. Here are a few tips to leverage for the best deal:
- Time the buy: Yes, this is Negotiation 101. Wait until the end of the quarter and squeeze your sales rep for the best deal to get the PO in by the last day of the month. Sometimes it works, sometimes it doesn’t. But it’s worth trying.
- Tell the incumbent they have lost the deal: The next step is to get the incumbent involved. Once you put in a call letting them know you are going in a different direction, they usually respond. Not always, but most times the incumbent will try to save the deal. And then you can go back to the challenger and tell them they need to do a little better, because you got this great offer from their entrenched competition. And just like when buying a car, to use this tactic you must be willing to walk away.
- Look at non-cash add-ons: Sometimes the challenger can’t discount any more. But you can ask for additional professional services, modules, boxes, whatever. Remember, the incremental cost of software is zero, zilch, nada – so vendors can often bundle in a little more to get the deal.
- Revisit service levels: Another non-cash sweetener could be an enhanced level of service. Maybe it’s a dedicated project manager to get your migration done. Maybe it’s the Platinum level of support, even if you pay for Bronze. Given the amount of care and feeding required to keep any security management platform tuned and optimized, having a deeper service relationship could come in handy.
- Dealing with your boss’s boss: One last thing – be prepared for your recommendation to be challenged, especially if the incumbent sells a lot of other gear to your company. The entire process we have laid out prepares you for that call, so just go through the logic of your decision once more, making clear that your recommendation is the best direction for the organization.
Tactics for the Status Quo
But it would be pretty naive not to be prepared in case the decision goes the other way – due to pricing, politics, or any other reason beyond your control. So if you have to make the status quo work and keep the incumbent, here are some ideas for making lemonade from the proverbial lemon.
- Tell the incumbent they’re losing the deal: If the incumbent doesn’t already know they are at risk, it can’t hurt to tell them. Some vendors (especially the big ones) don’t care, which is probably why you were looking for something new anyway. But others will get the wake-up call and try to make you happy. That’s the time to revisit your platform evaluation and figure out what needs to be fixed.
- Get services: If your issue is not getting proper value from the system, push to have the incumbent provide some professional services to improve the implementation. Maybe send your folks to training. Have their team set up a new set of rules and do knowledge transfer. There are many options, but if you have to make do with what you have, at least force the vendor’s hand to make the systems work better.
- Scale up (at lower prices): If scalability is the issue, confront that directly with the incumbent and request additional hardware and/or licenses to address the issue. Of course, this may not be enough, but every little bit helps, and if moving to a new platform isn’t an option, at least you can ease the problem a bit. Especially when the incumbent knows you were looking at new gear because of a scaling problem.
- Add use cases: Another way to get additional value is to request additional modules be thrown into a renewal or expansion deal. Maybe add the identity module or look at configuration auditing. Or work with the team to add database and/or application monitoring. Again, the more you use the tool, the more value you’ll get, so figure out what the incumbent will do to make you happy.
Honestly, if you must stick with the existing system, you don’t have much flexibility. The incumbent doesn’t need to know that, though, so try to use the specter of migration as leverage. But at the end of the day, it is what it is. Throughout this process you have figured out what you need the tool to do, so now do your best to get there, within your constraints.
Once the deal is done, it’s time to move to the new platform. We will wrap this series by discussing migration and helping structure a plan to get onto the new kit. It will be hard – it always is – but you can leverage everything you learned through your first go-round with the incumbent, as well as this process, to build a very clear map of where you need to go and how to get there. Stay tuned for that.
Posted at Thursday 15th September 2011 11:00 am
By Mike Rothman
Given the evolution of both the technology and the attacks, it’s time to revisit your specific requirements and use cases – both current and evolving. You also need to be brutally honest about what your existing product or service does and does not do, as well as your team’s ability to support and maintain it. This is essential – you need a fresh look at the environment to understand what you need today and tomorrow, and what kind of resources and expertise you can bring to bear, unconstrained by what you need and do today. Many of you have laundry lists of things you would like to be able to do with current systems, but can’t. Those are a good place to start, but you also need to consider the trends for your industry and look at what’s coming down the road in terms of security and business challenges that will emerge over the next couple years. Capturing the current and foreseeable needs is what our Security Management 2.0 process is all about.
In order to figure out the best path forward for security management, start with the proverbial blank slate. That means revisiting why you need a security management platform with fresh eyes. It means taking a critical look at use cases and figuring out their relative importance. As we described in our Understanding and Selecting a SIEM/Log Management Platform paper, the main use cases for security management really break down into 3 buckets: Improving security, increasing efficiency, and automating compliance.
When you think about it, security success in today’s environment comes down to a handful of key imperatives. First we need to improve the security of our environment. We are losing ground to the bad guys, and we need to make some inroads on figuring out what’s being attacked more quickly and protecting it. Unfortunately nobody’s selling (working) crystal balls that tell you how and when you will be attacked, so the blank slate strategy entails monitoring more and determining how your detection and response systems will react more quickly.
Next we need to do more with less. It does look like the global economy is improving but we can’t expect to get back to the halcyon days of spend first, ask questions later – ever. And while that may sound like “work smarter, not harder” management double-speak, there are specific automation and divide & conquer strategies that help reduce the burden. With more systems under management, we have more to worry about and less time to spend poring over reports, looking for the proverbial needle in the haystack. Given the number of new attacks – counted by any metric you like – we need to increase the efficiency of resource utilization.
Finally, auditors show up a few times a year, and they want their reports. Summary reports, detail reports, and reports that validate other reports. The entire auditor dance focuses on convincing the audit team that you have the proper security controls implemented and effective. That involves a tremendous amount of data gathering, analysis, and reporting to set up – with continued tweaking required over time. It’s basically a full time job to get ready for the audit, dropped on folks who already have full time jobs. So we must automate those compliance functions to the greatest degree possible.
Increasingly technologies that monitor up the stack are helping in all three areas by collecting additional data types like identity, database activity monitoring, application support, and configuration management – along with different ways of addressing the problems. As attacks target these higher-level functions and require visibility beyond just the core infrastructure, the security management platform needs to detect attacks in the context of the business threat. Don’t forget about the need for advanced forensics, given the folly of thinking you can block every attack. So a security management platform to help React Faster and Better within an incident response context may also be a key requirement moving forward.
You might also be looking for a more integrated user experience across a number of security functions. For example, you may have separate vendors for change detection, vulnerability management, firewall and IDS monitoring, and database activity monitoring. You may be wearing out your swivel chair switching between all the consoles, and simplification via vendor consolidation can be a key driver.
Understand that your general requirements may not have changed dramatically, although you may prioritize the use cases a little differently now. For example, perhaps you first implemented Log Management to crank out some compliance reports. It wouldn’t be the first time we’ve seen that as the primary driver. But you just finished cleaning up a messy security incident your existing SIEM missed. If so, you probably now put a pretty high value on making sure correlation works better.
Once you are pretty clear within your team about the requirements for a security management platform, start to discuss the topic a bit with external influencers. You can consult the ops teams, business users, and perhaps the general counsel about their requirements. Doing this confirms the priorities you already know and sets the stage to provide you support if the decision involves moving to a new platform.
Now it’s time to check your ego at the door. Unless you weren’t part of the original selection team – then you can blame the old regime. Okay, we’re kidding. Either way the key to this step involves a brutally honest assessment of how your existing platform meets the needs that drove the initial implementation. This post-mortem type analysis evaluates the platform in terms of each of the main use cases (security, efficiency, compliance automation), as well as some other aspects of real world use.
Even better, you’ll need to determine why the product/service isn’t measuring up. Common reasons we see include:
- Ease of use: Are there issues getting the product/service up and running? Did it require tons of professional services? Were you able to set up sufficiently granular rule sets and reports? This tends to be an issue with the technology platform itself.
- Implementation: Were the rules configured correctly up front? Was the rule base maintained adequately as things changed, or was rule management so painful it tended to lag? Was all the proper data collected by the system to provide a broad view of your infrastructure? These issues tend to be your problems, and you need to own them. While deceiving yourself about how your organization implemented the technology might save a little face, it would only position you for another project failure.
- Scalability: Did your chosen platform just run out of gas when event volume ramped up? Were there architectural or even cost issues that prevented you from deploying a broader infrastructure to meet your needs? Did you have to surround the existing correlation engine with a set of logging devices to control event flow? This might be a technology issue, or it could be a deployment architecture problem. Either way, the existing platform hasn’t scaled to what you need and that’s a big issue.
- Care and feeding: Do you have adequate resources and expertise to optimize the system? Does keeping the back-end database operational require multiple FTEs? Has your staff been gutted to the point you don’t have resources to monitor the system yourself? It’s very important to make a realistic assessment of your team’s ability to support the security management platform moving forward. The best technology in the world doesn’t help much if you can’t keep it up and running with a current rule set.
- Forensics: Does ‘drill down’ mean manually looking through raw event logs? Worse, does it always involve going back to the archives to find the events you need? Were events normalized into a useless subset of original data? Despite advancements in detection and alerting, forensic analysis is a common requirement for ascertaining the real severity of detected issues; and making it easier to access important data saves time and frustration.
- Dying on the vine: Has the technology been kept up to date? When was the last major release and did it address some of your issues? Has the vendor told you about the next release’s road map? Have they made good on past promises of new capabilities? After big acquisitions, some products aren’t maintained adequately. Now you have to assess whether things will get better.
- Vendor viability: Did you buy a product from an early leader who has since hit hard times? Did their product roadmap involve driving off the road? Vendor fortunes can change dramatically after you buy their products, and you may need to reassess the vendor’s ongoing viability. It’s a bad day when you have to make a call to get source code delivered from escrow, after creditors locked the vendor’s doors.
Now you see why you need to check your ego at the door and make a brutally honest assessment of your team’s ability to implement and support a security management platform. It would be great if this technology were plug and play, but it isn’t. Regardless of whether you move to a new platform or not, you’ll need to support it. It’s very easy to just blame the vendor if the product hasn’t met expectations, especially if the product has been left to die on the vine. But if there were implementation or maintenance issues on your side, those will still be there even with a modern, up-to-date platform. You can’t blame the vendor for operational failure on your end.
Now that you understand what you need at this point in time, and why your existing platform isn’t meeting your needs, it’s time to evaluate other options. There is a lot of ground to cover, so the next two posts will deal with new features available on these platforms and why some of these new capabilities are worth investigating.
Posted at Thursday 25th August 2011 5:57 pm
By Adrian Lane
Our motivation for launching the Security Management 2.0 research project lies in the general dissatisfaction with SIEM implementations – which in some cases have not delivered the expected value. The issues typically result from failure to scale, poor ease of use, excessive effort for care and feeding, or just customer execution failure. Granted some of the discontent is clearly navel-gazing – parsing and analyzing log files as part of your daily job is boring, mundane, and error-prone work you’d rather not do. But dissatisfaction with SIEM is largely legitimate and has gotten worse, as system load has grown and systems have been subjected to additional security requirements, driven by new and creative attack vectors. This all spotlights the fragility and poor architectural choices of some SIEM and Log Management platforms, especially early movers. Given that companies need to collect more – not less – data, review and management just get harder. Exponentially harder.
This post is not to focus on user complaints – that doesn’t help solve problems. Instead let’s focus on the changes in SIEM platforms driving users to revisit their platform decisions. There are over 20 SIEM and Log Management vendors in the market, most of which have been at it for 5-10 years. Each vendor has evolved its products (and services) to meet customer requirements, as well as provide some degree of differentiation against the competition. We have seen new system architectures to maximize performance, increase scalability, leverage hybrid deployments, and broaden collection via CEF and universal collection format support. Usability enhancements include capabilities for data manipulation; addition of contextual data via log/event enrichment; as well as more powerful tools for management, reporting, and visualization. Data analysis enhancements include expanding supported data types to include dozens of variants for monitoring, correlating/alerting, and reporting on change controls; configuration, application, and threat data; content analysis (poor man’s DLP) and user activity monitoring.
With literally hundreds of new features to comb through, it’s important to recognize that not all innovation is valuable to you, and you should keep irrelevancies out of your analysis of benefits of moving to a new platform. Just because the shiny new object has lots of bells and whistles doesn’t mean they are relevant to your decision. Our research shows the most impactful enhancements have been in scalability, along with reduced storage and management costs. Specific examples include mesh deployment models – where each device provides full logging and SIEM functionality – moving real-time processing closer to the event sources. As we described in Understanding and Selecting SIEM/Log Management: the right architecture can deliver the trifecta of fast analysis, comprehensive collection/normalization/correlation of events, and single-point administration – but this requires a significant overhaul of early SIEM architectures. Every vendor meets the basic collection and management requirements, but only a few platforms do well at modern scale and scope.
These architectural changes to enhance scalability and extend data types are seriously disruptive for vendors – they typically require a proverbial “brain transplant”: an extensive rebuild of the underlying data model and architecture. But the cost in time, manpower, and disrupted reliability was too high for some early market leaders – as a result some opted instead to innovate with sexy new bells and whistles which were easier and faster to develop and show off, but left them behind the rest of the market on real functionality. This is why we all too often see a web console, some additional data sources (such as identity and database activity data) and a plethora of quasi-useful feature enhancements tacked onto a limited scalability centralized server: that option cost less with less vendor risk. It sounds trite, but it is easy to be distracted from the most important SIEM advancements – those that deliver on the core values of analysis and management at scale.
Speaking of scalability issues, coinciding with the increased acceptance (and adoption) of managed security services, we are seeing many organizations look at outsourcing their SIEM. Given the increased scaling requirements of today’s security management platforms, making compute and storage more of a service provider’s problem is very attractive to some organizations. Combined with the commoditization of simple network security event analysis, this has made outsourcing SIEM all the more attractive. Moving to a managed SIEM service also allows customers to save face by addressing the shortcomings of their current product without needing to acknowledge a failed investment. In this model, the customer defines the reports and security controls and the service provider deploys and manages SIEM functions.
Of course, there are limitations to some managed SIEM offerings, so it all gets back to what problem you are trying to solve with your SIEM and/or Log Management deployment. To make things even more complicated, we also see hybrid architectures in early use, where a service provider does the fairly straightforward network (and server) event log analysis/correlation/reporting, while an in-house security management platform handles higher level analysis (identity, database, application logs, etc.) and deeper forensic analysis. We’ll discuss these architectures in more depth later in this series.
But this Security Management 2.0 process must start with the problem(s) you need to solve. Next we’ll talk about how to revisit your security management requirements, ensuring that you take a fresh look at the issues to make the right decision for your organization moving forward.
Posted at Monday 22nd August 2011 1:00 pm