Security Management 2.0: Migration
By Adrian Lane
As we wrap up our Security Management 2.0 series, we have completed quite a journey. You have undertaken a disciplined and objective process to determine if it’s worth moving to a new security management platform. Assuming that your decision is to move, now it gets real. You need to implement and migrate your existing environment to the new thing, while maintaining service levels and without opening your organization to any additional risk. Walk in the park, right? Let’s address these migration issues, so hopefully you can learn from some of my pain.
I started work at a previous employer two days after an IT consultancy performed a server migration. Coincidentally, at the same time I was helping a friend at a major bank review his data center migration plans. I’ll tell you that the bank had every phase of the change-over planned down to half-day increments, with backup plans in place following months of migration rehearsals. Let’s just say the IT consultancy had less elaborate plans. Bank employees knew their systems were critical and treated the migration as such – IT consultants, not so much. When I walked into the offices at my new job every server was down, removed from its rack, and sitting in a pile by the door. The consultancy was assembling the new hardware – and had been for more than a day. Their plan was to finish the hardware in a day or so, then install the operating systems. Once they had the identity management system working, they planned to install the applications and import customer data. Out in the hallway, a few dozen very angry salespeople paced, idle, three weeks before the close of the quarter. It was a bad day for everyone.
The IT consultancy’s contract was terminated that day. After plugging the old servers back in and dispersing the lynch mob outside the server room, I planned out how to migrate to the new servers without any additional downtime. It was not just for the business’s sake, but to ensure my personal safety as well. While I did not go to the same extremes as my friend’s team at a certain giant, I acknowledged my servers were no less critical to our business, and a seamless migration of services was mandatory.
What can we learn from this somewhat transformative experience? A flash cutover never really is. We recommend you start deploying the new SIEM long before you get rid of the old. At best, you’ll deprecate portions of the older system after newer replacement capabilities are online, but you will likely want the older system as a fallback until the new functions have been vetted and tuned. We have learned the importance of this staging process the hard way. Ignore it at your own peril, keeping in mind that your security management platform sustains several key business functions.
We have broken the migration process into two phases: planning and implementation. Your plan needs to be very clear and specific about when things get installed, how data gets migrated, when you cut over from the old systems to the new, and who performs the work.
The Planning step leverages much of the work you have done up to this point in the process of evaluating replacement options – you just need to tune it for the migration.
- Review: First, go back through some of the documents you created earlier in this series. Start with the platform evaluation documents, which help you understand what the current system provides, as well as the key areas of deficiency to address. These documents become the priority list for the migration effort, and form the foundation of the migration task list. Next, leverage what you learned during the Proof of Concept (PoC). When evaluating your new security management platform provider, you conducted a mini-deployment exercise. Use the findings from that exercise – what worked and what didn’t – to feed subsequent planning and address issues it identified.
- Focus on Incremental Success: What do you install first? Do you work top down or bottom up? Do you keep both systems operational throughout the entire migration, or do you shut down portions of the old as each node migrates? We recommend you use your deployment model as a guide. You can learn more about these models by checking out our Understanding and Selecting a SIEM paper. When using a mesh deployment model, it’s often easiest to make sure a single node/location is fully functional before moving on to the next. With ring architectures, it’s best to get the central SIEM platform operational and then gradually add nodes around it. Hierarchical models are best deployed top-down, with the central server first, followed by regional aggregation nodes in order of criticality, then down to the collector level. The point is to break the project up so that success happens incrementally, and you avoid proceeding down any wrong paths.
- Allocate resources: Who is going to do the work? When are they going to do it? How long will it take to deploy the platform, data collector and/or log management support system(s)? This is also the time to engage professional services and enlist the new vendor’s assistance. The vendor presumably does these implementations all day long, so they should have expertise at estimating these timelines. You may also want to engage them to perform some (or all) of the work in tandem with your staff, at least for the first few locations until you get the process down.
- Define the Timeline: Estimate the time it will take to deploy the servers, install the collectors, and implement your policies. Add some time in for testing and verification. There is likely some ‘guesstimation’ on your part, but you have some reasonable metrics to base your plan on, from the PoC and prior experience with SIEM. You did document the PoC, right? Plan the project commencement date and publish to the team. Solicit feedback and adjust before commencing, because you need shared accountability with the operations team(s) to make sure everyone has a vested interest in project success.
- Prep work: We recommend you do as much work as possible before you begin the migration, including construction of the rules and policies you will rely upon to generate alerts and reports. Specify in advance as many policies, reports, user accounts, data filters, backup schedules, data encryption settings, and related services as you can. You already have a rule base, so leverage it to get going. Of course, you’ll tune things as you go, but why reinvent the wheel? Keep in mind, you will always find something you failed to plan for – often an unexpected problem – that sets your schedule behind. Preparation helps spot missing tasks and makes deployment go faster.
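To make the top-down ordering for hierarchical models concrete, here is a minimal sketch of how a deployment sequence might be derived. The node names, tier labels, and criticality scores are all hypothetical, and ordering collectors by criticality is an assumption on our part (the guidance above only calls for it at the regional tier).

```python
from dataclasses import dataclass

# Tier order for a hierarchical model: central server first,
# then regional aggregation nodes, then collectors.
TIER_ORDER = {"central": 0, "regional": 1, "collector": 2}


@dataclass(frozen=True)
class Node:
    name: str         # hypothetical node name
    tier: str         # "central", "regional", or "collector"
    criticality: int  # higher means more critical (assumed scale)


def deployment_order(nodes):
    """Sort top-down by tier, most critical first within each tier."""
    return sorted(
        nodes,
        key=lambda n: (TIER_ORDER[n.tier], -n.criticality, n.name),
    )


# Illustrative inventory only; substitute your own environment.
nodes = [
    Node("collector-branch-7", "collector", 2),
    Node("regional-emea", "regional", 5),
    Node("siem-central", "central", 10),
    Node("regional-apac", "regional", 8),
    Node("collector-dc-1", "collector", 9),
]

plan = [n.name for n in deployment_order(nodes)]
for step, name in enumerate(plan, start=1):
    print(f"{step}. {name}")
```

Each entry in the resulting list is a natural checkpoint: confirm that node is fully functional before starting the next one.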
Remember that the migration need not (and in fact generally should not) be an all at once exercise – you have the luxury of doing one piece at a time in the order that best suits your requirements.
- Deploy platform(s): This varies based on the deployment model, as discussed above, but typically you install the main security management platforms first. This includes basic system configuration, identity management and access control integration, and basic network configuration. Once complete, connect to a couple of data sources and other aggregation points to make sure the system is operating correctly.
- Deploy supporting services: Deploy the data collectors and make sure event collection is working correctly. If you are using a flat deployment model, now is the time to configure the platform to collect these events for the first set of deployment tasks. If you are using a log management/SIEM hybrid or regional data aggregators, this is the time to install those additional aggregation points and get them feeding data into the primary SIEM system to confirm proper information flow – at a small scale before ramping up event traffic. For those moving to a new platform to get real-time analysis, make sure the events are being collected properly. You are only concerned with getting data into the platform in a timely fashion right now – tune the system later.
- Install policies & reports: This is where you deploy the rules that comb through events and find anomalies. Hopefully you created as many as possible during the PoC and planning stages. For real-time analysis you need to tune rules to optimize performance. Remember that every rule adds work the system must perform on each incoming event, reducing performance and throughput. Look for ways to create rules with fewer comparisons, and balance fine-tuning of rules to specific problems against more generic rules that catch many problems – sometimes you can throw hardware at the problem (with a bigger server) to handle more events, but more efficient policies are always worthwhile.
- Test and verify: Are your reports being generated properly? Are the correct alerts being generated in a timely fashion? Generate copies of the reports and send them to the team for review, and compare them against the existing platform (which is still operational, right?). For alerts and forensic analysis, this is a good time to rerun your “Red Team” drill from the PoC to make sure you catch anomalies and confirm the accuracy of your results. Verify you are getting what you need – now is the time to figure out if there are problems with the system, while you still have a chance to find and fix problems and before you start depending on it.
- Stakeholder sign-off: Get it in writing – trust me, this will save aggravation in the future when someone from Ops says: “Hey, where is XYZ that used to be there?” Have the compliance, security, and IT ops teams sign off on completion of the project – they own it now too (remember shared accountability?). Make sure the group is satisfied and/or all issues are documented – if not fully solved – by this point.
- Decommission: Now is the time to retire the older system. You may choose to run the incumbent SIEM for a few months after the new system is in place, just in case. But there are not many reasons to keep the older system around, and plenty of reasons it should be sent packing. Older agents & sensors should be removed, user accounts dedicated to the older platform locked down, and hardware and virtual server real estate reclaimed. Once again, someone will need to be assigned the work with an agreed-on time frame for completion. Trouble-ticketing systems are a handy way to schedule these tasks and provide automated completion reports.
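While both platforms are still running, the comparison in the Test and verify step can be partly automated. The sketch below assumes you can export alerts from each system as (rule name, source) pairs for the same time window; that export format, the function name, and the sample data are all hypothetical, and no particular vendor's API is implied.

```python
def compare_alerts(old_alerts, new_alerts):
    """Diff two alert exports covering the same time window.

    Each alert is a (rule_name, source) tuple. Returns what the old
    platform raised that the new one missed, what only the new one
    raised, and how many alerts matched on both.
    """
    old_set, new_set = set(old_alerts), set(new_alerts)
    return {
        "missing_in_new": sorted(old_set - new_set),
        "only_in_new": sorted(new_set - old_set),
        "matched": len(old_set & new_set),
    }


# Illustrative exports only; real data would come from each platform.
old = [
    ("failed-login-burst", "vpn-gw"),
    ("malware-callback", "ws-114"),
    ("priv-escalation", "db-02"),
]
new = [
    ("failed-login-burst", "vpn-gw"),
    ("priv-escalation", "db-02"),
    ("dns-tunnel", "ws-201"),
]

result = compare_alerts(old, new)
print("Missing in new platform:", result["missing_in_new"])
print("Only in new platform:", result["only_in_new"])
print("Matched on both:", result["matched"])
```

Anything listed as missing in the new platform deserves a look before stakeholder sign-off, since it may point to a rule that was not ported or a data source that is not yet feeding the new system. Alerts that appear only on the new platform may be new coverage or false positives that need tuning.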
So with all that we finish the Security Management 2.0: Time for a new SIEM? series. Let’s revisit some of the most critical aspects:
- Requirements rule: Take this opportunity to figure out what you really need – not what the vendor says you can do, or what your users read in a trade rag. Defining your requirements is the linchpin of this entire process, so make sure you do that well.
- Do the work: When a project is perceived as a failure, the inclination is to just change in hopes the new thing will be better. Ultimately that may be the right answer, but don’t embark on such a major project based purely on a blind assumption. Evaluate your platform objectively and assess the challengers skeptically. Everything looks great in a PowerPoint deck, but leverage the PoC to see what will really work in your environment.
- Practice inclusion: Any security management technology needs to be leveraged across the organization – if only for dashboards and reports. So make this process as inclusive as possible. That means getting buy-in from not just the senior team and the money folks, but also from operations and administrators. If you plan to capture data from application and database sources include those teams in the process. You need their help, and want them to feel some shared responsibility for making the project succeed.
- Get Quick Wins: Focus your migration on achieving consistent successes. That means starting slowly on areas you know will work well. Get one thing done correctly before moving on to the next one. Remember, most likely this whole process stems from an issue perceived with the incumbent, so make sure the new tool will work well, and that means finishing what you start.
As always, we are interested in your feedback. Let us know via the comments what makes sense and what doesn’t.