By Mike Rothman
Note: Based on our ongoing research into the process maps, we decided we needed to update both the Manage Firewall and IDS/IPS process maps. As we built the subprocesses and gathered feedback, it was clear we didn’t make a clear enough distinction between main processes and subprocesses. So we are taking another crack at this process map. As always, your feedback is appreciated.
After posting the Monitor Process Map to define the high-level process for monitoring firewalls, IDS/IPS and servers, we now look at the management processes for these devices. In this post we tackle firewalls.
Remember, the Quant process depends on you to keep us honest. Our primary research and experience in the trenches gives us a good idea, but you pick up additional nuances fighting the battles every day. So if something seems a bit funky, let us know in the comments.
Keep the philosophy of Quant in mind: the high level process framework is intended to cover all the tasks. That doesn’t mean you need to do everything – this should be a fairly exhaustive list, and overkill for most organizations. Individual organizations should pick and choose the appropriate steps for their requirements.
When contrasting the monitor process with management, the first thing that jumps out is that policies drive the use of the device(s), but when you need to make a change the heavy process orientation kicks in. Why? Because making a mistake or unauthorized change can have severe ramifications, such as exposing critical data to the entire Internet. Right, that’s bad. So there are a lot of checks and balances in the change management process to ensure all changes are authorized and tested, and won’t create mayhem through a ripple effect.
In this phase we define what ports, protocols, and (increasingly) applications are allowed to traverse the firewall. Depending on the nature of what is protected and the sophistication of the firewall the policies may also include source and destination addresses, application behavior, and user entitlements.
A firewall rule base can resemble a junk closet – there is lots of stuff in there, but no one can quite remember what everything does. So it is best practice to periodically review firewall policy and prune rules that are obsolete, duplicative, overly exposed, or otherwise not needed. Possible catalysts for policy review include service requests (new application support, etc.), external advisories (to block a certain attack vector or work around a missing patch, etc.), and policy updates resulting from the operational management of the device (change management process described below).
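The pruning pass described above can be sketched in code. Here is a minimal Python sketch, with a purely illustrative Rule structure and rule names, that flags exact functional duplicates; a real review would also consider source/destination addresses, rule ordering, and shadowing:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A simplified firewall rule: an action applied to traffic on a port/protocol."""
    name: str
    protocol: str   # e.g. "tcp", "udp"
    port: int
    action: str     # "allow" or "deny"

def find_duplicates(rules):
    """Return rules whose (protocol, port, action) tuple repeats an earlier rule.

    This only catches exact functional repeats; shadowed and overly broad
    rules require deeper analysis than this sketch performs.
    """
    seen = set()
    dupes = []
    for r in rules:
        key = (r.protocol, r.port, r.action)
        if key in seen:
            dupes.append(r)
        else:
            seen.add(key)
    return dupes

rules = [
    Rule("web", "tcp", 443, "allow"),
    Rule("legacy-web", "tcp", 443, "allow"),  # functionally identical to "web"
    Rule("dns", "udp", 53, "allow"),
]
print([r.name for r in find_duplicates(rules)])  # ['legacy-web']
```

Even a toy pass like this illustrates why periodic review matters: duplicates accumulate silently and nobody remembers which rule is the "real" one.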
Define/Update Policies & Rules
This involves defining the depth and breadth of the firewall policies – including which ports, protocols, and applications are allowed to traverse the firewall. Time-limited policies may also be deployed to support short-term access for specific applications or user communities. Additionally, policies vary depending on primary use case, which may include perimeter deployment or network segmentation, etc. Logging, alerting, and reporting policies are also defined in this step.
It’s important here to consider the hierarchy of policies that will be implemented on the devices. The chart at right shows a sample hierarchy including organizational policies at the highest level, which may then be supplemented (or even supplanted) by business unit or geographic policies. Those feed the specific policies and/or rules implemented in each location, which then filter down to the specific device. Policy inheritance can be leveraged to dramatically simplify the rule set, but it’s easy to introduce unintended consequences in the process. This is why constant testing of firewall rules is critical to maintaining a strong perimeter security posture.
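Policy inheritance of the kind described above can be illustrated with a toy merge, where each more specific layer overrides its parents. Note how a single local override (disabling deny logging on one device) quietly changes behavior, which is exactly how unintended consequences creep in. All policy names here are hypothetical:

```python
def effective_policy(*layers):
    """Merge policy layers from most general (organization) down to most
    specific (device); later layers override earlier ones, mimicking
    inheritance in the hierarchy. Settings are purely illustrative."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

org    = {"default_action": "deny", "log_denies": True}   # organizational policy
region = {"allow_outbound_dns": True}                     # geographic supplement
device = {"log_denies": False}                            # local override: easy to miss

print(effective_policy(org, region, device))
```

Testing the resulting rule set (rather than each layer in isolation) is what catches these interactions before they reach production.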
Initial deployment of the firewall policies should include a QA process to ensure no rule impairs the ability of critical applications to communicate, either internally or externally.
Document Policy Changes
As the planning stage is an ongoing process, documentation is important for operational and compliance purposes. This documentation lists and details whatever changes have been made to the policies.
This phase deals with rule additions, changes, and deletions.
Evaluate Change Request
Based on the activities in the policy management phase, some type of policy/rule change will be requested for implementation. This step involves ensuring the requestor is authorized to request the change, as well as assessing the relative priority of the change to slot it into an appropriate change window.
Changes are prioritized based on the nature of the policy update and risk of the catalyst driving the change – which might be an attack, a new 0-day, a changed application, or any of various other things. Then a deployment schedule is built from this prioritization, scheduled maintenance windows, and other factors. This usually involves the participation of multiple stakeholders – ranging from application, network, and system owners to business unit representatives if any downtime or change to application use models is anticipated.
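The prioritization step can be sketched as a simple scoring exercise. The catalyst weights and window names below are hypothetical stand-ins, not part of any Quant metric:

```python
# Hypothetical severity weights for the catalyst driving a change request.
CATALYST_WEIGHT = {"active_attack": 100, "zero_day": 80, "advisory": 50, "app_change": 20}

def schedule(requests, windows):
    """Order authorized change requests by catalyst severity, then assign each
    to the next available maintenance window. A real scheduler would also
    factor in stakeholder sign-off and anticipated downtime."""
    ranked = sorted(requests, key=lambda r: CATALYST_WEIGHT[r["catalyst"]], reverse=True)
    return [(r["id"], w) for r, w in zip(ranked, windows)]

requests = [
    {"id": "CR-17", "catalyst": "app_change"},
    {"id": "CR-18", "catalyst": "zero_day"},
]
print(schedule(requests, ["tonight 02:00", "Saturday 02:00"]))
# [('CR-18', 'tonight 02:00'), ('CR-17', 'Saturday 02:00')]
```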
Test and Approve
This step requires you to develop test criteria, perform any required testing, analyze the results, and approve the rule change for release once it meets your requirements. Testing should include monitoring the operation and performance impact of the change on the device. Changes may be implemented in “log-only” mode to understand their impact before committing to production deployment.
With an understanding of the impact of the change(s), the request is either approved or denied. Approval may require sign-off from a number of stakeholders, so the approval workflow must be understood and agreed upon in advance to avoid serious operational issues.
Prepare the target device(s) for deployment, deliver the change, and install. Verify that changes were properly deployed, including successful installation and operation. This might include use of vulnerability assessment tools or application test scripts to ensure production systems are not disrupted.
The process of making a change requires confirmation both from the operations team (during the Deploy step) and from another entity (internal or external, but outside ops) acting as an auditor. This is basic separation of duties.
Basically this involves validating the change to ensure the policies were properly updated, as well as to match the change to a specific change request. This closes the loop and ensures there is a trail for every change made.
In some cases, including data breach lockdowns and imminent zero-day attacks, a change to the firewall ruleset must be made immediately. A process to circumvent the broader change process should be established and documented in advance, ensuring proper authorization for such rushed changes, and that there is a roll-back capability in case of unintended consequences.
Health Monitoring and Maintenance
This phase involves ensuring the firewalls are operational and secure by monitoring the devices’ availability and performance. When necessary this includes upgrading the hardware. Additionally, software patches (for either functionality or security) are implemented in this phase. We’ve broken out this step due to its operational nature. It doesn’t relate directly to security or compliance, but it can be a significant cost component of managing these devices, so it will be modeled separately.
For the purposes of this Quant project, we treat the monitoring and management processes as separate, although many organizations (especially managed service providers) consider device management a superset of device monitoring.
So our firewall management process flow does not include any steps for incident investigation, response, validation, or management. Please refer to the monitoring process flow for those activities.
We are looking forward to your comments and feedback. Fire away.
Posted at Wednesday 4th August 2010 7:48 pm
By Mike Rothman
As I mentioned in the Mailbox Vigil, we don’t put much stock in snail mail anymore. We did get a handful of letters from XX1 (oldest daughter) at sleepaway camp, but aside from that it’s bills and catalogs. That said, every so often you do get entertained by the mail. A case in point happened when we got back from our summer pilgrimage to the Northern regions this weekend (which is why there was no Incite last week).
On arriving home (after a brutal 15 hour car ride, ugh!) we were greeted by a huge box of mail delivered by our trusty postal worker. Given that the Boss was occupied doing about 100 loads of laundry and I had to jump back into work, we let XX1 express her newfound maturity and sort our mail.
It was pretty funny. She called out every single piece and got genuinely excited by some of the catalogs. She got a thank you note from a friend, a letter from another, and even a few of her own letters to us from camp (which didn’t arrive before we left on holiday). XX2 (her twin) got a thank you note also. But nothing for the boy. I could tell he was moping a bit and I hoped something would come his way.
Finally he heard the magic words: “Sam got a letter.” Reminded me of Blue’s Clues. It was from someone with an address at the local mall. Hmmm. But he dutifully cracked it open and had me read it to him. It was from someone at LensCrafters reminding him that it’s been a year since he’s gotten his glasses and he’s due for a check-up.
He was on the edge of his seat as I read about how many adults have big problems with their eyes and how important it is to get an annual check-up. Guess they didn’t realize the Boy is not yet 7 and also that he sees his ophthalmologist every 6 weeks. But that didn’t matter – he got a letter.
So he’s carrying this letter around all day, like he just got a toy from Santa Claus or the Hanukkah fairy. He made me read it to him about 4 times. Now he thinks the sales person at LensCrafters is his pal. Hopefully he won’t want to invite her to his birthday party.
Normally I would have just thrown out the direct mail piece, but I’m glad we let XX1 sort the mail. The Boy provided me with an afternoon of laughter and that was certainly worth whatever it cost to send us the piece.
Photo credits: “surprise in the mailbox” originally uploaded by sean dreilinger
Recent Securosis Posts
- The Cancer within Evidence Based Research Methodologies
- Friday Summary: July 23, 2010
- Death, Irrelevance, and a Pig Roast
- What Do We Learn at Black Hat/DefCon?
- Tokenization Series:
- Various NSO Quant Posts:
Incite 4 U
We’re AV products. Who would try to hack us? – More great stuff from Krebs. This time he subjected himself to installing (and reinstalling) AV products in his VM to see which of them actually use Windows anti-exploitation technologies (like DEP and ASLR). The answer? Not many, though it’s good to see Microsoft eating their own dog food. I like the responses from the AV vendors, starting with F-Secure’s “we’ve been working on performance,” which means they are prioritizing not killing your machine over security – go figure. And Panda shows they have ostriches in Spain as well, as they use their own techniques to protect their software. OK, sure. This is indicative of the issues facing secure software. If the security guys can’t even do it right, we don’t have much hope for everyone else. Sad. – MR
Mid-market basics – She does not blog very often, but when she does, Jennifer Jabbusch gets it right. We here at Securosis are all about simplifying security for end users, and I thought JJ’s recent post on Four Must-Have SMB Security Tools did just that. With all the security pontification about new technologies to supplant firewalls, and how ineffective AV is at detecting bad code, there are a couple tools that are fundamental to data security. As bored as we are talking about them, AV, firewalls, and access controls are the three basics that everyone needs. While I would personally throw in encrypted backups as a must have, those are the core components. But for many SMB firms, these technologies are the starting point. They are not looking at extrusion prevention, behavioral monitoring, or event correlation – just trying to make sure the front door is locked, both physically and electronically. It’s amazing to think, but I run into companies all the time where an 8-year-old copy of Norton AV and a password on the ‘server’ are the security program. I hope to see more basic posts like this that appeal to the mainstream – and SMB is the mainstream – on Dark Reading and other blogs as well. – AL
Jailbreak with a side of shiv – Are you one of those folks who wants to jailbreak your iPhone to install some free apps on it? Even though it removes some of the most important security controls on the device? Well, have I got a deal for you! Just visit jailbreakme.com and the magical web application will jailbreak your phone right from the browser. Of course any jailbreak is the exploitation of a security vulnerability. And in this case it’s a remotely exploitable browser vulnerability, but don’t worry – I’m sure no bad guys will use it now that it’s public. Who would want to remotely hack the most popular cell phone on the planet? – RM
A pig by a different name – SourceFire recently unveiled Razorback, their latest open source framework. Yeah, that’s some kind of hog or something, so evidently they are committed to this pig naming convention. It’s targeting the after-attack time, when it’s about pinpointing root cause and profiling behavior to catch attackers. I think they should have called it Bacon, since this helps after the pig is dead. Maybe that’s why I don’t do marketing anymore. Razorback is designed to coordinate the information coming from a heterogeneous set of threat management tools. This is actually a great idea. I’ve long said that if vendors can’t be big (as in Cisco or Oracle big), they need to act big. Realizing enterprises will have more stuff than SourceFire, pulling in that data, and doing something with it, makes a lot of sense. The base framework is open source, but don’t be surprised to see a commercial version in the near term. Someone has to pay Marty, after all. – MR
Disclosure Debate, Round 37 – Right before Black Hat Google updated its vulnerability disclosure policy (for when its researchers find new vulns). They are giving vendors a 60-day window to patch any “critical” vulnerability before disclosing (not that they have the best history for timely response). Now TippingPoint, probably the biggest purchaser of independently discovered vulnerabilities, is moving to a 6-month window. Whichever side you take on the disclosure debate, assuming these companies follow through with their statements, the debate itself may not be advancing but the practical implications certainly are. Many vendors sit on vulnerabilities for extended periods – sometimes years. Of the 3 (minor) vulnerabilities I have ever disclosed, 2 weren’t patched for over a year. I’m against disclosing anything without giving a vendor the chance to patch, and patch timetables need to account for the complexities of maintaining major software, but it’s unacceptable for vendors to sit on these things – leaving customers at risk while hoping for the best. I wonder how many “exemptions” we’ll see to these policies. – RM
The future of reputation: malware fingerprinting – Since I was in the email security business, I’ve been fascinated with reputation. You know, how intent can be analyzed based on IP address and other tells from inbound messages/packets. The technology is entrenched within email and web filtering and we are seeing it increasingly integrated into perimeter gateways as well. Yeah, it’s another of those nebulous cloud services. When I read the coverage of Greg Hoglund’s Black Hat talk on fingerprinting malware code, I instantly thought of how cool it would be to integrate these fingerprints into the reputation system. So if you saw an executable fly by, you could know it came from the Mariposa guys and block it. Yeah, that’s a way off, but since we can’t get ahead of the threat, at least we can try to block stuff with questionable heritage. – MR
Papers, please – I don’t understand why Bejtlich has such a problem with Project Vigilant. The Phoenix Examiner thinks it’s legit, and that should be enough. Just because he has not heard of them doesn’t mean they’re not. It means they are too sooper sekrit to go around publicizing themselves. Guess Richard forgot about Security by Obscurity. Plus, there is documented proof of organizations with hundreds of members on the front lines every day, but I bet Mr. Bejtlich – if that even is his real name – doesn’t know them either. Project Vigilant has like 500 people; that’s a lot, right? Look around at your next ISSA or ISACA chapter meeting and tell me if you have that many people. You can’t fake that sort of thing. Bejtlich says “If they have been active for 14 years, why does no one I’ve asked know who these guys are?” Ignorance is no excuse for undermining Project Vigilant. Who’s to say Chet Uber is not the real deal? And with a name like ‘Uber’, he doesn’t even need a handle. You know, like “The Chief” or “Fearless Leader”. Plus Uber has a cool logo with a winged-V thingy … way cooler than that mixed-message Taijitu symbol. Who’s Bejtlich to question Uber when he’s out there, giving it 110%, fighting terror. It’s not like there is a vetting process to fight terror. Even if there was, that’s for losers. Chuck Norris would not have a vetting process. He’s already killed all the members of the vetting committee. Fightin’ terror! Jugiter Viglio, baby! – AL
Is the cost of a breach more than the cost to protect against it? – More survey nonsense from Ponemon. Evidently breach recovery costs are somewhere between $1 million and $53 million with a median of $3.8 million. And my morning coffee costs somewhere between 10 cents and a zillion dollars, with a median price of $2.25. But the numbers don’t matter, it’s the fact that a breach will cost you money. We all know that. The real question is whether the cost to clean up an uncertain event (the breach happening to you) is more than the cost to protect against it. Given the anecdotal evidence that revenue visibility for security vendors is poor for the rest of the year, I’m expecting a lot more organizations to roll the dice with clean-up. And it’s not clear they are wrong, says the Devil’s Advocate. – MR
Posted at Wednesday 4th August 2010 7:00 am
By Mike Rothman
Actually I learned nothing because I wasn’t there. Total calendar fail on my part, as a family vacation was scheduled during Black Hat week. You know how it goes. The Boss says, “how is the week of July 26 for our week at the beach?” BH is usually in early August, so I didn’t think twice.
But much as I missed seeing my peeps and tweeps at Black Hat, a week of R&R wasn’t all bad. Though I was sort of following the Tweeter and did see the coverage and bloggage of the major sessions. So what did we learn this year?
- SSL is bad: Our friend RSnake and Josh Sokol showed that SSL ain’t all that. Too bad 99% of the laypeople out there see the lock and figure all is good. Actually, 10% of laypeople know what the lock means. The other 89% wonder how the Estonians made off with their life savings.
- SCADA systems are porous: OK, I’m being kind. SCADA is a steaming pile of security FAIL. But we already knew that. Thanks to Red Tiger, we now know there are close to 40,000 vulnerabilities in SCADA systems, so we have a number. At least these systems aren’t running anything important, right?
- Auto-complete is not your friend: As a Mac guy I never really relied on auto-complete, since I can use TextExpander. But lots of folks do, and Big J got big press when he showed it’s bad in Safari and then proved IE is exposed as well.
- Facebook spiders: Yes, an enterprising fellow named Ron Bowes realized that most folks have set their Facebook privacy settings, ah, incorrectly. So he was able to download about 100 million names, phone numbers, and email addresses with a Ruby script. Then he had the nerve to put it up on BitTorrent. Information wants to be free, after all. (This wasn’t a session at BH, but cool nonetheless.)
- ATM jackpot: Barnaby Jack showed once again that he can hit the jackpot at will, since war dialing still works (yay WarGames!), and you can get pretty much anything on the Internet (like a key to open many ATM devices). Anyhow, great demo and I’m sure organized crime is very interested in those attack vectors.
- I can haz your cell tower: Chris Paget showed how he could spoof a cell tower for $1,500. And we thought the WiFi Evil Twin was bad. This is cool stuff.
I could probably go on for a week, since all the smart kids go to Vegas in the summer to show how smart they are. And to be clear, they are smart. But do you, Mr. or Ms. Security Practitioner, care about these attacks and this research? The answer is yes. And no.
First of all, you can see the future at Black Hat. Most of the research is not weaponized and a good portion of it isn’t really feasible to weaponize. An increasing amount is attack-ready, but for the most part you get to see what will be important at some point in the future. Maybe. For that reason, at least paying attention to the research is important.
But tactically, what happens in Vegas is unlikely to have any impact on day-to-day operations any time soon. Note that I used the word ‘tactical’, because most of us spend our days fighting fires and get precious few minutes a day – if any – to think strategically about what we need to do tomorrow. Forget about thinking about how to protect against attacks discussed at Black Hat. That’s probably somewhere around 17,502 on the To-Do list.
Of course, if your ethical compass is a bit misdirected or your revenues need to be laundered through 5 banks in 3 countries before the funds hit your account, then the future is now and Black Hat is your business plan for the next few years. But that’s another story for another day.
Posted at Tuesday 3rd August 2010 8:44 pm
This hit Slashdot today, and I expect the mainstream press to pick it up fairly soon. Chris Paget will be intercepting cell phone communications at Defcon during a live demonstration.
I suspect this may be the single most spectacular presentation during all of this year’s Defcon and Black Hat. Yes, people will be cracking SCADA and jackpotting ATMs, but nothing strikes closer to the heart than showing major insecurities with the single most influential piece of technology in society. Globally I think cell phones are even more important than television.
Chris is taking some major precautions to stay out of jail. He’s working hand in hand with the Electronic Frontier Foundation on the legal side, and there will be plenty of warnings on-site and no information from any calls recorded or stored.
I suspect he’s setting up a microcell under his control and intercepting communications in a man-in-the-middle attack, but we’ll have to wait until his demo to get all the details.
For years the mobile phone companies have said this kind of interception is impractical or impossible. I guess we’ll all find out this weekend…
Posted at Tuesday 27th July 2010 12:38 am
By Adrian Lane
We have covered the internals of token servers and talked about architecture and integration of token services. Now we need to look at some of the different deployment models and how they match up to different types of businesses. Protecting medical records in multi-company environments is a very different challenge than processing credit cards for thousands of merchants.
Central Token Server
The most common deployment model we see today is a single token server that sits between application servers and the back-end transaction servers. The token server issues one or more tokens for each instance of sensitive information that it receives. For most applications it becomes a reference library, storing sensitive information within a repository and providing an index back to the real data as needed. The token service is placed in line with existing transaction systems, adding a new substitution step between business applications and back-end data processing.
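That reference-library behavior can be captured in a minimal Python sketch. The TokenServer class below is purely illustrative; real token servers encrypt the stored values and authenticate callers, both of which are omitted here:

```python
import secrets

class TokenServer:
    """Minimal sketch of a central token server acting as a reference library.

    Storage is an in-memory dict; production servers use a hardened
    database and encrypt the sensitive values at rest.
    """
    def __init__(self):
        self._vault = {}    # token -> sensitive value
        self._index = {}    # value -> existing token (one token per value)

    def tokenize(self, value):
        if value in self._index:          # existing-customer lookup
            return self._index[value]
        token = secrets.token_hex(8)      # random surrogate; no relation to the value
        self._vault[token] = value
        self._index[value] = token
        return token

    def detokenize(self, token):
        return self._vault[token]         # index back to the real data

ts = TokenServer()
t = ts.tokenize("4111-1111-1111-1111")
print(ts.detokenize(t) == "4111-1111-1111-1111")  # True
```

The key property is that the token carries no mathematical relationship to the original value, so compromising a system that holds only tokens exposes nothing.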
As mentioned in previous posts, this model is excellent for security as it consolidates all the credit card data into a single highly secure server; additionally, it is very simple to deploy as all services reside in a single location. And limiting the number of locations where sensitive data is stored and accessed both improves security and reduces auditing, as there are fewer systems to review.
A central token server works well for small businesses with consolidated operations, but does not scale well for larger distributed organizations. Nor does it provide the reliability and uptime demanded by always-on Internet businesses. For example:
- Latency: The creation of a new token, lookup of existing customers, and data integrity checks are computationally complex. Most vendors have worked hard to alleviate this problem, but some still have latency issues that make them inappropriate for financial/point of sale usage.
- Failover: If the central token server breaks down, or is unavailable because of a network outage, all processing of sensitive data (such as orders) stops. Back-end processes that require tokens halt.
- Geography: Remote offices, especially those in remote geographic locations, suffer from network latency, routing issues, and Internet outages. Remote token lookups are slow, and both business applications and back-end processes suffer disproportionately in the event of disaster or prolonged network outages.
To overcome issues in performance, failover, and network communications, several other deployment variations are available from tokenization vendors.
Distributed Token Servers
With distributed token servers, the token database is copied and shared among multiple sites. Each site has a copy of the tokens and encrypted data. In this model, each site is a peer of the others, with full functionality.
This model solves some of the performance issues with network latency for token lookup, as well as failover concerns. Since each token server is a mirror, if any single token server goes down, the others can share its load. Token generation overhead is mitigated, as multiple servers assist in token generation and distribution of requests balances the load. Distributed servers are costly but appropriate for financial transaction processing.
While this model offers the best option for uptime and performance, synchronization between servers requires careful consideration. Multiple copies mean synchronization issues and carefully timed updates of data between locations, along with key management so encrypted credit card numbers can be accessed at each site. Finally, with multiple databases all serving tokens, the number of repositories that must be secured, maintained, and audited increases substantially.
Partitioned Token Servers
In a partitioned deployment, a single token server is designated as ‘active’, and one or more additional token servers are ‘passive’ backups. In this model, if the active server crashes or is unavailable, a passive server becomes active until the primary connection can be re-established. The partitioned model improves on the central model by replicating the (single, primary) server configuration. These replicas are normally at the same location as the primary, but they may also be distributed to other locations. This differs from the distributed model in that only one server is active at a time, and the servers are not all peers of one another.
Conceptually, partitioned servers support a hybrid model where each server is active and used by a particular subset of endpoints and transaction servers, as well as serving as a backup for other token servers. In this case each token server is assigned a primary responsibility, but can take on secondary roles if another token server goes down. While the option exists, we are unaware of any customers using it today.
The partitioned model solves failover issues: if a token server fails, the passive server takes over. Synchronization is easier with this model as the passive server need only mirror the active server, and bi-directional synchronization is not required. Token servers leverage the mirroring capabilities built into the relational database engines, as part of their back ends, to provide this capability.
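The active/passive failover logic can be sketched as follows. The server objects and the is_up() check are hypothetical stand-ins for the vendor's health-check and database-mirroring machinery:

```python
class PartitionedService:
    """Sketch of the partitioned model: one active server, passive mirrors.
    Requests always route to a single server at a time."""
    def __init__(self, active, passives):
        self.active = active
        self.passives = list(passives)

    def current(self):
        if self.active.is_up():
            return self.active
        # Promote the first reachable passive mirror until the primary returns.
        for server in self.passives:
            if server.is_up():
                return server
        raise RuntimeError("no token server available")

class FakeServer:
    """Illustrative stand-in for a real token server endpoint."""
    def __init__(self, name, up=True):
        self.name, self.up = name, up
    def is_up(self):
        return self.up

primary, mirror = FakeServer("primary", up=False), FakeServer("mirror")
print(PartitionedService(primary, [mirror]).current().name)  # mirror
```

Because only the active server accepts writes, the mirrors need only one-way replication, which is exactly why synchronization is simpler here than in the distributed model.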
Next we will move on to use cases.
Posted at Monday 26th July 2010 3:08 pm
Our last post covered the core functions of the tokenization server. Today we’ll finish our discussion of token servers by covering the externals: the primary architectural models, how other applications communicate with the server(s), and supporting systems management functions.
There are three basic ways to build a token server:
- Stand-alone token server with a supporting back-end database.
- Embedded/integrated within another software application.
- Fully implemented within a database.
Most of the commercial tokenization solutions are stand-alone software applications that connect to a dedicated database for storage, with at least one vendor bundling their offering into an appliance. All the cryptographic processes are handled within the application (outside the database), and the database provides storage and supporting security functions. Token servers use standard Database Management Systems, such as Oracle and SQL Server, but locked down very tightly for security. These may be on the same physical (or virtual) system, on separate systems, or integrated into a load-balanced cluster. In this model (stand-alone server with DB back-end) the token server manages all the database tasks and communications with outside applications. Direct connections to the underlying database are restricted, and cryptographic operations occur within the tokenization server rather than the database.
In an embedded configuration the tokenization software is embedded into the application and supporting database. Rather than introducing a token proxy into the workflow of credit card processing, existing application functions are modified to implement tokens. To users of the system there is very little difference in behavior between embedded token services and a stand-alone token server, but on the back end there are two significant differences. First, this deployment model usually involves some code changes to the host application to support storage and use of the tokens. Second, each token is only useful for one instance of the application. Token server code, key management, and storage of the sensitive data and tokens all occur within the application. The tightly coupled nature of this model makes it very efficient for small organizations, but does not support sharing tokens across multiple systems, and large distributed organizations may find performance inadequate.
Finally, it’s technically possible to manage tokenization completely within the database without the need for external software. This option relies on stored procedures, native encryption, and carefully designed database security and access controls. Used this way, tokenization is very similar to most data masking technologies. The database automatically parses incoming queries to identify and encrypt sensitive data. The stored procedure creates a random token – usually from a sequence generator within the database – and returns the token as the result of the user query. Finally all the data is stored in a database row. Separate stored procedures are used to access encrypted data. This model was common before the advent of commercial third-party tokenization tools, but has fallen into disuse due to its lack of advanced security features and its failure to leverage external cryptographic libraries and key management services.
There are a few more architectural considerations:
- External key management and cryptographic operations are typically an option with any of these architectural models. This allows you to use more-secure hardware security modules if desired.
- Large deployments may require synchronization of multiple token servers in different, physically dispersed data centers. This support must be a feature of the token server, and is not available in all products. We will discuss this more when we get to usage and deployment models.
- Even when using a stand-alone token server, you may also deploy software plug-ins to integrate and manage additional databases that connect to the token server. This doesn’t convert the database into a token server, as we described in our second option above, but supports communications for distributed systems that need access to either the token or the protected data.
Since tokenization must be integrated with a variety of databases and applications, there are three ways to communicate with the token server:
- Application API calls: Applications make direct calls to the tokenization server procedural interface. While at least one tokenization server requires applications to explicitly access the tokenization functions, this is now a rarity. Because of the complexity of the cryptographic processes and the need for precise use of the tokenization server, vendors now supply software agents, modules, or libraries to support the integration of token services. These reside on the same platform as the calling application. Rather than recoding applications to use the API directly, these supporting modules accept existing communication methods and data formats. This reduces code changes to existing applications, and provides better security – especially for application developers who are not security experts. These modules then format the data for the tokenization API calls and establish secure communications with the tokenization server. This is generally the most secure option, as the code includes any required local cryptographic functions – such as encrypting a new piece of data with the token server’s public key.
- Proxy Agents: Software agents that intercept database calls (for example, by replacing an ODBC or JDBC component). In this model the process or application that sends sensitive information may be entirely unaware of the token process. It sends data as it normally does, and the proxy agent intercepts the request. The agent replaces sensitive data with a token and then forwards the altered data stream. These reside on the token server or its supporting application server. This model minimizes application changes, as you only need to replace the application/database connection and the new software automatically manages tokenization. But it does create potential bottlenecks and failover issues, as it runs in-line with existing transaction processing systems.
- Standard database queries: The tokenization server intercepts and interprets the requests. This is potentially the least secure option, especially for ingesting content to be tokenized.
While it sounds complex, there are really only two functions to implement:
- Send new data to be tokenized and retrieve the token.
- When authorized, exchange the token for the protected data.
The server itself should handle pretty much everything else.
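The two operations above can be sketched as a minimal interface. The class and method names here are our own illustration, not any vendor's API, and the in-memory map stands in for the encrypted database a real server uses:

```python
import secrets

class TokenServer:
    """Toy sketch of the two client-facing operations of a token server.
    Real servers encrypt the stored value and authenticate every caller;
    this illustration keeps an in-memory map and a simple authorization flag."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Return the existing token for this value, or mint a random one.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)
        self._token_to_value[token] = value  # stored encrypted in real systems
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        # Only approved callers may exchange a token for the original data.
        if not authorized:
            raise PermissionError("caller not authorized to detokenize")
        return self._token_to_value[token]
```

Everything else – encryption, key management, logging, replication – happens behind these two calls.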
Finally, as with any major application, the token server includes various management functions. But due to security needs, these tend to have additional requirements:
- User management, including authentication, access, and authorization – for user, application, and database connections. Additionally, most tokenization solutions include extensive separation of duties controls to limit administrative access to the protected data.
- Backup and recovery for the stored data, system configuration and, if encryption is managed on the token server, encryption keys. The protected data is always kept encrypted for backup operations.
- Logging and reporting – especially logging of system changes, administrative access, and encryption key operations (such as key rotation). These reports are often required to meet compliance needs, especially for PCI.
In our next post we’ll go into more detail on token server deployment models, which will provide more context for all of this.
Posted at Friday 23rd July 2010 4:22 pm
By Mike Rothman
There is nothing like a good old-fashioned mud-slinging battle. As long as you aren’t the one covered in mud, that is. I read about the Death of Snort and started laughing. The first thing they teach you in marketing school is when no one knows who you are, go up to the biggest guy in the room and kick them in the nuts. You’ll get your ass kicked, but at least everyone will know who you are.
That’s exactly what the folks at OISF (who drive the Suricata project) did, and they got Ellen Messmer of NetworkWorld to bite on it. Then she got Marty Roesch to fuel the fire and the end result is much more airtime than Suricata deserves. Not that it isn’t interesting technology, but to say it’s going to displace Snort any time soon is crap. To go out with a story about Snort being dead is disingenuous. But given the need to drive page views, the folks at NWW were more than willing to provide airtime. Suricata uses Snort signatures (for the most part) to drive its rule base. They’d better hope it’s not dead.
But it brings up a larger issue of when a technology really is dead. In reality, there are few examples of products really dying. If you ended up with some ConSentry gear, then you know the pain of product death. But most products stick around ad infinitum, even if they aren’t evolved. So those products aren’t really dead, they just become irrelevant. Take Cisco MARS as an example. Cisco isn’t killing it, it’s just not being used as a multi-purpose SIEM, which is how it was positioned for years. Irrelevant in the SIEM discussion, yes. Dead, no.
Ultimately, competition is good. Suricata will likely push the Snort team to advance their technology faster than in the absence of an alternative. But it’s a bit early to load Snort onto the barbie – even if it is the other white meat. Yet, it usually gets back to the reality that you can’t believe everything you read. Actually you probably shouldn’t believe much that you read. Except our stuff, of course.
Photo credit: “Roasted pig (large)” originally uploaded by vnoel
Posted at Friday 23rd July 2010 2:00 pm
By Adrian Lane
A couple weeks ago I was sitting on the edge of the hotel bed in Boulder, Colorado, watching the immaculate television. A US-made 30” CRT television in “standard definition”. That’s cathode ray tube for those who don’t remember, and ‘standard’ is the marketing term for ‘low’. This thing was freaking horrible, yet it was perfect. The color was correct. And while the contrast ratio was not great, it was not terrible either. Then it dawned on me that the problem was not the picture, as this is the quality we used to get from televisions. Viewing an old set, operating exactly the way it always did, I knew the problem was me. High def has so much more information, but the experience of watching the game is the same now as it was then. It hit me just how much our brains were filling in missing information, and we did not mind this sort of performance 10 years ago because it was the best available. We did not really see the names on the backs of football jerseys during those Sunday games, we just thought we did. Heck, we probably did not often make out the numbers either, but somehow we knew who was who. We knew where our favorite players on the field were, and the red streak on the bottom of the screen pounding a blue-colored blob must be number 42. Our brain filled in and sharpened the picture for us.
Rich and I had been discussing experience bias, recency bias, and cognitive dissonance during our trip to Denver. We were talking about our recent survey and how to interpret the numbers without falling into bias traps. It was an interesting discussion of how people detect patterns, but like many of our conversations devolved into how political and religious convictions can cloud judgment. But not until I was sitting there, watching television in the hotel, did I realize how much our prior experiences and knowledge shape perception, derived value, and interpreted results. Mostly for the good, but unquestionably some bad.
Rich also sent me a link to a Michael Shermer video just after that, in which Shermer discusses patterns and self-deception. You can watch the video and say “sure, I see patterns, and sometimes what I see is not there”, but I don’t think videos like this demonstrate how pervasive this built-in feature is, and how it applies to every situation we find ourselves in.
The television example of this phenomenon was more shocking than some others that have popped into my head since. I have been investing in and listening to high-end audio products such as headphones for years. But I never think about the illusion of a ‘soundstage’ right in front of me, I just think of it as being there. I know the guitar player is on the right edge of the stage, and the drummer is in the back, slightly to the left. I can clearly hear the singer when she turns her head to look at fellow band members during the song. None of that is really in front of me, but there is something in the bits of the digital facsimile on my hard drive that lets my brain recognize all these things, placing the scene right there in front of me.
I guess the hard part is recognizing when and how it alters our perception.
On to the Summary:
Webcasts, Podcasts, Outside Writing, and Conferences
Favorite Securosis Posts
Other Securosis Posts
Favorite Outside Posts
Project Quant Posts
Research Reports and Presentations
Top News and Posts
Blog Comment of the Week
Remember, for every comment selected, Securosis makes a $25 donation to Hackers for Charity. This week’s best comment goes to Jay Jacobs, in response to FireStarter: an Encrypted Value Is Not a Token.
@Adrian – I must be missing the point, my apologies, perhaps I’m just approaching this from too much of a cryptonerd perspective. Though, I’d like to think I’m not being overly theoretical.
To extend your example, any merchant that wants to gain access to the de-tokenized content, we will need to make a de-tokenization interface available to them. They will have the ability to get at the credit card/PAN of every token they have. From the crypto side, if releasing keys to merchants is unacceptable, require that merchants return ciphertext to be decrypted so the key is not shared… What’s the difference between those two?
Let’s say my cryptosystem leverages a networked HSM. Clients connect and authenticate, send in an account number and get back ciphertext. In order to reverse that operation, a client would have to connect and authenticate, send in cipher text and receive back an account number. Is it not safe to assume that the ciphertext can be passed around safely? Why should systems that only deal in that ciphertext be in scope for PCI when an equivalent token is considered out of scope?
Conversely, how do clients authenticate into a tokenization system? Because the security of the tokens (from an attackers perspective) is basically shifted to that authentication method. What if it’s a password stored next to the tokens? What if it’s mutual SSL authentication using asymmetric keys? Are we just back to needing good key management and access control?
My whole point is that, from my view point, I think encrypting data is getting a bad wrap when the problem is poorly implemented security controls. I don’t see any reason to believe that we can’t have poorly implemented tokenization systems.
If we can’t control access into a cryptosystem, I don’t see why we’d do any better controlling access to a token system. With PCI DSS saying tokenization is “better”, my guess is we’ll see a whole bunch of mediocre token systems that will eventually lead us to realize that hey, we can build just as craptastic tokenization systems as we have cryptosystems.
Posted at Friday 23rd July 2010 6:12 am
By Mike Rothman
Now that we’ve been through all the high-level process steps and associated subprocesses for monitoring firewalls, IDS/IPS, and servers, the next step is to start similarly digging into the processes for managing firewalls and IDS/IPS.
But before we begin let’s revisit all the processes and subprocesses for monitoring. We put all the high-level and subprocesses into one graphic, as a central spot for links into each step.
As with all our research, we appreciate any feedback you have on this process and the subprocess steps. It’s critical that we get this right, since we start developing metrics and building a cost model directly from these steps. So if you see something you don’t agree with, or perhaps do things a bit differently, let us know.
Posted at Thursday 22nd July 2010 4:47 pm
By Adrian Lane
In our previous post we covered token creation, a core feature of token servers. Now we’ll discuss the remaining behind-the-scenes features of token servers: securing data, validating users, and returning original content when necessary. Many of these services are completely invisible to end users of token systems, and for day to day use you don’t need to worry about the details. But how the token server works internally has significant effects on performance, scalability, and security. You need to assess these functions during selection to ensure you don’t run into problems down the road.
For simplicity we will use credit card numbers as our primary example in this post, but any type of data can be tokenized. To better understand the functions performed by the token server, let’s recap the two basic service requests. The token server accepts sensitive data (e.g., credit card numbers) from authenticated applications and users, responds by returning a new or existing token, and stores the encrypted value when creating new tokens. This comprises 99% of all token server requests. The token server also returns decrypted information to approved applications when presented a token with acceptable authorization credentials.
Authentication is core to the security of token servers, which need to authenticate connected applications as well as specific users. To rebuff potential attacks, token servers perform bidirectional authentication of all applications prior to servicing requests. The first step in this process is to set up a mutually authenticated SSL/TLS session, and validate that the connection is started with a trusted certificate from an approved application. Any strong authentication should be sufficient, and some implementations may layer additional requirements on top.
The second phase of authentication is to validate the user who issues a request. In some cases this may be a system/application administrator using specific administrative privileges, or it may be one of many service accounts assigned privileges to request tokens or to request a given unencrypted credit card number. The token server provides separation of duties through these user roles – serving requests only from approved users, through allowed applications, from authorized locations. The token server may further restrict transactions – perhaps only allowing a limited subset of database queries.
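The mutually authenticated SSL/TLS step described above can be sketched with Python's standard ssl module. The function name is our own, and the certificate paths are placeholders; the point is that the client both verifies the token server's certificate and presents its own, so each side authenticates the other:

```python
import ssl

def mutual_tls_context(ca_file=None, certfile=None, keyfile=None):
    """Build a client-side TLS context for talking to a token server.
    The context refuses unauthenticated servers and, when certfile is
    given, presents the application's own certificate for mutual auth.
    File paths are hypothetical placeholders."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.verify_mode = ssl.CERT_REQUIRED  # refuse unauthenticated servers
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signed the server cert
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # our identity
    return ctx
```

The token server side performs the mirror-image configuration, requiring a client certificate before servicing any request.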
Although technically the sensitive data might not be encrypted by the token server before storage in the token database, in practice every implementation we are aware of encrypts the content. That means that prior to being written to disk and stored in the database, the data must be encrypted with an industry-accepted ‘strong’ encryption cipher. After the token is generated, the token server encrypts the credit card with a specific encryption key used only by that server. The data is then stored in the database, and thus written to disk along with the token, for safekeeping.
Every current tokenization server is built on a relational database. These servers logically group tokens, credit cards, and related information in a database row – storing these related items together. At this point, one of two encryption options is applied: either field level or transparent data encryption. In field level encryption just the row (or specific fields within it) is encrypted. This allows a token server to store data from different applications (e.g., credit cards from a specific merchant) in the same database, using different encryption keys. Some token systems leverage transparent database encryption (TDE), which encrypts the entire database under a single key. In these cases the database performs the encryption on all data prior to being written to disk. Both forms of encryption protect data from indirect exposure such as someone examining disks or backup media, but field level encryption enables greater granularity, with a potential performance cost.
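One way to see the granularity difference: with field level encryption the server can derive a distinct key per merchant from a master key, so each merchant's rows are protected independently, while TDE uses a single database-wide key. A sketch of HMAC-based per-merchant key derivation – our illustration of the idea, not any specific product's scheme:

```python
import hashlib
import hmac

def merchant_key(master_key: bytes, merchant_id: str) -> bytes:
    """Derive a per-merchant field-encryption key from the server's master key.
    Distinct merchant IDs yield distinct keys, so one merchant's data can be
    re-keyed or revoked without touching another's. Names are illustrative."""
    return hmac.new(master_key, merchant_id.encode(), hashlib.sha256).digest()
```

Derivation is deterministic, so the server never stores the per-merchant keys, only the master key (ideally inside an HSM).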
The token server bundles the encryption, hashing, and random number generation features – both to create tokens and to encrypt network sessions and stored data.
Finally, some implementations use asymmetric encryption to protect the data as it is collected within the application (or on a point of sale device) and sent to the server. The data is encrypted with the server’s public key. The connection session will still typically be encrypted with SSL/TLS as well, but to support authentication rather than for any claimed security increase from double encryption. The token server becomes the back end point of decryption, using the private key to regenerate the plaintext prior to generating the proxy token.
Any time you have encryption, you need key management. Key services may be provided directly from the vendor of the token services in a separate application, or by hardware security modules (HSM), if supported. Either way, keys are kept separate from the encrypted data and algorithms, providing security in case the token server is compromised, as well as helping enforce separation of duties between system administrators. Each token server will have one or more unique keys – not shared by other token servers – to encrypt credit card numbers and other sensitive data. Symmetric keys are used, meaning the same key is used for both encryption and decryption. Communication between the token and key servers is mutually authenticated and encrypted.
Tokenization systems also need to manage any asymmetric keys for connected applications and devices.
As with any encryption, the key management server/device/functions must support secure key storage, rotation, and backup/restore.
Token storage is one of the more complicated aspects of token servers. How tokens are used to reference sensitive data or previous transactions is a major performance concern. Some applications require additional security precautions around the generation and storage of tokens, so tokens are not stored in a directly reference-able format. Use cases such as financial transactions with either single-use or multi-use tokens can require convoluted storage strategies to balance security of the data against referential performance. Let’s dig into some of these issues:
- Multi-token environments: Some systems provide a single token to reference every instance of a particular piece of sensitive data. So a credit card used at a specific merchant site will be represented by a single token regardless of the number of transactions performed. This one to one mapping of data to token is easy from a storage standpoint, but fails to support some business requirements. There are many use cases for creating more than one token to represent a single piece of sensitive data, such as anonymizing patient data across different medical record systems and credit cards used in multiple transactions with different merchants. Most token servers support the multiple-token model, enabling an arbitrary number of tokens to map to a given piece of data.
- Token lookup: Looking up a token in a token server is fairly straightforward: the sensitive data acts as the primary key by which data is indexed. But as the stored data is encrypted, incoming data must first be encrypted prior to performing the lookup. For most systems this is fast and efficient. For high volume servers used for processing credit card numbers the lookup table becomes huge, and token references take significant time to process. The volatility of the system makes traditional indexing unrealistic, so data is commonly lumped together by hash, grouped by merchant ID or some other scheme. In the worst case the token does not exist and must be created. The process is to encrypt the sensitive data, perform the lookup, create a new token if one does not already exist, and (possibly) perform token validation (e.g., LUHN checks). Since not all schemes work well for each use case, you will need to investigate whether the vendor’s performance is sufficient for your application. This is a case where pre-generated sequences or random numbers are used for their performance advantage over tokens based upon hashing or encryption.
- Token collisions: Token servers deployed for credit card processing have several constraints: they must keep the same basic format as the original credit card, expose the real last four digits, and pass LUHN checks. This creates an issue, as the number of tokens that meet these criteria is limited. The number of LUHN-valid 12-digit numbers creates a high likelihood of the same token being created and issued more than once – especially in multi-token implementations. Investigate what precautions your vendor takes to avoid or mitigate token collisions.
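The constraints above can be made concrete with a short sketch: generate a format-preserving token that keeps the card's real last four digits and passes a LUHN check, retrying on collision. The retry loop and names are our own simplification, not a vendor's collision-handling strategy:

```python
import secrets

def luhn_valid(number: str) -> bool:
    # Standard LUHN check: double every second digit from the right,
    # subtract 9 from any doubled digit over 9, and sum.
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def make_token(pan: str, issued: set) -> str:
    """Generate a 16-digit token: 12 random leading digits plus the card's
    real last four, retried until it passes LUHN and has not been issued."""
    while True:
        candidate = "".join(secrets.choice("0123456789") for _ in range(12))
        token = candidate + pan[-4:]
        if luhn_valid(token) and token not in issued and token != pan:
            issued.add(token)
            return token
```

Roughly one in ten random candidates is LUHN-valid, and as the issued set grows the retry rate climbs – which is exactly why collision handling matters at credit card processing volumes.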
In our next post we will discuss how token servers communicate with other applications, and the supporting IT services they rely upon.
Posted at Thursday 22nd July 2010 1:18 pm
Alex Hutton has a wonderful must-read post on the Verizon security blog on Evidence Based Risk Management.
Alex and I (along with others including Andrew Jaquith at Forrester, as well as Adam Shostack and Jeff Jones at Microsoft) are major proponents of improving security research and metrics to better inform the decisions we make on a day to day basis. Not just generic background data, but the kinds of numbers that can help answer questions like “Which security controls are most effective under XYZ circumstances?”
You might think we already have a lot of that information, but once you dig in the scarcity of good data is shocking. For example we have theoretical models on password cracking – but absolutely no validated real-world data on how password lengths, strengths, and forced rotation correlate with the success of actual attacks. There’s a ton of anecdotal information and reports of password cracking times – especially within the penetration testing community – but I have yet to see a single large data set correlating password practices against actual exploits.
I call this concept outcomes based security, which I now realize is just one aspect/subset of what Alex defines as Evidence Based Risk Management.
We often compare the practice of security with the practice of medicine. Practitioners of both fields attempt to limit negative outcomes within complex systems where external agents are effectively impossible to completely control or predict. When you get down to it, doctors are biological risk managers. Both fields are also challenged by having to make critical decisions with often incomplete information. Finally, while science is technically the basis of both fields, the pace and scope of scientific information is often insufficient to completely inform decisions.
My career in medicine started in 1990 when I first became certified as an EMT, and continued as I moved on to working as a full time paramedic. Because of this background, some of my early IT jobs also involved work in the medical field (including one involving Alex’s boss about 10 years ago). Early on I was introduced to the concepts of Evidence Based Medicine that Alex details in his post.
The basic concept is that we should collect vast amounts of data on patients, treatments, and outcomes – and use that to feed large epidemiological studies to better inform physicians. We could, for example, see under which circumstances medication X resulted in outcome Y on a wide enough scale to account for variables such as patient age, gender, medical history, other illnesses, other medications, etc.
You would probably be shocked at how little the practice of medicine is informed by hard data. For example if you ever meet a doctor who promotes holistic medicine, acupuncture, or chiropractic, they are making decisions based on anecdotes rather than scientific evidence – all those treatments have been discredited, with some minor exceptions for limited application of chiropractic… probably not what you used it for.
Alex proposes an evidence-based approach – similar to the one medicine is in the midst of slowly adopting – for security. Thanks to the Verizon Data Breach Investigations Report, Trustwave’s data breach report, and little pockets of other similar information, we are slowly gaining more fundamental data to inform our security decisions.
But EBRM faces the same near-crippling challenge as Evidence Based Medicine. In health care the biggest obstacle to EBM is the physicians themselves. Many rebel against the use of the electronic medical records systems needed to collect the data – sometimes for legitimate reasons like crappy software, and at other times due to a simple desire to retain direct control over information. The reason we have HIPAA isn’t to protect your health care data from a breach, but because the government had to step in and legislate that doctors must release and share your healthcare information – which they often considered their own intellectual property.
Not only do many physicians oppose sharing information – at least using the required tools – but they oppose any restrictions on their personal practice of medicine. Some of this is a legitimate concern – such as insurance companies restricting treatments to save money – but in other cases they just don’t want anyone telling them what to do – even optional guidance. Medical professionals are just as subject to cognitive bias as the rest of us, and as a low-level medical provider myself I know that algorithms and checklists alone are never sufficient in managing patients – a lot of judgment is involved.
But it is extremely difficult to balance personal experience and practices with evidence, especially when said evidence seems counterintuitive or conflicts with existing beliefs.
We face these exact same challenges in security:
- Organizations and individual practitioners often oppose the collection and dissemination of the raw data (even anonymized) needed to learn from experience and advance best practices.
- Individual practitioners, regulatory and standards bodies, and business constituents need to be willing to adjust or override their personal beliefs in the face of hard evidence, and support evolution in security practices based on hard evidence rather than personal experience.
Right now I consider the lack of data our biggest challenge, which is why we try to participate as much as possible in metrics projects, including our own. It’s also why I have an extremely strong bias towards outcome-based metrics rather than general risk/threat metrics. I’m much more interested in which controls work best under which circumstances, and how to make the implementation of said controls as effective and efficient as possible.
We are at the very beginning of EBRM. Despite all our research on security tools, technologies, vulnerabilities, exploits, and processes, the practice of security cannot progress beyond the equivalent of witch doctors until we collectively unite behind information collection, sharing, and analysis as the primary sources informing our security decisions.
Seriously, wouldn’t you really like to know when 90-day password rotation actually reduces risk vs. merely annoying users and wasting time?
Posted at Wednesday 21st July 2010 6:07 pm
By Mike Rothman
Now that we’ve decomposed each step in the Monitoring process in gory detail, we need to wrap things up by talking about monitoring and maintaining device health. The definition of a device varies – depending on your ability to invest in tools you might have all sorts of different methods for collecting, storing, and analyzing events/logs.
One of the most commonly overlooked aspects of implementing a monitoring process is the effort required to actually keep things up and running. It includes stuff like up/down checks, patching, and upgrading hardware as necessary. All these functions take time, and if you are trying to really understand what it costs to monitor your environment, they must be modeled and factored into your analysis.
Here we make sure our equipment is operational and working in peak form. OK, perhaps not peak form, but definitely collecting data. Losing data has consequences for the monitoring process so we need to ensure all collectors, aggregators, and analyzers are operating – as well as patched and upgraded to ensure reliability at our scale.
This process is pretty straightforward, so let’s go through it:
- Availability checking: As with any other critical IT device you need to check availability. Is it up? You likely have an IT management system to do this up/down analysis, but there are also low-cost and no-cost ways to check availability of devices (it’s amazing what you can do with a scripting language…).
- Test data collection: The next step is to make sure the data being collected is as expected. You should have spot checks scheduled on a sample set of collectors to ensure collection works as required. You defined test cases during the Collect step, which you can leverage on an ongoing basis to ensure the accuracy and integrity of collected data.
- Update/Patch Software: The collectors run some kind of software that runs on some kind of operating system. That operating system needs to be updated every so often to address security issues, software defects, and other issues. Obviously if the collector is a purpose-built device (appliance), you may not need to specifically patch the underlying OS. At times you’ll also need to update the collection application, which is included in this step. We explored this involved process in Patch Management Quant, so we won’t go into detail again here.
- Upgrade hardware: Many of the monitoring systems nowadays use hardware-based collectors and purpose-built appliances for data aggregation, storage, and analysis. Thanks to Moore’s Law and the exponentially increasing amount of data we have to deal with, every couple years your monitoring devices hit the scalability wall. When this happens you need to research and procure new hardware, and then install it with minimum downtime for the monitoring system.
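As the availability checking step notes, a scripting language goes a long way. A minimal up/down check for a collector might look like the following; the host and port are placeholders, and a production check would also verify the service actually responds, not just that the port accepts connections:

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Basic availability check: can we open a TCP connection to the
    collector within the timeout? Returns False on refusal or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Illustrative usage: check a hypothetical collector and alert on failure.
# if not is_reachable("collector.example.com", 514):
#     print("ALERT: collector is down")
```

Wrap a loop over your collector inventory around this and schedule it from cron, and you have the no-cost end of the availability-checking spectrum.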
Device Type Variances
Many firewalls and IDS/IPS are deployed as appliances, meaning you will manage a separate collection mechanism for them. Many of the server monitoring techniques involve installing agents on the devices, and those may need to be patched and updated.
For this research we are talking specifically about monitoring, so we aren’t worried about keeping the actual devices up and running here (we deal with that in the NSOQ Manage process for Firewalls and IDS/IPS) – only about maintaining the collection devices. So depending on your collection architecture, you should not have much variation across the collection and analysis devices.
We go into such gory detail on these processes because someone has to do the work. Most organizations don’t factor health monitoring and maintenance into their analysis, which skews the cost model. If you use an outsourced monitoring service you can be sure the service provider is maintaining the collectors, so you need to weigh those costs to get an apples-to-apples comparison of build vs. buy. Through the Network Security Operations Quant process we’re attempting to provide the tools to understand the cost of monitoring your environment. This will help you streamline operations and reduce cost, or make a more informed decision on whether outsourcing is right for your organization.
Posted at Wednesday 21st July 2010 2:13 pm
By Mike Rothman
Back when I went to sleepaway camp as a kid I always looked forward to Visiting Day. Mostly for the food, because after a couple weeks of camp food anything my folks brought up was a big improvement. But I admit it was great to see the same families year after year (especially the family that brought enough KFC to feed the entire camp) and to enjoy a day of R&R with your own family before getting back to the serious business of camping.
So I was really excited this past weekend when the shoe was on the other foot, and I got to be the parent visiting XX1 at her camp. First off I hadn’t seen the camp, so I had no context when I saw pictures of her doing this or that. But most of all, we were looking forward to seeing our oldest girl. She’s been gone 3 weeks now, and the Boss and I really missed her.
I have to say I was very impressed with the camp. There were a ton of activities for pretty much everyone. Back in my day, we’d entertain ourselves with a ketchup cap playing a game called Skully. Now these kids have go-karts, an adventure course, a zipline (from a terrifying looking 50 foot perch), ATVs and dirt bikes, waterskiing, and a bunch of other stuff. In the arts center they had an iMac-based video production and editing rig (yes, XX1 starred in a short video with her group), ceramics (including their own wheels and kiln), digital photography, and tons of other stuff. For boys there was rocketry and woodworking (including tabletop lathes and jigsaws). Made me want to go back to camp. Don’t tell Rich and Adrian if I drop offline for a couple weeks, okay?
Everything was pretty clean and her bunk was well organized, as you can see from the picture. Just like her room at home…not! Obviously the counselors help out and make sure everything is tidy, but with the daily inspections and work wheel (to assign chores every day), she’s got to do her part of keeping things clean and orderly. Maybe we’ll even be able to keep that momentum when she returns home.
Most of all, it was great to see our young girl maturing in front of our eyes. After only 3 weeks away, she is far more confident and sure of herself. It was great to see. Her counselors are from New Zealand and Mexico, so she’s gotten a view of other parts of the world and learned about other cultures, and is now excited to explore what the world has to offer. It’s been a transformative experience for her, and we couldn’t be happier.
I really pushed to send her to camp as early as possible because I firmly believe kids have to learn to fend for themselves in the world without the ever-present influence of their folks. The only way to do that is away from home. Camp provides a safe environment for kids to figure out how to get along (in close quarters) with other kids, and to do activities they can’t at home. That was based on my experience, and I’m glad to see it’s happening for my daughter as well. In fact, XX2 will go next year (2 years younger than XX1 is now) and she couldn’t be more excited after visiting.
But there’s more! An unforeseen benefit of camp accrues to us. Not just having one less kid to deal with over the summer – which definitely helps. But sending the kids to camp each summer will force us (well, really the Boss) to let go and get comfortable with the reality that at some point our kids will grow, leave the nest, and fly on their own. Many families don’t deal with this transition until college and it’s very disruptive and painful. In another 9 years we’ll be ready, because we are letting our kids fly every summer. And from where I sit, that’s a great thing.
Photo credits: “XX1 bunk” originally uploaded by Mike Rothman
Recent Securosis Posts
Wow. Busy week on the blog. Nice.
- Pricing Cyber-Policies
- FireStarter: An Encrypted Value is Not a Token!
- Tokenization: The Tokens
- Comments on Visa’s Tokenization Best Practices
- Friday Summary: July 15, 2010
- Tokenization Architecture – The Basics
- Color-blind Swans and Incident Response
- Home Business Payment Security
- Simple Ideas to Start Improving the Economics of Cybersecurity
- Various NSO Quant Posts on the Monitor Subprocesses:
Incite 4 U
We have a failure to communicate! – Chris makes a great point on the How is that Assurance Evidence? blog about the biggest problem we security folks face on a daily basis. It ain’t mis-configured devices or other typical user stupidity. It’s our fundamental inability to communicate. He’s exactly right, and it manifests as a lack of funds in the credibility bank, which obviously impacts our ability to drive our security agendas. Holding a senior level security job is no longer about the technology. Not by a long shot. It’s about evangelizing the security program and persuading colleagues to think security first and to do the right thing. Bravo, Chris. Always good to get a reminder that all the security kung-fu in the world doesn’t mean crap unless the business thinks it’s important to protect the data. – MR
Cyber RF – I was reading Steven Bellovin’s post on Cyberwar, and the only thing that came to mind was Sun Tzu’s quote, “Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win.” Don’t think I am one of those guys behind the ‘Cyberwar’ bandwagon, or who likes using war metaphors for football – this subject makes me want to gag. Like most posts on this subject, there is an interesting mixture of stuff I agree with, and an equal blend of stuff I totally disagree with. But the reason I loathe the term ‘Cyberwar’ finally dawned on me: it’s not war – it’s about winning through trickery. It’s about screwing someone over for whatever reason. It’s about stealing, undermining, propagandizing, damaging and every other underhanded trick you use before you do something else underhanded. The term ‘Cyberwar’ creates a myopic over-dramatization that conjures images of guns, bombs, and dolphins with lasers strapped to their heads, when it’s really about getting what you want – whatever that may be. I prefer the term ‘Cyber – Ratfscking’, from a root term coined by Nixon staffers and perfected under the W administration. Sure, we could use plain old terms like ‘war’, ‘espionage’, and ‘theft’, but they do not capture the serendipity of old tricks in a new medium. And I really don’t think the threats have been exaggerated at all, because stealing half a billion dollars in R&D from a rival nation, or changing the outcome of an election, is incredibly damaging and/or useful. But focusing on ‘war’ removes the stigma of politics from the discussion, and makes it sound like a military issue when it’s a more generalized iteration of screwing over your fellow man. – AL
The SLA hammer hits the thumb – I once received a fair bit of guff over stating that your only consistent cloud computing security control is a good, well-written contract with enforceable service level agreements. It turns out even that isn’t always enough – at least if you are in Texas and hosting with IBM. Big Blue is about to lose an $863M contract with the state of Texas due to a string of massive failures. This was a massive project to merge 28 state agencies into two secure data centers, which has been nothing but a nightmare for the agencies involved. But what the heck, the 7-year contract started in 2006 and it only took 4 years to reach the “we really mean it this time” final 30-day warning. Needless to say, I have a Google alert set for 30 days from now to see what really happens. – RM
Defining risk – Jack Jones puts up an interesting thought generator when he asks “What is a risk anyway?” This is a reasonable question we should collectively spend more time on. Risk is one of those words that gets poked, prodded, and manipulated in all sorts of ways for all sorts of purposes. The term is so muddled that no one really knows what it means. But we are expected to reduce, transfer, or mitigate risk systematically, in a way that can easily be substantiated for our auditors. No wonder we security folks are a grumpy bunch! How the hell can we do that? Jack has some ideas, but mostly it’s about not trying to “characterize risks in terms of likelihood or consequence” (both of which are subjective), and focusing on getting the terminology right. Good advice. – MR
No SCADA to see here – Almost any time I post something on SCADA security, someone who works in that part of the industry responds with, “there’s no problem – our systems are all proprietary and bad guys can’t possibly figure out how to shut-the-grid-down/trigger-a-flood/blow-up-a-manufacturing-plant.” Not every SCADA engineer thinks like that, but definitely more than we’d like (zero would be the right number). I wonder how they feel about the new Windows malware variant that spreads via USB, and appears to target a specific SCADA system? Not that this attack is worth a 60 Minutes special, but it is yet another sign that someone seems to be targeting our infrastructure – or at minimum performing reconnaissance to learn how to break it. – RM
Buy that network person a beer – As an old networking guy, it’s a little discouraging to see the constant (and ever-present) friction between the security and networking teams. But that’s not going to change any time soon, so I have to accept it. Branden Williams makes a great point about how VLANs (and network segmentation in general) can help reduce scope for PCI – excellent for the security folks. Obviously the devil’s in the details, but clearly you have to keep devices accessing PAN on a separate network, which could mean a lot of things. But less scope is good, so if you don’t have a good relationship with the network team maybe it’s time to fix that. You should make a peace offering. I hear network folks like beer. Or maybe that was just me. – MR
Warm and fuzzy – The Microsoft blog had an article on Writing Fuzzable Code a couple weeks back that I am still trying to wrap my head around. OK, so fuzzing is an art when done right. Sure, to the average QA tester it just looks like you are hurling garbage at the application with a perverse desire to crash it – perhaps so you can heckle the programming team for their incompetence. Seriously, though, it’s a valuable approach to security testing and a wonderful way to flush out bad programming assumptions and execution. But the man-in-the-middle approach they discuss is a bit of an oddball. A large percentage of firms capture network activity and replay those sessions with altered parameters and commands for fuzzing and stress testing. Modifying data on the fly is an interesting way to create dynamic tests and keep the test cases up to date, but I am not certain there is enough value to justify fuzzing both the producer and the consumer as part of a single test. I am still unsure whether their goal was to harden the QA scripts or the communication protocols between the two applications – or perhaps both. This scenario creates a real-world debugging problem, though: transaction processing communications can get out of sync and crash at some indeterminate later time, and the issue may be a transaction processing error, the communication dialog, or a plain old unhandled exception. In short, this approach seems to save time in test case generation at the expense of much harder debugging. If anyone out there has real-world experience with this form of testing (either inside or outside Microsoft) I would love to hear about it. Microsoft apparently chose the more thorough (but difficult) test model, but I’m afraid that in most cases the problems will multiply fast, and the advantage in thoroughness (over testing the producer and consumer sides separately) is not enough to justify the inevitable debugging headaches. For most organizations this level of ambition will make the whole fuzzing process miserable and substantially less useful. – AL
Clarifying the final rule – Thanks to HIPAA, healthcare is one of the anchor verticals for security, so I was surprised to see very little coverage of HHS’ issuance of the final rule for meaningful use. Ed over at SecurityCurve did the legwork and has two posts (Part 1 & Part 2) clarifying what it means and what it doesn’t. The new rules are really about electronic health records (EHR), and HHS has basically declared that the existing HIPAA guidelines are sufficient. They are mandating somewhat better assessment and risk management processes, but that seems pretty squishy. Basically it gets back to enforcement. EHR is a huge can of security worms waiting to be exploited, and unless there is a firm commitment to make examples of some organizations playing fast and loose with EHR, this is more of a ho-hum. But if they do, we could finally get some forward motion on healthcare security. – MR
Posted at Wednesday 21st July 2010 6:59 am
By Mike Rothman
In our last Network Security Operations Quant post we discussed the Analyze step. Its output is an alert, which means a set of conditions has been met which may indicate an attack. Great – that means we need to figure out whether there is a real issue or not. That’s the Validate subprocess. Once an alert is validated as an attack, someone will need to deal with it, so we need the Escalate step. These two subprocesses are interdependent, so it makes more sense to deal with them together in one post.
In this step you work to understand what happened to generate the alert, assess the risk to your organization, and consider ways to remediate the issue. Pretty straightforward, right? In concept yes, but in reality generally not. Validation requires security professionals to jump into detective mode to piece together the data and build a story about the attack. Think CSI, but for real and without a cool lab. Add the pressure of a potentially company-damaging breach and you can see that validation is not for the faint of heart.
Let’s jump into the subprocesses and understand how this is done. Keep in mind that every security professional may go through these steps in a different order – depending on systems, instrumentation, preferences, and skill set.
- Alert reduction: The first action after an alert fires is to verify the data. Does it make sense? Is it a plausible attack? You need to eliminate the obvious false positives (and then get that feedback to the team generating alerts), basically applying a sniff test to the alert. A typical attack touches many parts of the technology infrastructure, so you’ll have a number of alerts triggered by the same attack. Part of the art of incident investigation is to understand which alerts are related and which are not. Mechanically, once you figure out which alerts may or may not be related, you’ve got to merge the alerts in whatever system you’re using to track the alerts – even if it’s Excel.
- Identify root cause: If you have a legitimate alert, now you need to dig into the specific device(s) under attack and begin forensic analysis, to understand what is happening and the extent of the damage at the device level. This may involve log analysis, configuration checks, malware reverse engineering, memory analysis, and a host of other techniques. The focus of this analysis is to establish the root cause of the attack, so you can start figuring out what is required to fix it – whether it’s a configuration change, firewall or IDS/IPS change, a workaround, or something else. Feedback on the effectiveness of the alert – and how to make it better – then goes back to whoever manages the monitoring rules & policies.
- Determine extent of compromise: Now that you understand the attack, you need to figure out whether this was an isolated situation or whether you have badness proliferating through the environment, by analyzing other devices for indications of a similar attack. This can be largely automated with scanners, but not entirely. When you find another potentially compromised device you validate once again – now much quicker, since you know what you’re looking for.
- Document: As with the other steps, documentation is critical to make sure other folks know what’s happened, that the ops teams know enough to fix the issue, and that your organization can learn from the attack (post-mortem). So here you’ll close the alert and write up the findings in sufficient detail to inform other folks of what happened, how you suggest they fix it, and what defenses need to change to make sure this doesn’t happen again.
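The alert reduction step above – figuring out which alerts belong to the same attack and merging them – can be sketched in code. This is a minimal illustration, not any particular product’s logic: the `Alert` record, its field names, and the simple same-target-within-a-time-window grouping rule are all hypothetical simplifications of what a real correlation engine (or an analyst with Excel) would do.

```python
from dataclasses import dataclass

# Hypothetical alert record -- the fields are illustrative, not taken
# from any specific monitoring product.
@dataclass
class Alert:
    alert_id: str
    target: str       # device or asset the alert fired on
    timestamp: float  # epoch seconds

def merge_related(alerts, window=300):
    """Group alerts that hit the same target within `window` seconds of
    each other, treating each group as one candidate incident. A real
    correlator would use far richer criteria (signature, source, kill
    chain stage); the sliding same-target window stands in for that."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        for incident in incidents:
            last = incident[-1]
            if alert.target == last.target and alert.timestamp - last.timestamp <= window:
                incident.append(alert)
                break
        else:
            # No existing incident matched -- open a new one.
            incidents.append([alert])
    return incidents
```

However crude the grouping rule, the point stands: three raw alerts where two share a target inside the window collapse into two incidents to investigate, not three.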
Large Company vs. Small Company Considerations
Finding the time to do real alert validation is a major challenge, especially for smaller companies that don’t have resources or expertise for heavy malware or log analysis. The best advice we can give is to have a structured and repeatable process for validating alerts. That means defining the tools and the steps involved in investigating an alert in pretty granular detail. Why go to this level? Because a documented process will help you figure out when you are in deep water (over your head on the forensic analysis), and help you decide whether to continue digging or just re-image the machine/device and move on. Maybe it’s clear some kind of Trojan got installed on an endpoint device. Does it matter which one, and what it does to the device? Not if you are just going to re-image it and start over. Of course that doesn’t enable you to really understand how your defenses need to change to defend against this successful attack, but you should at least be able to update rules and policies to more quickly identify the effects of the attack next time, even if you don’t fully understand the root cause.
Larger companies also need this level of documentation for how alerts are validated because they tend to have tiers of analysts. The lower-level analysts run through a series of steps to try to identify the issue. If they don’t come up with a good answer, they pass the question along to a group of more highly trained analysts who can dig deeper. Finally, you may have a handful of Ninja-type forensic specialists (but hopefully at least one) who tackle the nastiest stuff. The number of levels doesn’t matter, just that you have the responsibilities and handoffs between tiers defined.
Now that you’ve identified the attack and what’s involved, it needs to be fixed. This is what escalation is about. Every company has a different idea of who needs to do what, so the escalation path is a large part of defining your policies. Don’t forget the criticality of communicating these policies and managing expectations around responsiveness. Make sure everyone understands how and when something will fall into their lap, and what to do when it happens.
The Escalate subprocess includes:
- Open trouble ticket: You need a mechanism for notifying someone of their task/responsibility. That seems obvious, but many security processes fail because separate teams don’t communicate effectively, and plenty of things fall through the cracks. We aren’t saying you need a fancy enterprise-class help desk system, but you do need some mechanism to track what’s happening, who is responsible, and status – and to eventually close out issues. Be sure to provide enough information in the ticket to ensure the responsible party can do their job. Coming back to you over and over again to get essential details is horribly inefficient.
- Route appropriately: Once the ticket is opened it needs to be sent somewhere. The routing rules & responsibilities are defined (and agreed upon) in the Planning phase, so none of this should be a surprise. You find the responsible party, send them the information, and follow up to make sure they got the ticket and understand what needs to be done. Yes, this step can be entirely automated with those fancy (or even not-so-fancy) help desk systems.
- Close alert: Accountability is critical to the ultimate success of any security program. So if the security team just sends an alert over the transom to the ops team and then washes their hands, we assure you things will fall between the cracks. The security team needs to follow up and ensure each issue is taken to closure. Again, a lot of this process can be automated, but we’ve found the most effective solution is to make sure someone’s behind is on the line for making sure the alert gets closed.
Large Company vs. Small Company Considerations
The biggest difference you’ll see in this step between large and small companies is the number of moving pieces. Smaller companies tend to have a handful of security/ops folks at best, so there isn’t a lot of alert handoff & escalation, because in most cases the person doing the analysis is fixing the issue and then probably tuning the monitoring system as well.
Larger companies tend to have lots of folks involved in remediation and the like – so in this case documentation, managing expectations, and follow-up are imperative to successfully closing any issue.
But we don’t want to minimize the importance of documentation when closing tickets – even at smaller companies – because in some cases you’ll need to work with external parties or even auditors. If you (or another individual) are responsible for fixing something you validated above, we still recommend filling out a ticket and documenting the feedback on rule changes (even if you’re effectively writing notes to yourself). We realize this is extra work, but it’s amazing how quickly the details fade – especially when you are dealing with many different issues every day – and this documentation helps ensure you don’t make the same mistake twice.
So that’s our Monitoring process, all eight steps in gory detail: Enumerate/Scope, Define Policies, Collect/Store, Analyze, and Validate/Escalate. But we aren’t done yet. However you decide to monitor your environment, some gear is involved. You need to maintain that gear, and that takes time and costs money. We need to model that as well, so you understand what it really costs to monitor your environment.
Posted at Tuesday 20th July 2010 2:05 pm