As we discussed previously in The Trouble with WAFs, there are many reasons WAFs frustrate both security teams and application developers. But thanks to the ‘gift’ of PCI, many organizations have a WAF in-house, and now they want to use it (more) effectively. Which is a good thing, by the way. We also pointed out that many of the WAF issues our research has uncovered were not technology problems – far too often, the failure is in how the WAF is managed.
So your friends at Securosis will map out a clear and pragmatic 3-phase approach to WAF management. Now for the caveats. There are no silver bullets. Not profiling apps. Not integration with vulnerability reporting and intelligence services. Not anything. Effectively managing your WAF requires an ongoing and significant commitment. In every aspect of the process, you will see the need to revisit everything, over and over again. We live in a dynamic world – which means a static ruleset won’t cut it. The sooner you accept that, the sooner you can achieve a singularity with your WAF. We will stop preaching now.
Manage Policies
At a high level, you need to think of the WAF policy/rule base as a living, breathing entity. Applications evolve and change – typically on a daily basis – so WAF rules need to evolve and change in lockstep. But before you can worry about evolving your rule base, you need to build it in the first place. We have identified three steps for doing that:
- Baseline Application Traffic: The first step in deploying a WAF is usually to let it observe your application traffic during a training period, so it can develop a reference baseline of ‘normal’ application behavior for each application on your network. This initial discovery process and its associated baseline provide the basis for the initial ruleset – essentially a whitelist of acceptable actions for each application (see the sketch after this list).
- Understand the Application: The baseline represents the first draft of your rules. Then you apply a large dose of common sense to see which rules don’t make sense and what’s missing. You can do this by building threat models for dangerous edge cases and other situations to ensure nothing is missed.
- Protect against Attacks: Finally, you will want to address typical attack patterns, similar to how an Intrusion Prevention System works at the network layer. These rules block common but dangerous attacks such as SQL injection (SQLi) and cross-site scripting (XSS).
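To make the output of the learning phase a bit more concrete, here is a minimal Python sketch of the idea: a hypothetical set of observed requests distilled into a per-path whitelist of parameters and value patterns. It is not any vendor's implementation, just a way to picture what a baseline-driven whitelist looks like; the sample traffic and the ‘normal value’ pattern are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical traffic captured during the training period:
# (path, {param: value}) pairs observed while the WAF is in learning mode.
observed = [
    ("/login", {"user": "alice", "password": "s3cret"}),
    ("/login", {"user": "bob", "password": "hunter2"}),
    ("/search", {"q": "blue widgets"}),
]

def build_whitelist(samples):
    """Distill observed traffic into the set of allowed parameters per path."""
    whitelist = defaultdict(set)
    for path, params in samples:
        whitelist[path].update(params.keys())
    return whitelist

# A very crude 'normal value' pattern - real products learn lengths,
# character classes, encodings, etc. This one only allows simple text.
NORMAL_VALUE = re.compile(r"^[\w .@-]{1,64}$")

def check_request(whitelist, path, params):
    """Flag anything not seen during the baseline period."""
    allowed = whitelist.get(path)
    if allowed is None:
        return "unknown path"
    for name, value in params.items():
        if name not in allowed:
            return f"unexpected parameter: {name}"
        if not NORMAL_VALUE.match(value):
            return f"abnormal value for {name}"
    return "ok"

wl = build_whitelist(observed)
print(check_request(wl, "/search", {"q": "blue widgets"}))   # ok
print(check_request(wl, "/search", {"q": "' OR 1=1 --"}))    # abnormal value for q
print(check_request(wl, "/admin", {"cmd": "drop"}))          # unknown path
```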
Now you have your initial ruleset, but it’s not time for Tetris yet. This milestone is only the beginning. We will go into detail on the issues and tradeoffs of policy management later in this series – for now we just want to capture the high-level approach. You need to constantly revisit the ruleset – both to deal with new attacks (based on what you get from your vendor’s research team and public vulnerability reporting organizations such as CERT), and to handle application changes. Which makes a good segue to the next step.
Application Lifecycle Integration
Let’s be candid – developers don’t like security folks, and vice versa. Sure, that’s a generalization, but it’s generally true. Worse, developers don’t like security tools that barrage them with huge amounts of stuff they’re supposed to fix – especially when the ‘spam’ includes many noisy inconsequential issues and/or totally bogus results. The security guy wielding a WAF is an outsider, and his reports are full of indigestible data, so they are likely to get filed in the circular file. It’s not that developers don’t believe there are issues – they know there’s tons of stuff that ought to be fixed, because they have been asked many times to take shortcuts to deliver code on deadline. And they know the backlog of functional stuff they would like to fix – over and above the threats reported by the WAF, dynamic app scans, and pen testers – is simply too large to deal with. Web-borne threat? Take a number.
Security folks wonder why the developers can’t build secure code, and developers feel security folks have no appreciation of their process or the pressure to ship working code. We said “working code” – not necessarily secure code, which is a big part of the problem. Now add Operations into the mix – they are responsible for making sure the systems run smoothly, and they really don’t want yet another system to manage on their network. They worry about performance, failover, ease of management and – at least as much as developers do – user experience.
This next step in the WAF management process requires collaboration between the proverbial irresistible force and immovable object to protect applications. Communication between groups is a starting point – providing filtered, prioritized, and digestible information to dev-ops is another hurdle to address. Further complicating matters are evolving development processes, various new development tools, and application deployment practices, all of which WAF products need to integrate with. Obviously you work with the developers to identify and eliminate security defects as early in the process as possible. But the security team needs to be realistic – disrupting a developer’s workflow can dramatically reduce the quality and amount of code that gets shipped. And nobody likes that.
We have identified a set of critical success factors for integrating with the DLC (development lifecycle):
- Executive Sponsorship: If a developer can say ‘no’ to the security team, at some point they will. Either security is important or it isn’t. To move past a compliance WAF, security folks need the CIO or CEO to agree that the velocity of feature evolution must give way to addressing critical security flaws. Once management has made that commitment, developers can justify improving security as part of their job.
- Establish Expectations: Agree on what makes a critical issue, and how critical issues will be addressed among the pile of competing critical requirements. Set guidelines in advance so there are no arguments when issues arise.
- Security/Developer Integration Points: There need to be logical (and documented) steps within the application lifecycle for security checks, assessments, etc. This includes design and development integration points, as well as pre-launch validation steps with operations. Without formal security involvement, these steps will inevitably be skipped – especially at crunch time. Go back to the first point above.
- Automation: Even with executive sponsorship and clearly documented integration points, without sufficient automation of security functions (scanning, testing, regression, etc.), things will inevitably get skipped – especially at crunch time. Are you seeing a pattern here? The person managing the WAF needs to filter out the noise before it gets to the development team, and provide actionable information (see the sketch after this list).
- Feedback Loops: Finally, you need to be able to adapt both the application and the process based on feedback at all points in the collaboration. Nothing turns off either side faster than feeling unappreciated. Sure, that’s soft and straight out of a new-age success manual, but it’s still true.
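To picture the automation and noise-filtering piece, here is a small Python sketch of a hypothetical gate script: it takes raw scanner or WAF findings, drops anything below the severity and confidence bar the teams agreed on up front, and forwards only the rest. The field names and thresholds are illustrative assumptions, not any particular product's format.

```python
import json
import sys

# Thresholds agreed on in advance (see 'Establish Expectations' above).
MIN_SEVERITY = 7.0      # e.g. a CVSS-style score
MIN_CONFIDENCE = 0.8    # the scanner's own confidence in the finding

def filter_findings(raw_findings):
    """Keep only findings worth a developer's time, sorted worst-first."""
    actionable = [
        f for f in raw_findings
        if f.get("severity", 0) >= MIN_SEVERITY
        and f.get("confidence", 0) >= MIN_CONFIDENCE
        and not f.get("false_positive", False)
    ]
    return sorted(actionable, key=lambda f: f["severity"], reverse=True)

if __name__ == "__main__":
    # Expects a JSON array of findings on stdin; emits the filtered list.
    findings = json.load(sys.stdin)
    triaged = filter_findings(findings)
    json.dump(triaged, sys.stdout, indent=2)
    # A CI job could fail the build only when something actionable remains.
    sys.exit(1 if triaged else 0)
```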
If you find yourself lacking in any of those aspects, your likelihood of keeping your WAF current and effective is remote.
Securing the WAF
As we described in the series overview, the third part of the WAF management process is to actually protect the WAF itself. We are talking about both configuring the device properly and hardening it to thwart evasion techniques and avoid information leakage. Keep these security considerations in mind:
- Securing the Device: Unplug the device and bad things happen, right? Obviously controlling physical access is a must – as with any computing device. Increasingly you will also need to protect the WAF against a variety of denial of service (DoS) tactics that can render it useless.
- Provisioning/Managing Entitlements: Once the device is secure, you need to ensure that only authorized folks can mess around with the ruleset and manage the device. You could look at technology like Privileged User Monitoring, or just aggressively manage entitlements to the device – either way, you need to ensure a savvy attacker cannot slip an ‘any-to-any’ rule into your device.
- Preventing Evasion: You need to pay attention to network architecture to ensure an attacker can’t simply bypass the WAF and get a direct pipe into the application (the check sketched after this list is one way to verify that). WAF evasion is an emerging area of interest for security researchers, which means you need to stay abreast of new tactics to ensure your WAF has the opportunity to do its job.
- Rule Obscurity: Yes, security folks grimace when they hear the term ‘obscurity’, but know that sophisticated attackers will poke and prod your applications before trying to break in. They will use all sorts of tactics to learn as much about your WAF as they can, so they can understand how to evade it.
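One way to keep the bypass question honest is to check it continuously. The sketch below uses the third-party Python requests library and made-up addresses; it simply tries to reach the origin server directly, skipping the WAF, and complains if the application answers. Your network layout will dictate what ‘direct’ means, so treat this as a starting point rather than a definitive test.

```python
import requests  # third-party; pip install requests

# Illustrative values - replace with your real origin address and hostname.
ORIGIN_IP = "203.0.113.10"        # the app server sitting behind the WAF
PUBLIC_HOSTNAME = "app.example.com"

def origin_reachable_directly():
    """Return True if the app answers requests that never touched the WAF."""
    try:
        resp = requests.get(
            f"http://{ORIGIN_IP}/",
            headers={"Host": PUBLIC_HOSTNAME},
            timeout=5,
            allow_redirects=False,
        )
    except requests.exceptions.RequestException:
        return False  # connection refused/filtered - what we want to see
    # Any application-level answer means the WAF can be walked around.
    return resp.status_code < 500

if __name__ == "__main__":
    if origin_reachable_directly():
        print("WARNING: origin answers direct traffic - the WAF can be bypassed")
    else:
        print("OK: direct access to the origin appears to be blocked")
```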
This isn’t an exhaustive list, but it covers a few of the key considerations for securing a WAF. It also doesn’t touch on how much WAF deployment architecture affects all of the above. For example, a managed WAF service takes DoS attacks out of play – or rather, it doesn’t eliminate the risk, but it does make it the service provider’s problem. There are also differences in performance characteristics and the provisioning process, depending on whether the device is on-premises or hosted by a service provider.
Another key to securing the WAF is ongoing testing to ensure that nothing changes unexpectedly. Just as the applications you protect are dynamic, so is the product used to protect them. Part of this process must be an ongoing commitment to penetration testing both the application and the WAF. We will touch on all this in detail later in this series.
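Part of that ongoing testing can be automated as a simple regression suite: replay a handful of canonical attack payloads through the WAF after every rule or application change and confirm they are still blocked. A minimal Python sketch follows, with a hypothetical endpoint, illustrative payloads, and whatever block status codes your own WAF actually returns.

```python
import requests  # third-party; pip install requests

TARGET = "https://app.example.com/search"   # hypothetical protected endpoint

# Canonical payloads the WAF is expected to block (expand this list over time).
PAYLOADS = {
    "sqli": "' OR '1'='1",
    "xss": "<script>alert(1)</script>",
    "traversal": "../../etc/passwd",
}

BLOCKED_STATUSES = {403, 406}  # adjust to whatever your WAF returns on a block

def run_regression():
    """Send each payload through the WAF and collect anything not blocked."""
    failures = []
    for name, payload in PAYLOADS.items():
        resp = requests.get(TARGET, params={"q": payload}, timeout=5)
        if resp.status_code not in BLOCKED_STATUSES:
            failures.append((name, resp.status_code))
    return failures

if __name__ == "__main__":
    failed = run_regression()
    if failed:
        for name, status in failed:
            print(f"NOT BLOCKED: {name} payload returned HTTP {status}")
    else:
        print("All canonical attack payloads were blocked")
```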
As you can see from the illustration, each step in the process spawns other ongoing processes. How better to build a process to protect applications than with nested loops and recursion? Okay, we’re kidding, but the end result is the same – every aspect of managing WAFs is an ongoing process. This is the antithesis of set-it-and-forget-it technology. And that is really the point of this series. To get optimal value from your WAF, you need to go in with your collective eyes open to the amount of effort required to get the WAF running – and to keep it running effectively.
Our next post will dig into the first step in the process: Manage Policies.
Reader interactions
3 Replies to “Pragmatic WAF Management: the WAF Management Process”
Dear Adrian:
1:00PM EST on 27th works well… Please send me a calendar invite.
On a separate note, and in relation to the blog entry:
You may want to re-read my comment and re-orient your reply :-), because my comment has nothing to do with the complexities of natural-language parsing.
As a matter of fact, nowhere in my statement do I talk about “natural language parsing”, which is a completely different subject. Perhaps I should restate the thrust:
I am examining WAFs using the first principles of the state-machine automata that make up today’s detection engines. WAFs as they are built today are no more complex than a set of push-down automata with a selection operator. In other words, it’s a bunch of regular language parsers (regular expression matchers) strung together with selector logic. They hover somewhere between regular and context-free within the Chomsky Language Hierarchy. The CLH being:
• Regular Languages — Finite Automata
• Context-Free Languages — Push-down Automata
• Context-Sensitive Languages — Linear Bounded Automata
• Recursively Enumerable Languages — Turing Machines
and subclasses in between.
Cyber attacks are CSL or TM deltas. That is to say, they are complex stringlets (partial sentential forms) that cause the web-app software to jump into hidden (unintended) states, or to reach a valid state through an unintended path.
Now the issue with WAF detection is that WAFs are built using a set of regular expressions – or at least their programming can be mathematically reduced to such a set. However, what they are trying to detect is dimensionally above them.
It’s analogous to trying to guess the volume of a hypercube (more than 3 dimensions) by only examining its 2D shadow. It ain’t gonna work.
Worse yet, there is a theoretical limit on how many attacks you can classify, at BEST, using this approach – which is that 70.7% upper limit!
Which means that all WAFs will have a guaranteed miss rate of 29.3% (~30%). That’s like putting your valuables inside a safe (a 6-sided cube) with two sides missing.
Now I KNOW that YOU KNOW the difference between state machines, the language hierarchy, automata theory, etc. and natural language parsing (e.g. parsing the English language); however, your reply to my comment may suggest otherwise to readers. I thought I’d clarify between us so that you can rework your reply with this further clarification.
I am happy to post parts of this clarification, as and when you see fit.
BTW this is what LangSec is: the study of cyber-security using the first principles of automata theory, and my work on the Generalized Model of Cyber Attacks is essentially a mathematical treatise on cyber attacks. I can get into that with you during our phone conversation if you like. However, caffeine is a mandatory prerequisite for any such conversations! You have been duly warned hehe.
Warmest regards,
Ahmed
Ahmed,
Thanks for the comments. You are correct — to date we’ve not addressed why WAF is not a ‘set-and-forget’ product, but we cover this in the next post on policy management. What we won’t cover are the fundamental deficiencies of rules based upon (stateless) natural language parsing. While I personally find this study fascinating, I have to admit it won’t help the majority of people get their jobs done, and probably won’t enable them to do their jobs better. The good news is that comments are _a great place_ to discuss this type of concern!
As far as the Halting Problem goes, if you are implying that WAFs can never offer complete security because they cannot possibly account for all input values, my response is as follows: WAFs aren’t looking to solve all possible threats – rather, they address the most common stuff and hopefully force the attacker to jump through enough hoops that the act of evasion itself is noticeable or triggers other defenses. The better the WAF, the more it forces attacker perfection. I’ve not spoken with a vendor or customer that carries the expectation of perfection, or that theoretically WAFs can or should be perfect – they just don’t want to succumb to stupid and trivial attacks. Our goal with this paper is to get WAFs set up better, so they offer better security than the applications they protect. After all, it’s general application suckage that prompted the creation — and use — of WAFs. I like to say “It’s better than the alternative”, and Mike likes to say “Sometimes good enough is good enough.”
And we should point out that whitelisting is not based upon natural language parsing, and it does dramatically reduce the attack surface we need to secure. That certainly makes the number of policies we need to write much smaller, and it helps reduce some of the complexity in describing attacks. Reducing the problem set to a finite number makes things easier, both practically and theoretically.
I am in total agreement that the language parsing and grammar checking methods of attack pattern detection suck. It’s laborious to use these methods to describe attacks, in some cases they are not flexible enough to describe them, and they simply cannot account for all the vectors (time, sequence, context, application state, etc.) an attacker will look to exploit. Information needed to make a well-informed security decision sits outside what’s captured in most web requests; since most comparisons are point-in-time inspections, they’re limited to straight regular expression checking plus a few other tricks like reputation and geolocation. So I get that theoretically we are fighting a losing battle. In practice, attackers have limited skill, time, and resources as well. I don’t look at this as black or white, perfect vs. flawed, but rather whether it is good enough to address someone’s appetite for risk.
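To make the point-in-time regex limitation concrete, here is a toy Python sketch (not any vendor’s engine, just an illustration with made-up signatures and payloads) of a regex-only check: one payload it catches, and two it misses because the signal is obfuscated or spread across values the per-request pattern never sees together.

```python
import re

# A naive, regex-only 'detection engine': one pattern per attack class.
SIGNATURES = {
    "sqli": re.compile(r"union\s+select|or\s+1\s*=\s*1", re.IGNORECASE),
    "xss": re.compile(r"<script\b", re.IGNORECASE),
}

def regex_waf(params):
    """Point-in-time inspection of a single request's parameter values."""
    for value in params.values():
        for name, pattern in SIGNATURES.items():
            if pattern.search(value):
                return f"blocked ({name})"
    return "allowed"

# Caught: the signature appears verbatim in a single value.
print(regex_waf({"q": "1 UNION SELECT password FROM users"}))

# Missed: inline comments break up the token the regex keys on.
print(regex_waf({"q": "1 UN/**/ION SE/**/LECT password FROM users"}))

# Missed: neither value matches on its own, but if the application
# concatenates them into one SQL fragment, the attack exists only in
# the combination - context a per-value regex never sees.
print(regex_waf({"sort": "1 UNION", "fields": "SELECT password FROM users"}))
```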
Thanks again,
Adrian
First of all, very cool! I think the fundamental principles of configuring WAFs are very important, and understanding how to run any security device is important. There is an issue with the way current WAFs are built, though, that may not be covered by your axioms.
I think the causes you give for the failure of WAFs are valid, but they are an incomplete picture. In particular, I take issue with the root cause you conclude with, namely that WAFs don’t work due to operational failure rather than technology failure. I humbly disagree. Let me try to explain:
You pointed out that “WAF is not a set and forget product, but for compliance purposes it is often used that way — resulting in mediocre protection.”
However, you don’t examine why a WAF is not a set-and-forget technology – or, for that matter, why WAFs require so much manual intervention, which in turn leads to operational bottlenecks.
The basic reason is that the current approach to web-application security is trying to do something that is not possible: it is an attempt to violate the Halting Problem (actually it’s an attempt to violate Rice’s Theorem, but the analogy works).
WAFs generally use some form of regular language to describe the patterns they’re trying to hunt down, whereas the attacks mounted on web applications are typically at least as complex as context-free languages, and at the extreme are outright Turing machine deltas. What I mean by Turing machine deltas is that the cyber attack itself introduces either a new state or a hidden condition that generates a new sentential form inside the web app – something a regular language parser would never be able to figure out. And pretty much all WAFs use some form of pattern matching which can be mathematically reduced to a group of regular languages, with a hardcoded context-free overlay that selects between them.
It can also be mathematically demonstrated that no matter how well an individual WAF administrator understands spoofing, fraud, non-repudiation, denial of service attacks, and application misuse, she cannot encode that knowledge into a tool that is fundamentally incapable of capturing it, due to the tool’s theoretical design limitations. So in this case, it’s a violation of Rice’s Theorem.
Of course, if we say that all attacks are app-input-based, that all the input falls within regular language scope, and that the WAF administrator completely understands the hidden Turing machines of her web applications, we can successfully use a WAF. The problem is that even the web-app developers cannot claim to know whether or not they have hidden (inadvertent, weird, or malicious) Turing machines inside their code, because 1) they don’t understand covert channels (the basis for most SQL injection, for example) and 2) they don’t control anything about the foundation of their app stack (a buffer overflow in an extension library of a PHP script, say)… But let’s leave that alone for now.
The fact is that all current-generation WAF engines can demonstrably be reduced to a collection of regular language groups, which one can subclass as a context-free language – namely, a set of regular expressions with a selection operation on them forming the group. So the WAF administrator needs to (1) keep up to date with all known attacks, albeit with the help of the WAF vendors through automation, and constantly update the WAF with new patterns for known attacks, and (2) hope that none of the attacks fall outside the WAF’s schema.
Given this, it’s fairly trivial to show that there exists at least one attack (sentential form) that falls outside the detection scope of a WAF – in other words, something the WAF will misclassify.
Not so trivial would be a proof of the general case, but it is provable that any subset of a Turing detection engine will have an asymptotic limit of around 1/sqrt(2) as the upper bound on successfully classifying whether an input forms an attack on an app, provided we fully know all of the machines within the target web apps and that each machine’s language is fully qualified into just two categories. This asymptote is a calculable statistical consequence of Rice’s Theorem – a number corroborated by 20 years of data on antivirus programs trying to detect computer viruses, whose success rate has hit a practical maximum of about 68%.
So I would submit to you that the reason regular WAFs don’t work is not just operational flaws – the approach itself has a fundamental design flaw. A WAF is trying to qualify a hidden machine (the one made up of the bugs inside the app) by examining input – formally, a sentential form for one of the multiple machines in your app, both good and bad – and determining whether that sentential form creates an undesired state transition, or creates a previously unprogrammed state-machine substrate that is either its own catalyst for an undesired state within the app or one on which another sentential form can act maliciously to yield the same result.
With this fundamental issue of trying to violate the defining principles of computer science, all we can do is create a perfect operational approach to asymptotic failure, which as I said earlier can be demonstrated to be ~30%. The 30% ≈ (1 – 0.707) comes from the RMS value of normalized detection of malicious sentential forms in general, assuming no malicious intent by the programmer.
So to summarize, it’s not just operational flaws, it’s fundamental flaws within the current design of WAFs.
Best regards,
Ahmed
Trustifier