This is the second post in our series on Protecting What Matters: Introducing Data Guardrails and Behavioral Analytics. Our first post, Introducing Data Guardrails and Behavioral Analytics: Understand the Mission, introduced the concepts and outlined the major categories of insider risk. This post defines the concepts.
Data security has long been the most challenging domain of information security, despite being the centerpiece of our entire practice. We only call it “data security” because “information security” was already taken. Data security must not impede use of the data itself. By contrast it’s easy to protect archival data (encrypt it and lock the keys up in a safe). But protecting unstructured data in active use by our organizations? Not so easy. That’s why we started this research by focusing on insider risks, including external attackers leveraging insider access. Recognizing someone performing an authorized action, but with malicious intent, is a nuance lost on most security tools.
How Data Guardrails and Data Behavioral Analytics are Different
Both data guardrails and data behavioral analytics strive to improve data security by combining content knowledge (classification) with context and usage. Data guardrails leverage this knowledge in deterministic models and processes to minimize the friction of security while still improving defenses. For example, if a user attempts to make a file in a sensitive repository public, a guardrail could require them to record a justification and then send a notification to Security to approve the request. Guardrails are rule sets that keep users “within the lines” of authorized activity, based on what they are doing.
Data behavioral analytics extends the analysis to include current and historical activity, and uses tools such as artificial intelligence/machine learning and social graphs to identify unusual patterns which bypass other data security controls. Analytics reduces these gaps by looking not only at content and simple context (as DLP might), but also at the history of how that data, and data like it, has been used within the current context. A simple example is a user accessing an unusual volume of data in a short period, which could indicate malicious intent or a compromised account. A more complicated example is identifying sensitive intellectual property on an accounting team device, even though the accounting team has no need to collaborate with the engineering team. This higher-order decision making requires an understanding of data usage and connections within your environment.
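To make the "unusual volume" example concrete, here is a minimal sketch that compares today's download volume for an account against its own history using a z-score. The function name, the threshold of 3 standard deviations, and the sample numbers are all assumptions for illustration; real products use far richer models.

```python
from statistics import mean, stdev


def is_unusual_volume(history_mb, current_mb, z_threshold=3.0):
    """Return True if current_mb is far above the historical norm.

    history_mb: past per-day volumes for the same account (hypothetical feed).
    z_threshold: how many standard deviations above the mean counts as unusual.
    """
    if len(history_mb) < 2:
        return False  # not enough history to judge either way
    mu = mean(history_mb)
    sigma = stdev(history_mb)
    if sigma == 0:
        # Perfectly flat history: any increase at all stands out.
        return current_mb > mu
    return (current_mb - mu) / sigma > z_threshold


# Hypothetical daily download volumes (MB) for one account.
history = [120, 95, 130, 110, 105, 125, 115]

typical_day = is_unusual_volume(history, 118)
bulk_download = is_unusual_volume(history, 2400)
print(typical_day, bulk_download)
```

A flag like this only says "worth a look"; it cannot tell malicious intent from a compromised account or a legitimate one-off project.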
Central to these concepts is the reality of distributed data actively used widely by many employees. Security can’t effectively lock everything down with strict rules covering every use case without fundamentally breaking business processes. But with integrated views of data and its intersection with users, we can build data guardrails and informed data behavioral analytical models, to identify and reduce misuse without negatively impacting legitimate activity. Data guardrails enforce predictable rules aligned with authorized business processes, while data behavioral analytics look for edge cases and less predictable anomalies.
How Data Guardrails and Data Behavioral Analytics Work
The easiest way to understand the difference between data guardrails and data behavioral analytics is that guardrails rely on pre-built deterministic rules (which can be as simple as “if this then that”), while analytics rely on AI, machine learning, and other heuristic technologies which look at patterns and deviations.
To be effective both rely on the following foundational capabilities:
- A centralized view of data. Both approaches assume a broad understanding of data and usage – without a central view you can’t build the rules or models.
- Access to data context. Context includes multiple characteristics including location, size, data type (if available), tags, who has access, who created the data, and all available metadata.
- Access to user context, including privileges (entitlements), groups, roles, business unit, etc.
- The ability to monitor activity and enforce rules. Guardrails, by nature, are preventative controls which require enforcement capabilities. Data behavioral analytics can be used for detection alone, but are far more effective at preventing data loss if they can also block actions.
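One way to picture these foundations is as the record a monitoring system would need to assemble for every action. The sketch below is an assumption about how such a record might look, not a description of any product; every class and field name is hypothetical and simply mirrors the characteristics listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataContext:
    """What we know about the data itself."""
    location: str
    size_bytes: int
    created_by: str
    data_type: Optional[str] = None   # only if available
    tags: frozenset = frozenset()
    readers: frozenset = frozenset()  # who has access


@dataclass(frozen=True)
class UserContext:
    """What we know about the person acting on the data."""
    user_id: str
    business_unit: str
    groups: frozenset = frozenset()
    roles: frozenset = frozenset()
    entitlements: frozenset = frozenset()


@dataclass(frozen=True)
class ActivityEvent:
    """A monitored action, joining user context and data context."""
    action: str
    user: UserContext
    data: DataContext


event = ActivityEvent(
    action="make_public",
    user=UserContext("alice", "engineering", groups=frozenset({"eng-plans"})),
    data=DataContext("/engineering/new-plans/roadmap.docx", 48_213, created_by="bob"),
)
print(event.action, event.user.business_unit, event.data.location)
```

Both guardrail rules and analytical models would consume events shaped roughly like this, which is why the centralized view comes first.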
The two technologies then work differently while reinforcing each other:
- Data guardrails are sets of rules which look for specific deviations from policy, then take action to restore compliance. To expand our earlier example:
- A user shares a file located in cloud storage publicly. Let’s assume the user has the proper privileges to make files public. The file is in a cloud service so we also assume centralized monitoring/visibility, as well as the capability to enforce rules on that file.
- The file is located in an engineering team’s repository (directory) for new plans and projects. Even without tagging, this location alone indicates a potentially sensitive file.
- The system sees the request to make the file public, but because of the context (location or tag), it prompts the user to enter a justification to allow the action, which gets logged for the security team to review. Alternatively, the guardrail could require approval from a manager before allowing the action.
Guardrails are not blockers because the user can still share the file. Prompting for user justification both prevents mistakes and loops in security review for accountability, allowing the business to move fast while minimizing risk. You could also look for large file movements based on pre-determined thresholds. A guardrail would only kick in if the policy thresholds are violated, and then use enforcement actions aligned with business processes (such as approvals and notifications) rather than simply blocking activity and calling in the security goons.
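The example above can be sketched as a small deterministic rule set. This is an illustration under stated assumptions: the `ShareRequest` fields, the sensitive path prefix, the tag name, and the 500 MB threshold are all hypothetical policy values, and a real guardrail would be enforced by the storage platform rather than a standalone function.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_JUSTIFICATION = "require_justification"  # logged for security review
    REQUIRE_APPROVAL = "require_approval"            # e.g. manager sign-off


@dataclass(frozen=True)
class ShareRequest:
    user: str
    path: str
    tags: frozenset
    make_public: bool
    size_mb: float


# Hypothetical policy values, chosen only for this sketch.
SENSITIVE_PREFIXES = ("/engineering/new-plans/",)
SENSITIVE_TAGS = frozenset({"confidential"})
LARGE_TRANSFER_MB = 500.0


def evaluate_guardrail(req: ShareRequest) -> Decision:
    """Apply "if this then that" rules; never silently block the user."""
    # Location alone can mark a file as sensitive, even without tagging.
    sensitive = req.path.startswith(SENSITIVE_PREFIXES) or bool(req.tags & SENSITIVE_TAGS)
    if req.make_public and sensitive:
        return Decision.REQUIRE_JUSTIFICATION
    if req.size_mb > LARGE_TRANSFER_MB:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


plans = ShareRequest("alice", "/engineering/new-plans/roadmap.docx", frozenset(), True, 2.0)
menu = ShareRequest("alice", "/shared/lunch-menu.pdf", frozenset(), True, 0.1)
print(evaluate_guardrail(plans).value)
print(evaluate_guardrail(menu).value)
```

Note that none of the outcomes is a flat denial: the rule either lets the action through or routes it into a business process, which is the point of a guardrail.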
- Data behavioral analytics use historical information and activity (typically with training sets of known-good and known-bad activity) to produce artificial intelligence models which identify anomalies. We don’t want to be too narrow in our description, because there are a wide variety of approaches to building models.
- Historical activity, ongoing monitoring, and ongoing modeling are all essential – no matter the mathematical details.
- By definition we focus on the behavior of data as the core of these models, rather than user activity; this represents a subtle but critical distinction from User Behavioral Analytics (UBA). UBA tracks activity on a per-user basis. Data behavioral analytics (the acronym DBA is already taken, so we’ll skip making up a new TLA) instead looks at activity at the source of the data. How has that data been used? By which user populations? What types of activity happen using the data? When? We don’t ignore user activity, but we track usage of data.
- For example, we could ask, “Has a file of this type ever been made public by a user in this group?” UBA would ask, “Has this particular user ever made a file public?” Focusing on the data offers the potential to catch a broader range of data usage anomalies.
- At risk of stating the obvious, the better the data, the better the model. As with most security-related data science, don’t assume more data inevitably produces better models. It’s about the quality of the data. For example, social graphs of communication patterns among users could be a valuable feed to detect situations like files moving between teams who do not usually collaborate. That’s worth a look, even if you wouldn’t want to block the activity outright.
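The difference between the data-centric question and the UBA question can be shown with a toy history lookup. This is a deliberately simple sketch, not a model: the `Event` fields, the group and file type labels, and the sample history are hypothetical, and a real system would learn from much more activity than an exact-match search.

```python
from collections import namedtuple

# Hypothetical activity record; field names are assumptions for this sketch.
Event = namedtuple("Event", "user group file_type action")


def seen_for_data(history, file_type, group, action):
    """Data-centric: has anyone in this group done this to this type of file?"""
    return any(
        e.file_type == file_type and e.group == group and e.action == action
        for e in history
    )


def seen_for_user(history, user, action):
    """User-centric (UBA-style): has this particular user ever done this?"""
    return any(e.user == user and e.action == action for e in history)


history = [
    Event("alice", "marketing", "press_release", "make_public"),
    Event("bob", "engineering", "design_doc", "share_internal"),
    Event("carol", "engineering", "release_notes", "make_public"),
]

# Carol now tries to make an engineering design document public.
new = Event("carol", "engineering", "design_doc", "make_public")

user_view = seen_for_user(history, new.user, new.action)
data_view = seen_for_data(history, new.file_type, new.group, new.action)
print(user_view, data_view)
```

Carol has made files public before, so the per-user view sees nothing new. The data view notices that no one in engineering has ever made a design document public, which is the anomaly worth reviewing.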
Data guardrails handle known risks, and are especially effective at reducing user error and identifying account abuse resulting from tricking authorized users into unauthorized actions. Guardrails may even help reduce account takeovers, because attackers cannot misuse data if their actions violate a guardrail. Data behavioral analytics then supplements guardrails for unpredictable situations and those where a bad actor tries to circumvent guardrails, including malicious misuse and account takeovers.
Now you have a better understanding of the requirements and capabilities of data guardrails and data behavioral analytics. Our next post will focus on some quick wins to justify including these capabilities in your data security strategy.