This is the third (and final) post in our series on Protecting What Matters: Introducing Data Guardrails and Behavioral Analytics. In the first post, Introducing Data Guardrails and Behavioral Analytics: Understand the Mission, we introduced the concepts and outlined the major categories of insider risk. In the second post we defined the terms in more depth. To wrap up the series, we'll bring it all together with a scenario showing how these concepts work in practice.

As we wrap up the Data Guardrails and Behavioral Analytics series, let's walk through a quick scenario showing how these concepts apply to a simple example. Our example company is a small pharmaceutical firm. As with all pharma companies, much of its value lies in intellectual property, which makes that IP the most significant target for attackers. Thanks to fast growth and a highly competitive market, the business isn't waiting for perfect infrastructure and controls before launching products and forming partnerships. As a new company without legacy infrastructure (or mindset), it has built most of its infrastructure in the cloud and takes a cloud-first approach.

In fact, the CEO has been recognized for their innovative use of cloud-based analytics to accelerate the process of identifying new drugs. As excited as the CEO is about these new computing models, the board is very concerned about both external attacks and insider threats, because the company's proprietary data resides in dozens of service providers. So the security team feels pressure to address the issue.

The CISO is very experienced, but is still coming to grips with the changes in mindset, controls, and operational practices inherent to a cloud-first approach. Defaulting to the standard data security playbook represents the path of least resistance, but she's savvy enough to know it would create significant gaps in both visibility and control over the company's critical intellectual property. Data Guardrails and Data Behavioral Analytics present an opportunity both to define a hard set of policies for data usage and protection, and to watch for anomalous behavior that may indicate malicious intent. So let's see how she would lead her organization through a process to define Data Guardrails and Behavioral Analytics.

Finding the Data

As we mentioned in the previous post, what's unique about data guardrails and behavioral analytics is the combination of content knowledge (classification) with context and usage. Thus the first step is to classify the sensitive data within the enterprise.

This involves an internal discovery of data resources. The technology to do this is mature and well understood, although the team needs to ensure discovery extends to cloud-based resources. They also need to talk to senior business leaders to understand how business strategy drives application architecture, and therefore where sensitive data resides.

Internal private research data and clinical trials make up most of the company's intellectual property. This data can be both structured and unstructured, which complicates discovery. Discovery is somewhat easier because the organization has embraced cloud storage to centralize unstructured data and uses SaaS wherever possible for front-office functions. The emerging analytics use cases remain challenging to protect, given the relatively immature operational processes in their cloud environments.

As with everything else in security, visibility comes before control, so this discovery and classification process must happen first to get the data security effort moving. To be clear, even with much of the data in cloud services addressable via API, keeping the classification current remains one of the bigger challenges in data security. It therefore requires specific activities (with resources allocated to them) to keep the classification up to date as the process rolls into production.

Defining Data Guardrails

As we've mentioned previously, guardrails are rule sets that keep users within the lines of authorized activity. Thus the CISO starts by defining authorized actions, then enforcing those policies where the data resides. For simplicity's sake, we'll break the guardrails into three main categories:

  • Access: These guardrails have to do with enforcing access to the data. For instance, files relating to recruiting participants in a clinical trial need to be heavily restricted to the group tasked with recruitment. If someone were to open up access to a broader group, or perhaps tag the folder as public, the guardrail would remove that access and restrict it to the proper group.
  • Action: She will also want to define guardrails on who can do what with the data. It's important to prevent someone from deleting data or copying it out of the analytics application, so these guardrails protect the integrity of the data by preventing misuse, whether intentional (malicious) or accidental.
  • Operational: The final category of guardrails protects the operational integrity and resilience of the data. Enterprising data scientists can spin up new analytics environments quickly and easily, but may not take the necessary precautions to ensure data backup or required logging/monitoring. Guardrails can implement automatic backups and monitoring as part of every new analytics environment.
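To make the three categories concrete, here is a minimal sketch of what such guardrails might look like as code. This is purely illustrative: the field names (`acl`, `action`, `backup_enabled`, and so on) are assumptions for this example, not a real cloud provider's API.

```python
# Illustrative guardrail sketch -- hypothetical data structures,
# not a real provider API or policy engine.

def enforce_access_guardrail(folder, allowed_group):
    """Access: strip any grant outside the authorized group."""
    removed = [g for g in folder["acl"] if g != allowed_group]
    folder["acl"] = [allowed_group]
    return removed  # report what was revoked, for audit logging

def action_allowed(event):
    """Action: block deletes and bulk copies out of the analytics app."""
    return event["action"] not in {"delete", "bulk_export"}

def enforce_operational_guardrail(environment):
    """Operational: every new analytics environment gets backups and logging."""
    environment.setdefault("backup_enabled", True)
    environment.setdefault("logging_enabled", True)
    return environment
```

In a real deployment these checks would run as automated remediation against the provider's actual APIs; the point is that each guardrail is a simple, enforceable rule rather than a manual review step.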

The key in designing guardrails is to think of them as enablers, not blockers. Exception handling is typically the difference between success and failure when implementing guardrails. To illustrate, consider a joint venture the organization has with a smaller biotech company. A guardrail restricts access to the venture's data to a group of 10 internal researchers. Yet researchers from the joint venture partner clearly need access as well, so you'll need to expand the guardrail's access rules. But you may also want to enforce multi-factor authentication on those external users, or implement a location guardrail that restricts external access to IP addresses within the partner's network.
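The joint-venture exception above could be sketched as a conditional access rule: internal researchers get access as before, while partner users must pass MFA and connect from the partner's network. All names, user sets, and the network range below are hypothetical placeholders.

```python
# Illustrative exception-handling sketch for the joint-venture scenario.
# User names, groups, and the partner CIDR are made up for this example.
import ipaddress

INTERNAL_RESEARCHERS = {"researcher-01", "researcher-02"}  # the internal group (abridged)
PARTNER_USERS = {"jv-scientist-01"}
PARTNER_NETWORK = ipaddress.ip_network("203.0.113.0/24")   # placeholder range

def jv_access_allowed(user, source_ip, mfa_passed):
    """Internal users pass; partner users need MFA plus a partner-network IP."""
    if user in INTERNAL_RESEARCHERS:
        return True
    if user in PARTNER_USERS:
        return mfa_passed and ipaddress.ip_address(source_ip) in PARTNER_NETWORK
    return False
```

The design choice here is that the exception is itself a rule with its own conditions, rather than a blanket hole punched in the guardrail.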

As you can see, you have a lot of granularity in how you deploy guardrails. But stay focused on quick wins up front: don't try to boil the ocean and implement every conceivable guardrail on Day 1. Focus on the most sensitive data, and establish and refine the exception handling process. Then systematically add guardrails as the process matures and you learn what most reduces your attack surface.

Refining Data Behavioral Analytics

Once the guardrails are in place, you have a baseline of data security. You can be confident that scads of data won't be extracted and copied, and that unauthorized groups won't access data they shouldn't. By establishing authorized activities, and stopping anything not specifically authorized, you eliminate a large part of the attack surface.

That said, authorized users can do a lot of damage, either maliciously or accidentally. Behavioral analytics steps in where guardrails end, reducing the risk of harmful activity that falls outside the pre-defined rules. So we pair data guardrails with analysis of data usage: identify patterns of typical use, then look for abnormal usage and behavior. This requires telemetry, analysis, and tuning. Let's use unstructured data to describe the approach.

Getting back to our pharma example, the cloud storage provider tracks who does what to every bit of data in its environment. This telemetry becomes the basis of the Data Behavioral Analytics program. To accurately train the analytics model, they need data on not just known-good activity, but also activity they know violates policy. Keep in mind the importance of data quality, as opposed to mere data quantity. When building your own program, gather data on user context and entitlements, so you can track how the data has been used, when, and by which user populations.

Of course, you could just look for anomalous patterns across all the telemetry, but that creates a lot of noise. So we recommend starting with a specific behavior you want to detect: for instance, mass exfiltration of clinical trial data. You'd identify which files and folders hold that data, then look at the patterns of activity against them. A quick analysis shows that a group of researchers in Asia has been accessing those folders, but outside working hours in their local geography. That raises an alarm and triggers an investigation. It turns out one of the researchers collaborates with a team in Europe, and has been working non-standard hours, which explains the anomalous access. In this case the access is legitimate, but the approach both alerts you to potential misuse and signals that the security team watches for this kind of activity, which is itself a deterrent.
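The off-hours detection described above can be sketched as a simple filter over access telemetry. The folder path, event fields, and working-hours window below are assumptions chosen for illustration; a real program would derive the baseline of "normal hours" from observed behavior per user rather than hard-code it.

```python
# Illustrative sketch: flag access to sensitive clinical-trial folders
# outside local working hours. Paths, field names, and the hours window
# are hypothetical, and a real baseline would be learned, not fixed.
from datetime import datetime

SENSITIVE_FOLDERS = {"/trials/compound-x"}  # hypothetical sensitive path
WORK_HOURS = range(8, 19)                   # assume 08:00-18:59 local is "normal"

def off_hours_access(events):
    """Return events that touch sensitive folders outside local work hours."""
    flagged = []
    for e in events:
        ts = datetime.fromisoformat(e["local_time"])
        if e["folder"] in SENSITIVE_FOLDERS and ts.hour not in WORK_HOURS:
            flagged.append(e)
    return flagged
```

An alert from this filter is the start of an investigation, not a verdict: as in the scenario, the flagged access may turn out to be a legitimate cross-timezone collaboration.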

If you use an off-the-shelf product, much of this may be defined for you as a starting point. Clusters of user activity based on groups, social graphs, hours, and locations tend to be useful across a wide range of behavioral analytics use cases. You will still want to tune them over time into more refined use cases that reflect your own organization's needs and patterns.

As with any analytical technique, tuning will be required over time, because changes in your environment inevitably affect the accuracy and relevance of the analytics. So we'll reiterate the importance of staffing your program sufficiently to manage the alerts and keep thresholds on the fine line between signal and noise.

Between data guardrails to handle known risks and enforce authorized-use policies, and data behavioral analytics to detect malicious activity and situations you couldn't have predicted, these approaches bring data security into the modern age.

As always, we’ll be factoring in comments and feedback on the blog series, so if you see something you don’t like or don’t agree with, let us know. We’ll be refining the content and packaging it up into a white paper, which will appear in the research library within a couple of weeks.