Many companies are currently in various phases of projects to comply with the European Union’s General Data Protection Regulation (GDPR) ahead of the May 2018 enforcement deadline. Many vendors and service providers speak about GDPR in general terms and, in my view, often oversimplify solutions to the issues it raises. Rather than try to address the whole of the regulation, I want to focus on one practical issue that most companies will, at some point, need to address.
GDPR covers two categories of personal information: Personally Identifiable Information (PII) and Sensitive Personal Information (SPI). The two types are very different from each other. Each requires its own approach, both to identify it accurately as it flows through systems and to protect it adequately under the regulation.
Personally Identifiable Information (PII)
The first category of information that GDPR protects is familiar around the world as PII. It covers all types of information generally accepted as personally identifiable. Examples include names, national identifiers like US Social Security Numbers (SSNs), and European identifiers such as UK driver’s license numbers or Italy’s Codice Fiscale. GDPR also expands the definition of personally identifiable information to include items such as email addresses (corporate-issued or otherwise) and IP addresses.
While the definition of PII has been expanded to include new types of identifiable information, these identifiers still have something in common. They generally follow defined formats, and they are relatively easy to program into a content analytics system using regular expressions. This means they can largely be identified and protected with Data Loss Prevention (DLP) technologies. Those can be enterprise-class DLP products or DLP capabilities integrated into other products such as firewalls, Cloud Access Security Brokers (CASBs), or web gateways.
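The following minimal Python sketch shows roughly why format-bound identifiers suit regular expressions. The pattern names and the `find_pii` helper are hypothetical. The patterns are also deliberately simplified: they check layout only. They do not validate the Codice Fiscale check character or reject invalid SSN ranges. They also do not weigh surrounding context the way a commercial DLP engine typically would.

```python
import re

# Simplified, illustrative patterns: they check layout only, not checksums or context.
PII_PATTERNS = {
    # US Social Security Number in the common ddd-dd-dddd layout.
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Italian Codice Fiscale: 6 letters, 2 digits, letter, 2 digits, letter, 3 digits, letter.
    "it_codice_fiscale": re.compile(r"\b[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]\b"),
    # Email address (simplified; real address syntax is far more permissive).
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # IPv4 address with each octet limited to 0-255.
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return the matches for each pattern that fires, keyed by pattern name."""
    hits: dict[str, list[str]] = {}
    for name, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


sample = (
    "Contact mario.rossi@example.com from 192.168.1.20; "
    "CF RSSMRA85T10A562S; SSN 123-45-6789."
)
print(find_pii(sample))
```

Supporting a new identifier type means adding one more named pattern. That is part of why PII detection fits DLP tools well, although tuning is still needed to keep false positives manageable.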
With respect to GDPR, there are a few key capabilities to consider. First, the sections related to data security require the organization to have reasonable controls for monitoring the flow of data throughout the environment. In my interpretation, this means an organization must be able to monitor the use of personal information in three places: at the endpoint, in transit via web and email channels, and wherever it is stored in the environment. It should also include visibility into how information is stored in cloud applications and infrastructure, and how it is transferred between cloud environments. Second, as a practical matter, I cannot imagine how an organization could honor the Right to be Forgotten, or guarantee a right to erasure, without the capability to find that identity throughout all of its systems, including cloud applications, and remove it. Therefore, a DLP capability does not make an organization compliant in and of itself, but it is a required element of compliance.
It should be said that building a proper DLP program to comply with the relevant articles of GDPR requires planning, coordination between business units, and a good deal of care and feeding. However, protecting PII has been a best practice for over a decade, and many people have experience building such programs with DLP technologies. Protecting Sensitive Personal Information is a far greater operational challenge.
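The next sketch makes the erasure point concrete. It searches several data stores for a data subject's known identifiers and then removes the matching records. It is a hypothetical, in-memory Python illustration. The store names, the `Record` type, and the `locate_subject`/`erase_subject` helpers are all assumptions. Matching is naive case-insensitive substring search. Deleting whole records stands in for whatever redaction or retention-aware process an organization actually uses.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    fields: dict[str, str]


def locate_subject(
    stores: dict[str, list[Record]], identifiers: set[str]
) -> list[tuple[str, str]]:
    """Return (store, record_id) for every record that mentions any identifier."""
    needles = {ident.lower() for ident in identifiers}
    hits: list[tuple[str, str]] = []
    for store_name, records in stores.items():
        for record in records:
            # Naive case-insensitive substring search across all field values.
            text = " ".join(record.fields.values()).lower()
            if any(needle in text for needle in needles):
                hits.append((store_name, record.record_id))
    return hits


def erase_subject(stores: dict[str, list[Record]], identifiers: set[str]) -> int:
    """Delete every matching record in place and return how many were removed."""
    targets = set(locate_subject(stores, identifiers))
    removed = 0
    for store_name, records in stores.items():
        kept = [r for r in records if (store_name, r.record_id) not in targets]
        removed += len(records) - len(kept)
        records[:] = kept  # mutate the list so callers see the erasure
    return removed


# Hypothetical stores: an on-premises CRM, a ticketing system, and a cloud file share.
stores = {
    "crm": [
        Record("c1", {"name": "Mario Rossi", "email": "mario.rossi@example.com"}),
        Record("c2", {"name": "Anna Bianchi", "email": "anna@example.com"}),
    ],
    "support_tickets": [
        Record("t7", {"body": "Customer mario.rossi@example.com asked for a refund"}),
    ],
    "cloud_share": [Record("f3", {"title": "Q3 roadmap"})],
}
subject_ids = {"Mario Rossi", "mario.rossi@example.com"}
print(locate_subject(stores, subject_ids))
```

The erasure step can only be as complete as the search that precedes it. Any store left out of `stores` is exactly the gap the paragraph above warns about.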
Sensitive Personal Information (SPI)
Sensitive Personal Information (SPI) refers to information that does not identify an individual but is related to one. It reveals something private, or something that could harm that individual if made public. Examples include biometric data, genetic information, sex life, trade union membership, and sexual orientation. Traditional data security tools like DLP struggle to protect SPI because many of these terms appear in common usage without being related to any individual. As a result, it is very difficult to program a content analytics engine to find the information that is in scope for GDPR without also finding large volumes of information that is not. The most elegant solution, in my experience, is to add a Data Classification program to the overall protection platform and integrate it with the DLP program.
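A naive keyword scan shows the false-positive problem in action. The keyword list and document texts in this hypothetical Python sketch are invented for illustration. Every document triggers at least one SPI keyword, yet only the HR note actually ties SPI to an identifiable person.

```python
import re

# Hypothetical keyword list for a few GDPR special categories.
SPI_KEYWORDS = {
    "trade_union": re.compile(r"\btrade union\b", re.IGNORECASE),
    "genetic": re.compile(r"\bgenetic\b", re.IGNORECASE),
    "biometric": re.compile(r"\bbiometric\b", re.IGNORECASE),
}


def naive_spi_scan(text: str) -> list[str]:
    """Return the sorted names of every SPI keyword found, with no notion of context."""
    return sorted(name for name, pattern in SPI_KEYWORDS.items() if pattern.search(text))


docs = {
    # In scope: SPI attached to a named person.
    "hr_note": "Mario Rossi asked that his trade union membership not be shared.",
    # Out of scope: the same kinds of words, but about no individual.
    "news_clip": "The trade union federation published a report on genetic testing in agriculture.",
    "product_spec": "The new door lock supports biometric authentication.",
}

for doc_name, text in docs.items():
    print(doc_name, naive_spi_scan(text))
```

The scanner gets the same kind of signal from all three texts. Telling them apart requires knowing whether the sensitive attribute is attached to a person, which is the context a human classifier can supply.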
Data Classification allows a user to assign a classification to a piece of information, typically by selecting it from a list. Many people are familiar with the classification schemas used by governments and militaries, which classify information by level of secrecy. For example, levels may include public, sensitive, secret, and top secret. However, the most effective Data Classification tools are very flexible, allowing multiple levels of classification as well as customizable fields. For unstructured data, therefore, an organization could develop a classification schema with simple drop-down menus. These would ask the user whether a document contains PII and whether it contains SPI, with a yes or no choice for each question. The Data Classification software then applies metadata tags to those documents. Security tools like DLP systems can use those tags to apply rules to the information. This is a far more efficient and effective method of protecting SPI.
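Here is a minimal Python sketch of how a DLP rule might consume those metadata tags. Several pieces are hypothetical choices for illustration: the tag keys `contains_pii` and `contains_spi`, the `Action` values, and the policy itself. The policy blocks SPI leaving the organization and encrypts PII leaving it. None of this describes the behavior of any particular product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ENCRYPT = "encrypt"
    BLOCK = "block"


@dataclass
class Document:
    name: str
    # Metadata a (hypothetical) classification tool writes after the user
    # answers the two yes/no questions.
    tags: dict[str, str] = field(default_factory=dict)


def dlp_decision(doc: Document, destination_internal: bool) -> Action:
    """Choose an action from classification tags alone; no content inspection."""
    contains_pii = doc.tags.get("contains_pii") == "yes"
    contains_spi = doc.tags.get("contains_spi") == "yes"
    if destination_internal:
        return Action.ALLOW
    if contains_spi:
        return Action.BLOCK  # strictest rule for special-category data
    if contains_pii:
        return Action.ENCRYPT
    return Action.ALLOW


payroll = Document("payroll.xlsx", {"contains_pii": "yes", "contains_spi": "no"})
medical = Document("medical_leave.docx", {"contains_pii": "yes", "contains_spi": "yes"})
print(dlp_decision(payroll, destination_internal=False))  # Action.ENCRYPT
print(dlp_decision(medical, destination_internal=False))  # Action.BLOCK
```

Untagged documents fall through to `ALLOW` in this sketch. A real deployment would more likely treat a missing classification as its own case, for example by falling back to content inspection.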
Data Classification programs can also communicate with people in a human-readable fashion. Many people interact with PII and SPI frequently without really thinking about the potential sensitivity of the information they handle. A large part of the spirit of GDPR is to make people think about the information they are handling and to handle it with due care. Complying with the spirit of the regulation will require a culture change in some organizations, and a Data Classification program can aid that change considerably. With visible classifications, users can easily tell when they are handling sensitive information, and they may handle it with more care as they go about their daily routine. Many of these solutions can also communicate with the end user through tooltips or pop-up messages to reinforce the behavioral change.
Breaches of personal data can happen in a variety of ways. The breaches that garner the most attention are large-scale incidents, often caused by technical misconfiguration or a lack of due care on an industrial scale. Far more frequently, however, information is compromised on a small scale through carelessness or a general lack of awareness. In these cases, Data Classification can help significantly.
Conclusion
Many organizations have what I call GDPR fatigue. So many technology and service providers have used fear to sell products and services, without addressing specific solutions to the challenges GDPR poses, that many organizations have stopped listening. I do not see GDPR as a reason for fear. Rather, it establishes the rights of data subjects and the responsibilities of the organizations that gather and store information about them.
With respect to many GDPR articles, compliance is relatively straightforward. However, the basis of compliance is understanding how to identify and protect Personally Identifiable Information and Sensitive Personal Information. Programs that enable that identification and protection are therefore the foundational elements of compliance from a tools and capabilities perspective. Data Loss Prevention and Data Classification form a powerful combination for protecting both PII and SPI. The challenge then becomes leveraging those capabilities properly to fulfill controller and processor obligations and to protect data subject rights.
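The combination can be sketched as a single decision that uses two signals: pattern matching for format-bound PII and user-applied tags for SPI. This Python sketch is hypothetical. The two patterns and the tag keys are illustrative assumptions, and a production system would use far richer detection on both sides.

```python
import re

# Illustrative DLP patterns for format-bound PII (deliberately minimal).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def sensitivity_labels(text: str, tags: dict[str, str]) -> set[str]:
    """Merge content-inspection results with user-applied classification tags."""
    labels: set[str] = set()
    # DLP side: PII follows defined formats, so pattern matching can find it.
    if SSN_PATTERN.search(text) or EMAIL_PATTERN.search(text):
        labels.add("PII")
    # Classification side: the user's answers are recorded as metadata tags.
    if tags.get("contains_pii") == "yes":
        labels.add("PII")
    if tags.get("contains_spi") == "yes":
        labels.add("SPI")
    return labels


print(sorted(sensitivity_labels("SSN 123-45-6789", {"contains_spi": "yes"})))
```

Either signal alone is enough to raise a label. Content inspection still catches PII in documents nobody classified, and the tag catches SPI that no pattern could reliably detect.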
This blog was written by guest author Jeremy Wittkop, Chief Technology Officer at InteliSecure.