Organizations of all sizes face a similar challenge: the ticking time bomb of old data.
2020 is expected to be a period of exceptionally rapid data growth, with the growth in demand outstripping growth in storage supply solutions, according to Harvard Business Analytics.
Adding to the complexity: 80% of all data organizations generate daily is unstructured, that is, stored but not easily found.
Whether documents, customer reviews, call center transcripts or multimedia files, unstructured data can contain personally identifiable information (PII) and corporate IP that could put a company at risk, not to mention the volumes of data that are simply redundant, obsolete, or trivial.
The elephant in the room: Unstructured data
“Companies don’t know what information is out there – they know they have a problem; it’s the elephant in the room,” says Fortra’s Data Classification Suite Chief Architect Ayman Wassif, explaining that finding solutions is hard because it requires companies to dedicate resources and effort to monitoring and ensuring that they handle data properly.
Why should companies care? Privacy laws around the world are growing exponentially.
Knowing not only where your hidden data is located but also what it contains will allow you to avoid compliance problems while establishing a comprehensive solution for safeguarding all the untapped and often unknown data that lives in your enterprise systems.
According to IDC, 90% of unstructured data is never analyzed.
Such data is known as dark data.
As Gartner defines it, dark data is part of an organization’s information assets and, because there’s so much of it, it can help unlock business value. As a leading data classification, identification and security automation provider, Fortra’s Data Classification Suite has seen organizations struggle with the influx of this unstructured data.
Knowing where dark data hides can be eye opening, and ignoring it is potentially costly, as Wassif notes in these real-world examples.
In one instance, Fortra’s Data Classification Suite software detected source code posted in a financial company’s public folder; a user had shared the files with a colleague and forgotten to remove them afterward.
In the hands of a competitor or hacker, the IP could have hurt the company’s competitive position or even resulted in unauthorized network access. In a second instance, an employee at one company writing an RFP shared an answer with a friend at a competing firm. The unauthorized exchange of confidential information was captured and shared with the customer, costing the first company a bid worth billions of dollars.
In a third data policy breach, a client’s users were sharing movie downloads on a corporate drive, putting the firm at risk because the files were pirated copies of copyrighted material.
As these examples illustrate, organizations should begin with employee education and awareness – ensuring that all levels of an organization know their company’s privacy and information-sharing policies, including what they can put on publicly shared drives and folders.
Illuminate your data at rest
But it can’t stop there.
Technology can be a critical ally in getting visibility into your unstructured data dilemma. Fortra’s Data Classification Suite’s Illuminate 2020 gives users the ability to search their organization’s data at rest for PII. The software works with Fortra’s Data Classification Suite’s Accelerator for Privacy solution to better control data regardless of where it’s stored. Previous versions of the software only scanned data at creation.
Jamie Manuel, Vice President of Product Management and Marketing at Fortra’s Data Classification Suite, notes that Illuminate allows organizations to more easily cull volumes of enterprise data to find the pieces that can put organizations at risk, thereby driving better regulatory compliance.
“The good thing about our solution is you don’t need to configure it yourself; it automatically knows what to search for in terms of PII, so it can help speed up your searches to find what data might pertain to those privacy regulations,” says Manuel.
So, how well can the software identify what’s personal and what’s not?
Fortra’s Data Classification Suite put Illuminate to the test against enterprise users, with surprising results. When it came to identifying PII, Illuminate software was 22% more accurate than people. Extrapolating from the findings, humans misidentified 22 of every 100 files containing personal information.
“That’s a big deal,” says Manuel.
Illuminate scans locations where users store data, including on-premises shares, Box, Dropbox, OneDrive, and Microsoft SharePoint and SharePoint Online. It examines and automatically classifies the files it discovers, ensuring appropriate data protection is applied.
And, as it scans files, Illuminate gathers extensive information about each file, building a data inventory for users to run analytics to identify risk areas.
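To make the scan-and-inventory idea concrete, here is a minimal Python sketch. It is not how Illuminate works internally, which the article does not describe. The two regular expressions, the record fields, and the function names are all hypothetical, and real PII detection needs far more than pattern matching on plain-text files.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately simple patterns chosen for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_file(path):
    """Return one inventory record describing a single file."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    found = sorted(name for name, rx in PII_PATTERNS.items() if rx.search(text))
    return {
        "path": str(path),
        "size_bytes": path.stat().st_size,
        "pii_types": found,
        "label": "sensitive" if found else "no_pii_found",
    }


def build_inventory(root):
    """Scan every regular file under root and collect the records."""
    return [scan_file(p) for p in sorted(Path(root).rglob("*")) if p.is_file()]


def pii_summary(inventory):
    """Count how many files contain each PII type."""
    return Counter(t for rec in inventory for t in rec["pii_types"])
```

Calling `build_inventory` on a folder returns one record per file, and `pii_summary` turns those records into a simple count by PII type, the kind of analytics an inventory makes possible.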
Best practices for protecting your data in the cloud
Clearly, as data proliferates, organizations will continue to rely on cloud infrastructure to manage their information.
According to SpiceWorks’ report, Data Storage Trends in 2020 and Beyond, businesses expect double-digit growth in the next two years for many storage technologies.
By 2022, an additional 20% of companies plan to use cloud storage infrastructure, and there will be significant gains for high-capacity hard disk drives (17%), all-flash storage (14%), and cloud file-sharing services (10%).
Too often, organizations want to reclassify their data after it’s been publicly stored – and shared – in the cloud. The issue is that once it’s in a public folder, “you have zero knowledge of where it’s copied after that. It could be everywhere,” Wassif says.
Instead, organizations should scan and classify their data before ever migrating it to the cloud. That way, they can protect sensitive data “right” the first time, through encryption and other methods, advise Fortra’s Data Classification Suite executives.
Prior to using cloud-storage solutions, users should be aware of what permissions are required for sharing information. For example, OneDrive is private until a user starts sharing information, whereas SharePoint information is public to all group members with access.
“A key best practice before you migrate data up to the cloud is to do discovery of what data you have,” adds Manuel. “There are many solutions to protect information and do the right sharing of information – but you want to prevent yourself from putting things in the wrong place with the idea, ‘I’ll fix it later.’”
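The discover-first advice can be pictured as a gate that runs before any upload. The Python sketch below is an illustration under assumptions, not a vendor workflow: the record layout (`path`, `label`) and the label values `"public"` and `"sensitive"` are hypothetical. Anything without a recognized label is held back rather than migrated.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationPlan:
    ready: list = field(default_factory=list)          # classified, safe to upload as-is
    protect_first: list = field(default_factory=list)  # encrypt or restrict before upload
    hold: list = field(default_factory=list)           # not yet classified, do not upload


def plan_migration(records):
    """Split file records by classification label before anything is uploaded."""
    plan = MigrationPlan()
    for rec in records:
        label = rec.get("label")
        if label == "public":
            plan.ready.append(rec["path"])
        elif label == "sensitive":
            plan.protect_first.append(rec["path"])
        else:
            # Default-deny: unknown or missing labels never reach the cloud.
            plan.hold.append(rec["path"])
    return plan
```

The default-deny branch is the design choice that matters here: a file nobody has classified is treated as a reason to pause, which is the opposite of “I’ll fix it later.”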
While there are excellent technology solutions to help manage data, companies need to recognize that they are the true custodians of their organization’s data. Wassif urges caution when working with cloud storage vendors.
“The vendor will guarantee service and a certain uptime,” he notes. “But the minute you allow external people to view your data, additional protections need to be in place that can analyze what is being shared and when, and can assess if it’s the right way of sharing.”
Investing in machine learning
Illuminate, like many of Fortra’s Data Classification Suite’s products, analyzes stored content, using tools, machine learning and smart pattern recognition to understand how sensitive information is and apply the right rules to it.
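As a rough illustration of what pattern recognition beyond a bare regular expression can look like, the Python sketch below pairs a candidate pattern for payment card numbers with the Luhn checksum, a standard check that filters out most random digit strings. This is a generic technique shown under assumptions, not a description of Fortra’s detection logic; the pattern and function names are hypothetical.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:   # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings that look like card numbers and pass the checksum."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

In a sentence that contains both the commonly used test number 4111 1111 1111 1111 and an order ID such as 1234 5678 9012 3456, only the first is reported, because the second fails the checksum. Validation steps like this are one way to cut false positives before a file is labeled sensitive.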
The key, according to Fortra’s Data Classification Suite officials, is to not stop with just the machine learning engine but to train the model so that detection becomes more accurate.
“That’s the swim lane we’re in now – how can we have the best accuracy and best detection out there?” Wassif concludes.