#HowTo: Avoid Common Data Discovery Pitfalls

A data classification process is an important component of any data security, risk management and compliance strategy, as it makes it easier to locate and retrieve data. Yet, many companies struggle with the process. 

Here at Digital Guardian, we invited a panel of data scientists and security experts to identify the most common pitfalls around discovering and properly classifying data, and how they can be avoided. Here’s what they told us.

Lack of goal setting
Among the most common issues with data discovery and classification is the lack of goal setting from the outset. Too often, the objective is simply to capture more data, on the assumption that more data will help influence decision making. However, the actual decisions that need to be informed are frequently not considered early enough. This leads to outcomes that may have no significant business value relative to the time spent on the data.

Before setting out on data discovery and classification, make sure there are clear goals in mind of what the data is going to help achieve.

‘Paralysis by analysis’
Organizations often run into the problem of 'paralysis by analysis.' Too often, analysts get far too caught up in data. People put time and effort into collecting, cleaning, and centralizing it, but then what? Data on its own is just raw information. Obsessing over it like this is a mistake, and organizations need to shift toward an obsession with taking this information and transforming it into knowledge. Only then can it help organizations generate wisdom.

Failing to realize the value of data discovery
In itself, data discovery and classification hold no intrinsic value. Organizations can't expect to adequately improve data security and compliance solely through locating and labelling data. They will only start to see real value when it is used in conjunction with other data security practices.

For example, once an organization has found out where its most at-risk data resides within its infrastructure, what does it do next? Can it determine who has access to that data, who's making changes to it, what those changes are, and whether the surrounding environment is secure?

Data discovery and classification are powerful and necessary, but they shouldn’t live in a silo. Combining this functionality with permissions analysis, user and entity behavior analytics, and change auditing will enable the true value to emerge.
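As a rough illustration of pairing classification results with permissions analysis, the sketch below cross-references two made-up inputs: a mapping of file paths to classification labels and a mapping of file paths to the groups that can read them. The paths, labels, group names, and the `overexposed` helper are all hypothetical; a real deployment would draw this information from its own classification and access-control tooling.

```python
# Hypothetical group names treated as "too broad" for sensitive data.
BROAD_GROUPS = {"Everyone", "All Staff"}


def overexposed(classified, permissions, broad_groups=BROAD_GROUPS):
    """Return sorted paths labelled 'sensitive' that a broad group can read."""
    flagged = []
    for path, label in classified.items():
        if label != "sensitive":
            continue
        readers = permissions.get(path, set())
        # A non-empty intersection means at least one broad group has access.
        if readers & broad_groups:
            flagged.append(path)
    return sorted(flagged)


# Illustrative data only: output of a discovery scan and an access review.
classified = {
    "/finance/payroll.xlsx": "sensitive",
    "/hr/handbook.pdf": "public",
    "/sales/customers.csv": "sensitive",
}
permissions = {
    "/finance/payroll.xlsx": {"Finance"},
    "/hr/handbook.pdf": {"Everyone"},
    "/sales/customers.csv": {"Sales", "All Staff"},
}

print(overexposed(classified, permissions))
```

In this made-up data set, only the customer file is both sensitive and readable by a broad group, so it is the one worth reviewing first.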

Poor data quality hinders ability to deliver customer-centric value
To avoid this pitfall, plan search and segmentation fields ahead of time to ensure the search criteria deliver the expected value. Develop a governance plan with a clear delineation of who is responsible for entering, validating, and maintaining the data, and establish user protocols regarding where and when data gets entered.

Also, select a CRM tool that is easy to access and use in every situation where users communicate with customers and prospects (e.g., email, on social, on the web, and while mobile), that consolidates all business contacts and automates data entry and data enrichment. Data enrichment can either be provided via a third-party tool like ZoomInfo or DiscoverOrg or offered in-the-box with a small business CRM, such as Nimble.

Trying to solve problems beyond human scale
In the enterprise, there are problems simply beyond human scale. Some organizations are sitting on petabytes of data they don’t even know about, to say nothing about the volumes of data being created every day. Discovery and classification become impossible due to the sheer amount of data organizations have in their possession.

AI-powered auto-classification, trained on a small subset of properly recognized data, is possible today. Machine learning tools are the only way organizations can hope to make significant headway. It's not perfect, but it's a process that improves over time as the machine learns what defines a document specific to an organization. It means the companies that start the process now are in a better position to leverage more of their data in the future and provide cleaner data fuel for future predictive AI-powered analytics and decision making.
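To make the idea of training on a small, properly labelled subset concrete, here is a minimal sketch using a naive Bayes text classifier with add-one smoothing, written with only the Python standard library. The seed documents, the "sensitive" and "public" labels, and the function names are assumptions made for illustration; production tools use far richer models and far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(labeled_docs):
    """Build word and label counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    tokens = tokenize(text)
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # prior for this label
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical seed set: a handful of documents labelled by a person.
SEED = [
    ("customer name and credit card number on invoice", "sensitive"),
    ("employee salary and social security number", "sensitive"),
    ("patient diagnosis and insurance number", "sensitive"),
    ("lunch menu for the office cafeteria", "public"),
    ("press release announcing the new product", "public"),
    ("team offsite agenda and parking directions", "public"),
]

model = train(SEED)
print(classify(model, "credit card number for new customer"))
print(classify(model, "office lunch menu"))
```

Retraining as reviewers correct the labels is what lets a process like this improve over time; with a seed set this small, the predictions should be treated as suggestions for human review rather than final classifications.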

Giving valuable data away
Data discovery eventually leads to data storage requirements and the creation of irresistible honeypots for malicious third parties. Combine that with chronic levels of employee-centric data breaches, and it can create a recipe for disaster.

While doing data discovery looks like work and may seem like a productive use of time to the user, it is easy for casual users to spend time analyzing data without purpose. Data is the new oil - don't give it away.

Across such a broad and detailed discipline, the pitfalls and remedies will shift as new approaches and products emerge, but avoiding common and foundational issues will allow organizations to maximize their investment in data discovery.
