Unstructured Data
How important is it?
As most companies accelerate their digital transformation and consumers increasingly adopt digital channels, the amount of data generated every day by both companies and consumers is simply overwhelming.
According to some estimates, between 80 and 90 percent of the data generated every day can be classified as unstructured data. But what does that mean, and why is unstructured data so important in today's business environment?
By definition, unstructured data is data that is not organized in a pre-defined fashion or lacks a specific data model or metadata.
Structured data, on the other hand, is data that has clear, definable relationships between the data points, with a pre-defined model containing it.
Structured data fits easily into a relational database. Typical fields in a structured dataset include names, dates, credit card numbers, transaction amounts and transaction dates.
In an artificial intelligence and machine learning context, structured data is easier to train a machine learning system on, because the patterns within the data are explicit.
Unstructured data, in contrast, has no defined relationships between data points and is typically stored in a non-relational database or a data lake. It is difficult to analyze, and extracting value from it often involves the analysis of small datasets.
But what are the typical unstructured data sources? A few examples are social media, medical records and business documents.
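To make the contrast concrete, here is a minimal Python sketch. It assumes an illustrative `transactions` table and a made-up customer email. The names, the amounts and the `extract_payments` helper are all hypothetical. The structured records can be queried directly, while the same kind of information in free text first has to be extracted, here with a deliberately simple regular expression.

```python
import re
import sqlite3

# Structured: transactions live in a relational table with a fixed schema
# (table and column names are illustrative assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT, amount REAL, date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("Ana Silva", 120.50, "2023-03-01"), ("Ana Silva", 79.50, "2023-03-04")],
)
# Because the relationships are explicit, one query answers the question.
total = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE name = ?", ("Ana Silva",)
).fetchone()[0]

# Unstructured: similar information buried in free text (made-up example).
email = (
    "Hi, I paid 120.50 EUR on 2023-03-01 and another 79.50 EUR "
    "on 2023-03-04. Regards, Ana"
)

def extract_payments(text):
    """Hypothetical helper: pull (amount, date) pairs out of free text."""
    pattern = re.compile(r"(\d+\.\d{2}) EUR on (\d{4}-\d{2}-\d{2})")
    return [(float(amount), date) for amount, date in pattern.findall(text)]

payments = extract_payments(email)
```

The regular expression only works because this email happens to follow a predictable wording. Real emails vary far more, which is one reason unstructured data is harder to analyze.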
Social media has become an important part of our lifestyle and, for many, it is the preferred channel for viewing, creating or sharing information. Used across society by individuals, companies, governments and other organizations, social media generates a huge amount of data every day. This has led to a proliferation of data in various formats: text, images, videos, audio, geo-locations and even sentiments. As both structured and unstructured data are created through social media use, it has enormous potential for providing rich insights into perceptions, customer behavior, trends and news, and even, if misused, for changing the outcome of events.
Medical records are another excellent example. Healthcare generates large volumes of unstructured data, both human and machine generated, in the form of medical reports and data collected by medical devices such as cardiac monitors. Since these data are, in most cases, not yet being used to their full potential, there is considerable interest in applying artificial intelligence to improve diagnostics, patient care and research, creating space for a paradigm shift from “sick care” to true healthcare.
Corporate documents: The majority of documents used in organizations, such as emails, PowerPoint presentations and reports, are unstructured. These documents are a relevant part of an organization’s knowledge base, but they are often undervalued because they are not part of any structured information system. For example, an insurance company can find a fraud pattern by analyzing clients’ emails related to claims, and a bank’s risk team can anticipate the increased risk of a corporate client by analyzing market reports.
Digital imaging technologies, pattern recognition, natural language processing and machine learning help make sense of unstructured data. These techniques make it possible to understand the contents of large volumes of documents, or to compare huge volumes of images to find a pattern in a CAT scan or an MRI and accelerate diagnosis. Unstructured data has enabled organizations to better position themselves for fraud analysis that prevents financial losses and mitigates risk, and to implement and manage smarter loyalty programs that identify, segment and target customers based on sentiment and behavior analysis.
As unstructured data gains importance and volume each day, at Closer we are already looking at other machine learning techniques to leverage the power of unstructured data.
The amount of unstructured data is much larger than that of structured data. As mentioned, several studies indicate that more than 80% of all enterprise data is unstructured. This means that companies are missing out on a lot if unstructured data is not taken into account and treated properly.
There are several techniques to “structure” unstructured data. Once they have been applied, one can make use of the latest trends in machine learning. One example is one-shot (or, more generally, n-shot) learning: unlike conventional supervised learning, where thousands of examples are necessary to train a model, n-shot learning aims to provide the model with only one or just a few examples. This concept aims to capture as much information as possible in a low-data regime. It can also be used where new classes need to be added on the fly.
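As a rough illustration of the n-shot idea, the sketch below classifies short texts from a single labelled example per class. It is not a description of any specific product or library. It assumes a nearest-prototype approach, with a plain bag-of-words vector standing in for the learned embedding a real system would use. The `FewShotClassifier` class, the labels and the sample sentences are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Bag-of-words vector; a stand-in (assumption) for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class FewShotClassifier:
    """Hypothetical nearest-prototype classifier: one summed vector per class."""

    def __init__(self):
        self.prototypes = {}  # label -> accumulated vector of its examples

    def add_example(self, label, text):
        # Adding an example (or a brand-new class) needs no retraining.
        self.prototypes.setdefault(label, Counter()).update(embed(text))

    def predict(self, text):
        query = embed(text)
        return max(
            self.prototypes,
            key=lambda label: cosine(query, self.prototypes[label]),
        )

clf = FewShotClassifier()
clf.add_example("claim", "I want to report a car accident claim")
clf.add_example("billing", "my invoice shows the wrong payment amount")
first = clf.predict("there was an accident with my car")

# A new class is added on the fly from a single example.
clf.add_example("cancellation", "please cancel my policy contract")
second = clf.predict("I would like to cancel my contract")
```

Because each class is represented only by the examples it has received, adding a class is just one more call to `add_example`. The quality of the result depends almost entirely on the embedding, which is why word counts are only a placeholder here.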
For organizations, the ability to analyze and interpret unstructured data may be the key to creating a competitive advantage and gaining relevant insights into their business, customers and competitors, enriching the information that supports their business.
At Closer, we continue to follow our path to the future, always looking for The Closer Way To Challenge Complexity.