Big Data is not about Big Data
Rui Rosa, Closer Consulting
When we talk about Big Data, it is still common to hear that Big Data is about processing large volumes of data. Although this may be true, and is reflected in many Big Data definitions, I would argue that this is an outdated concept and probably the last concern for anyone who is about to start a Big Data project.
To justify my claim that “Big Data is not about Big Data”, let me first point out that long before the term Big Data became mainstream (let’s take McKinsey’s 2011 report “Big Data: The next frontier for innovation, competition, and productivity” as the milestone), we already had environments that processed massive amounts of data.
Take the example of the 82-terabyte data warehouse (DW) built in 2000, or SAP’s 12.1-petabyte DW record in 2014. Processing large volumes of data is nothing new. It is not the key factor in Big Data and, above all, it is not the most important thing to consider when you start planning a Big Data project.
So, what is Big Data after all and what are the key factors we must consider when defining it?
In broader terms, we may consider Big Data as a technological disruption and a management revolution aimed at creating new real-time, data-driven platforms that support the digital transformation we are facing.
Let’s go a little deeper into the idea of Big Data as a technological disruption and management revolution. To do so, we can think of it as a sequence of connected, incremental perspectives, detailed in the next points. Note that in none of them is the issue of Big Data (meaning large volumes of data) the key challenge.
Firstly, we might think of Big Data from a data perspective. In this context, Big Data is characterized by the 3Vs: volume, variety and velocity. We already know from the start of our reasoning that processing large volumes of data was solved a long time ago with classical technologies. The critical issue for any implementation of the 3Vs concept is how to deal with unstructured and real-time data. Semi-structured and unstructured data will account for more than 60 or 70% of the total data to process, and it is mostly generated outside the enterprise (meaning that you have no control over the input formats).
In addition, processing real-time data is a crucial factor in omnichannel customer integration and IoT implementations. Remember that there are already more devices than people connected to the web, all demanding real-time data processing capabilities (it is important to note that future IT architectures will have to be designed to serve both people/customers and devices/machines). From a data perspective, unstructured and especially real-time data are the key issues in Big Data.
Secondly, we may also think about Big Data from a technological perspective. Here, we must consider two main market disruptions. The first is that Big Data technologies (Hadoop, Spark, etc.) were designed from the ground up for parallel rather than serial data processing (as in classical relational database systems). The second is that Big Data technologies follow an open-source (rather than proprietary) development model, and must be integrated into a coherent deployment environment through specialized distributors rather than traditional database vendors. Given this new landscape, if you are a Chief Information Officer who must plan a Big Data platform, consider the challenge of adopting a completely new technological environment that is not well understood by your traditional database vendor and that will have to be integrated with your legacy applications, because you simply can’t ignore your current IT architecture.
Furthermore, you will have to redesign your IT architecture to accommodate these new technologies and to process structured and unstructured data in real time. It is like servicing your car while driving it. Adopting a new technological paradigm is undoubtedly a big challenge for any organization.
Thirdly, do not think about Big Data without thinking about insights and Big Data analytics. Addressing these topics separately would be a huge waste of time and resources. Data is a core corporate asset and differentiator only if you put “Analytics at Work” (Thomas Davenport). And, as many of us have already learned, creating an analytical infrastructure and culture is not an easy task. Again, Big Data analytics is a much more significant challenge than just processing large volumes of data.
Finally, considering everything said in the previous points, we can also think about Big Data from the perspective of evolving our corporate roles and responsibilities (Big Data architects, data engineers, data scientists, etc.) and our digital transformation processes (new business and process models) to cope with market demands. As you may have guessed, this is a “Management Revolution” (HBR - Andrew McAfee, Erik Brynjolfsson) that is much harder to execute than processing large volumes of data.
Big Data is about creating the technology, data and competency foundations that enable new data-driven digital processes and business models.
In conclusion, as our experience has taught us and as we have tried to show, Big Data is not about Big Data. It is about adopting new disruptive technologies and executing a management revolution towards digital transformation. In this context, Big Data is not something you try on your own to see if it works. You will face many challenges and pitfalls along the way, so it is critical that you choose the right projects and the right partners for your journey.
Do you want to know more? Schedule a meeting with us.
We will be glad to share our experience and assist you on your journey.