What big data is and where it comes from

In statistics and computer science, the term big data (literally “big masses of data”, in Italian megadata) refers generically to a collection of data so extensive in volume, velocity and variety that it requires specific technologies and analytical methods to extract value or knowledge. This is how the literature explains what big data is and what it is for, in terms that may sound too technical to the uninitiated. In practice, it is one of the most profound and pervasive developments of the digital world, destined to last and to deeply affect both our daily lives and the productive activities of companies.

It is an influence we can perceive every day, one that has radically changed many of the basic activities of our lives as well as the world around us. That is why, especially in the last twenty years, we hear more and more about megadata in the print and online press, and even more so in pages dedicated to marketing and IT. In this guide we will look at the value of big data, what it is used for and where it comes from.

Big data: what it is and what it is used for

Big data is not only a powerful trend; as already mentioned, it is also destined to last, and its applications will keep improving. The term refers to the ability, characteristic of data science, to analyze, extrapolate from and relate huge amounts of heterogeneous data, both structured and unstructured. This is done with sophisticated statistical and computational methods, with the aim of discovering links and correlations between different phenomena and, on that basis, predicting future ones.
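To make the idea of "discovering correlations between phenomena" concrete, here is a minimal sketch of the Pearson correlation coefficient, one common statistical measure of how closely two series move together. The two series (daily temperature and ice cream sales) are invented for illustration, and real big data pipelines would compute this over far larger datasets with dedicated tools.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalised (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical observations: temperature in degrees and units sold per day.
temperature = [18, 21, 25, 28, 31]
ice_cream_sales = [110, 135, 180, 210, 260]

r = pearson(temperature, ice_cream_sales)
print(f"correlation: {r:.2f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests none. Correlation alone does not show that one phenomenon causes the other.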

To give a few examples: in business, big data can be used for various purposes, including measuring the performance of an organization or of a business process. In everyday life, to understand what big data is, we can think of what happens when we interact on social networks or browse any website, or of modern smartphones, which are almost always connected. Then there are the credit cards used for purchases, television, the storage required by computer applications, the smart infrastructure of cities, and the sensors mounted on buildings and on public and private transport.

In all these cases we are faced with a truly impressive amount of generated data, far greater than that of a few decades ago, and today big data technologies allow it to be analyzed in real time. Over time, people themselves have become sources of data, and a considerable amount of data is also created along the value chain of every industry. In 2011, Teradata stated that “A big data system exceeds the hardware and software systems commonly used to capture, manage and process data in a reasonable time frame for even a massive population of users.”

A further characterization of big data was proposed by the McKinsey Global Institute: “A big data system refers to datasets whose volume is so large that it exceeds the ability of relational database systems to capture, store, manage and analyze.” In reality, a definition alone is not enough to give a complete picture of such a significant phenomenon. Big data is not just about large amounts of data: the process of collecting and managing data has also changed, and the technologies that support the data life cycle and the extraction of value from it have evolved.

The great revolution we refer to when talking about big data is therefore, above all, the ability to use all this information to process, analyze and find objective evidence on a wide range of issues. What matters is what can be done with this amount of data: algorithms capable of handling very many variables in a short time and with few computational resources available, perhaps just a simple laptop used to access the analysis platform. Put more simply, big data presupposes new and more refined ways of linking information together and of presenting data visually, suggesting patterns and models of interpretation that were previously unimaginable.

Big data is generally defined by three Vs. The first is Volume: the amount of data, structured or unstructured, generated every second by heterogeneous sources such as sensors, logs, email, GPS, social media and traditional databases. The second is Variety, which refers to the different types of data that are generated, accumulated and used. The third is Velocity, since big data is produced in real time. Over time a fourth V, Veracity, was added, and then a fifth, Value. In short, it is a revolution that has taken place but is not finished, and one that closely touches the life of every person without anyone noticing.

The different uses of big data

Analyzing large amounts of data generates new knowledge that supports more informed decisions, and not only in business. Now that we know what big data is and what it is for, it is just as important to understand how it is used in different sectors. All this is made possible, and affordable, by technologies that can manage unstructured data and process large volumes of data in real time, and by the spread of more sophisticated algorithms and highly innovative analysis methodologies.

These tools can, and must, independently extract the information hidden in the data, which translates into a potentially unlimited number of applications visible every day in the modern world. Marketing is where megadata finds its most widespread use: it is widely employed to build so-called recommendation systems, such as those used by entertainment and eCommerce giants (Netflix and Amazon, to name two) to suggest purchases based on how the interests of a specific customer compare with those of millions of others. The result is a more precise and timely personalization of the offer than companies could achieve just a few years ago.
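The idea of comparing one customer's interests with those of others can be sketched as a tiny user-based collaborative filter. This is an illustrative toy under stated assumptions, not how Netflix or Amazon actually build their systems: the users, items, ratings and the function names `cosine_similarity` and `recommend` are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: score from 1 to 5}.
ratings = {
    "ana":   {"film_a": 5, "film_b": 4, "film_c": 1},
    "bruno": {"film_a": 4, "film_b": 5, "film_d": 5},
    "carla": {"film_c": 5, "film_d": 2, "film_e": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated, weighting each other user's
    score by how similar that user's tastes are to the target user's."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, score in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana", ratings))
```

Here "ana" shares tastes mostly with "bruno", so the item bruno rated highly that she has not seen ranks first. Production systems work on the same intuition but over millions of users, with far more elaborate models.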

Detecting and then reducing fraud is another example of how big data is used in everyday life, creating productive value and improving the experience of the users of a service or platform. Unsurprisingly, leading credit card companies such as Visa and American Express analyze billions of transactions from all over the world every day to identify unusual movements and patterns in real time, in order to significantly lower the number and incidence of frauds.
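As a rough illustration of what "identifying unusual movements" can mean, the sketch below flags transaction amounts that sit far from a customer's typical spending, using a robust z-score based on the median and the median absolute deviation. The amounts, the threshold of 3.5 and the function name `flag_unusual` are assumptions made for this example; card networks rely on much richer signals and models.

```python
from statistics import median

def flag_unusual(amounts, threshold=3.5):
    """Return the indices of amounts whose robust z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # all typical values identical: no meaningful score
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical card transactions for one customer, in the same currency.
amounts = [12.5, 9.9, 14.0, 11.2, 10.8, 950.0, 13.1]
print(flag_unusual(amounts))
```

The median is used instead of the mean because a single very large fraudulent amount would otherwise distort the very baseline it is being compared against.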

Big data is also used in so-called predictive maintenance. The term refers to the practice of using data collected during operations to analyze performance and predict possible problems before they happen. Experts have observed that companies that are leaders in big data generate on average 12% more profit than companies that do not exploit the value of this data.
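One simple way to picture predictive maintenance is to fit a trend line to a sensor reading and estimate when it will cross a limit. The sketch below does this with an ordinary least-squares line. The vibration readings, the threshold and the function names `fit_line` and `steps_until_threshold` are hypothetical, and real systems typically combine many sensors with more sophisticated models.

```python
def fit_line(ys):
    """Least-squares slope and intercept for readings taken at t = 0, 1, 2, ...
    Expects at least two readings."""
    n = len(ys)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(ys))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_t

def steps_until_threshold(ys, threshold):
    """Estimate how many further time steps remain before the fitted trend
    reaches the threshold. Returns None if readings are not trending upward."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    t_cross = (threshold - intercept) / slope
    # Time already elapsed is len(ys) - 1; never report a negative wait.
    return max(0.0, t_cross - (len(ys) - 1))

# Hypothetical vibration levels of a machine, one reading per day.
vibration = [1.0, 1.2, 1.1, 1.5, 1.6, 1.9]
print(steps_until_threshold(vibration, threshold=3.0))
```

If the estimate falls within the planning horizon, maintenance can be scheduled before the component is expected to fail rather than after it breaks.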

In the public sphere there are many other applications of big data. In recent years, police forces have used large amounts of real-time data to predict where crimes are most likely to occur and how many; the bodies responsible have carried out a growing number of more precise studies on the correlation between health and the quality of the air we breathe; genomic analysis can be used to improve the drought resistance of rice crops; and models can be built to analyze data from living beings in life sciences and in medical research, both diagnostic and pharmacological.

Of course, given the enormous value that big data holds, it is vital in all of these areas that its legitimate use be regulated. In less severe cases, illegal or overly intrusive use of data can undermine customers' confidence in companies. In more serious cases it can harm citizens (patients, voters and consumers), who are regarded as the weakest link in the value chain. As business literature and legislation point out, the protections owed to individuals include the right to privacy and individual freedoms. To guarantee these protections, the monitoring and sanctioning activities of the relevant government agencies need to be strengthened and equipped with more advanced regulatory and financial tools.