Starting from a simple methodical approach to data protection, data analysis has
become a discipline in its own right, leading to the development of methodologies that
generate models. A model is the translation into mathematical form of a system placed
under study. Once there is a mathematical or logical form that describes the system's
responses at different levels of precision, you can make predictions about
its development or its response to certain inputs. Thus the aim of data analysis is not the
model itself, but the quality of its predictive power.
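As an illustration of a model used to make predictions, the following minimal sketch fits a straight line to a handful of observations and then estimates the response to an input that was never measured. The data values and the names fit_line and predict are assumptions made for this example; a straight line is only one of many possible mathematical forms.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of the line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations: input level and measured system response.
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_line(inputs, responses)

# Predict the response to an input that was not observed.
predicted = predict(model, 6.0)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, prediction={predicted:.2f}")
```

The fitted pair (slope, intercept) is the model; what matters in practice is how well predict matches responses that are observed later.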
The predictive power of a model depends not only on the quality of the modeling
techniques but also on the ability to choose a good dataset upon which to build the
entire data analysis process. So the search for data, their extraction, and their subsequent
preparation, although they are preliminary activities, also belong to data
analysis itself, because of their importance to the success of the results.
So far we have spoken of data, their handling, and their processing through
calculation procedures. In parallel with all the processing stages of data analysis, various
methods of data visualization have been developed. In fact, to understand the data, both
individually and in terms of the role they play in the entire dataset, there is no better
system than to develop techniques of graphic representation capable of transforming
information, sometimes implicitly hidden, into figures, which help you more easily
understand its meaning. Over the years, many display modes have been developed for
presenting data: the charts.
At the end of the data analysis process, you will have a model and a set of graphical
displays and then you will be able to predict the responses of the system under study;
after that, you will move to the test phase. The model will be tested using another set
of data for which you know the system response. These data are, however, not used to
define the predictive model. Depending on the ability of the model to replicate the real
observed responses, you can calculate the error and learn the validity of
the model and its operating limits.
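The test phase described above can be sketched as follows. The model is built on one set of data and then evaluated on a second set whose responses are known but were not used to define it. The data values, the straight-line model, and the choice of root mean squared error (RMSE) as the error measure are assumptions made for this example.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of the line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data used to define the model.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical test data: responses are known, but not used for fitting.
test_x = [6.0, 7.0, 8.0]
test_y = [12.2, 13.8, 16.1]

slope, intercept = fit_line(train_x, train_y)

# Compare the model's predictions with the observed test responses.
predictions = [slope * x + intercept for x in test_x]
errors = [p - y for p, y in zip(predictions, test_y)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"RMSE on the test set: {rmse:.3f}")
```

A small error on the test set suggests the model replicates observed responses within that input range; it says nothing about inputs far outside the range covered by the data.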
These results can be compared with those of other models to understand whether the newly
created one is more efficient than the existing ones. Once you have assessed that, you
can move to the last phase of data analysis: deployment. This consists of implementing
the results produced by the analysis, namely, implementing the decisions to be taken
based on the predictions generated by the model and the associated risks.
Data analysis is well suited to many professional activities, so knowing it
and knowing how to put it into practice is valuable. It allows you to test hypotheses and to
understand more deeply the systems analyzed.
Chapter 1 An Introduction to Data Analysis