How to approach Data Analytics Visualization
The general purpose of Data Visualization and Analytics tools is to describe reality as faithfully as possible through the reading and interpretation of data.
Among the reasons why Data Visualization is essential:
– Making data easier to understand and remember;
– Discovering new trends;
– Visualizing relationships and patterns quickly;
– Telling stories by curating data into an easier-to-understand form;
– Increasing the ability to act quickly on results.
The increasing availability of data and tools for collection and processing requires the application of a Data Driven methodology with two levels of Data Analysis:
– Descriptive Analysis: representation and description of reality explained by data;
– Predictive Analysis: analysis of data in order to draw scenarios of future development.
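The two levels can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical monthly series (the data and variable names are invented for the example, not from the document): descriptive analysis summarizes what happened, while predictive analysis fits a simple linear trend and projects the next period.

```python
# Minimal sketch of the two analysis levels on a hypothetical series.
from statistics import mean, stdev

sales = [100, 104, 110, 113, 121, 125]  # illustrative historical observations

# Descriptive analysis: represent and summarize the reality in the data.
descriptive = {"mean": mean(sales), "stdev": round(stdev(sales), 2)}

# Predictive analysis: fit a linear trend and project the next period.
n = len(sales)
x_mean = (n - 1) / 2
y_mean = mean(sales)
slope = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(sales)) \
        / sum((i - x_mean) ** 2 for i in range(n))
intercept = y_mean - slope * x_mean
forecast = intercept + slope * n  # projected value for the next period

print(descriptive)
print(round(forecast, 1))
```

Real predictive solutions replace the hand-rolled trend line with statistical and Machine Learning models, but the division of labor is the same: describe the past, then extrapolate from it.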
In order to extract information from Big Data, it is necessary to apply a range of analysis methodologies, together with suitable systems for representing the results, so as to investigate complex real-world phenomena.
The design of the data visualization solution is a key phase: through data visualization tools, it brings order to reality.
Compared to classic tables that collect data in long lists of rows and columns, data visualization introduces new visual practices to guide the user in the semantic exploration of large data sets.
Raw data is considered an “amorphous matter” that needs precise design in order to be structured into information: data, per se, is neither information nor an objective tool of knowledge. Data visualization should therefore be understood as a powerful process of “sense-making”: the analysis, processing and interpretation (by the designer) of a data set.
The main goal of these representations is to create a new language that is easily understood and best reproduces the complex phenomenon of Big Data.
Interactivity allows for dynamic representations that the user can manipulate according to his or her own analysis interests.
This becomes possible through the use of variables as parameters and relationships with other data sets that influence their structure (and meaning).
This type of visualization finds its greatest realization in online interactive Data Visualization.
The tool is designed to provide a snapshot of key trends that contribute to the impact assessment of operational processes and procedures, aimed at improving them by adopting a data-driven methodological approach.
In addition to providing indicators of historical trends, the tool uses statistical algorithms combined with Machine Learning and Deep Learning features to enable predictive analysis, identifying the probability of future outcomes based on historical data.
The methodology we apply in dashboard design is based on the following principles:
- Data visualization should answer strategic and fundamental questions, provide real value, and help solve concrete problems;
- Data visualization must be compatible with the audience’s skills and enable users to visualize and process data quickly and easily.
- Data visualization is designed using the principles of data journalism: relevant information is highlighted in aggregate form, with drill-down tools to reach granular data and drill-through for analysis of related dimensions.
- The dashboard is implemented with different levels of aggregation of data and information constituting specific reports:
– Strategic reports designed for management that provide comprehensive analysis of metrics and KPIs;
– Analytical reports designed to answer frequently asked questions with the main objective of providing information in a direct and granular manner;
– Highly interactive operational reports that enable research and experimentation with data to address stakeholder needs;
- The visual functions used are defined so as to display the data in the most effective and efficient way, highlighting the trends, KPIs and essential correlations of the dataset.
- Figure explanation labels and textual tools (smart narratives) are used for each graph to provide informative elements about what cannot be represented graphically and/or to summarize the main information contained in the visual object itself.
The design of the visual objects of the Dashboards, in addition to the best practices described, is based on preattentive attributes, i.e., the visual properties we notice without conscious effort, which determine what information captures our attention.
These brain processes occur extremely fast (within about 200 ms of exposure to a visual stimulus) and do not require sequential searching; they are therefore a very powerful tool for each of us: they determine what is noticed before anything else.
Four preattentive visual properties have been scientifically defined:
– Shape (orientation, line length, line width, size, shape, curvature, closure, marks);
– Color (intensity, hue);
– Spatial positioning (2D position);
– Movement (flicker, motion).
Color is probably the most powerful attribute at our disposal; therefore, from a data visualization perspective, its use assumes strategic value.
Each visual object provides the following functionality:
– Roll-up: reduction of the level of detail by aggregating data and climbing a level of a dimension hierarchy; aggregation reduces the amount of data through a synthesis that changes the scale and increases stability;
– Drill-down: the operation opposite to Roll-up; it allows the detail of a selected data set to be analyzed;
– Slicing: a value (or range) of an analysis dimension is selected and the cube is filtered accordingly, obtaining a “slice.” This is done by selecting any element in the dashboard; all visual objects are filtered to show the numbers for that element. In particular, numerical indicators are recalculated and the portions of interest are highlighted in the diagrams.
– Drill Through: creates landing pages in the report to analyze a specific entity, starting from a data point in the source report to get the filtered details in that context.
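The four operations above can be sketched on a plain Python fact list. This is a toy illustration with an invented mini fact table of (year, region, product, amount) tuples; it is not any particular OLAP engine, just the logic of each operation.

```python
# Sketch of OLAP-style operations on a hypothetical mini fact table.
from collections import defaultdict

facts = [
    (2023, "North", "A", 10), (2023, "North", "B", 5),
    (2023, "South", "A", 7),  (2024, "North", "A", 12),
    (2024, "South", "B", 9),
]

def roll_up(rows, level):
    """Aggregate the measure up to the first `level` dimensions."""
    agg = defaultdict(int)
    for row in rows:
        agg[row[:level]] += row[-1]  # last element is the measure
    return dict(agg)

# Roll-up to year level: less detail, fewer and more stable numbers.
by_year = roll_up(facts, 1)

# Drill-down: the inverse, descending to year + region detail.
by_year_region = roll_up(facts, 2)

# Slicing: filter the cube on one dimension member (region == "North").
north_slice = [r for r in facts if r[1] == "North"]

# Drill-through: from an aggregated data point back to its source rows.
drill_through_2023_north = [r for r in facts if r[:2] == (2023, "North")]

print(by_year)
```

In a real dashboard these operations run against the cube engine rather than in-memory lists, but the semantics the user experiences are the same.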
Our solutions involve the use of OLAP (Online Analytical Processing) cubes, i.e., a data structure that overcomes the limitations of relational databases by providing rapid data analysis tools.
OLAP cubes can display and summarize large amounts of data, while also providing users with searchable access to any data point, so that data can be rolled up, sliced and diced as needed to handle the widest variety of queries relevant to a user’s area of interest.
Data are presented in a format in which they are categorized into hierarchies and categories to enable deeper analysis. Dimensions can have natural hierarchies to allow users to drill down to granular levels of detail.
Each axis of the cube represents a possible dimension of analysis; each dimension can be viewed at multiple levels of detail identified by attributes structured in hierarchies.
Each multidimensional cube focuses on a fact relevant to the analytical and decision-making process.
In summary, the cube represents a set of events, described quantitatively by numerical measures.
The definition of the data of interest is the basis of model design, using Dimensional Fact Model (DFM) tools, i.e., a schema that intuitively represents which analyses the cube supports.
Each cube is composed of a number of dimensions, which serve as data search coordinates and are arranged in the branches of the schema, and of measures, which are the data aggregation criteria.
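A DFM schema can be sketched as a simple data structure: a fact with its dimensions (each a hierarchy of attributes) and its measures. The fact and attribute names below are invented for illustration, not taken from any specific cube.

```python
# Hypothetical Dimensional Fact Model sketch: one fact, its dimension
# hierarchies, and its measures (all names illustrative).
fact_sales = {
    "dimensions": {
        "date":      ["day", "month", "quarter", "year"],  # natural hierarchy
        "geography": ["city", "region", "country"],
        "product":   ["sku", "category"],
    },
    "measures": ["quantity", "revenue"],
}

# Each dimension is an axis of the cube; each hierarchy level is a
# coordinate at which the measures can be aggregated.
axes = list(fact_sales["dimensions"])
drill_path = fact_sales["dimensions"]["date"]  # day -> month -> quarter -> year

print(axes)
print(drill_path[-1])
```

Reading the schema this way makes the earlier point concrete: roll-up and drill-down simply move along one of these hierarchy lists.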
In the Microsoft environment, cube modeling is done using Visual Studio.
In OLAP cubes, data are historicized and structured into aggregations.
In order to ensure maximum performance, data storage in OLAP systems is done through columnstore indexes, i.e., data are indexed and stored in groups corresponding to table columns.
This structure facilitates queries for reading large amounts of data and with high computational complexity.
In addition, since data in the same column share the same format, grouping by columns rather than by rows improves memory-space usage through better compression, saving resources and facilitating data sorting processes.
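The compression advantage of the columnar layout can be shown with a toy run-length encoder (a deliberately simplified stand-in for the dictionary and run-length encoding real columnstores use): storing each column contiguously preserves long runs of identical values, while the row-oriented stream interleaves columns and breaks those runs. The data below is invented for the example.

```python
# Toy illustration of why columnar layout compresses well.
def rle(values):
    """Run-length encode a sequence into (value, count) pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

# Hypothetical fact rows: (region, status), highly repetitive.
rows = [("North", "OK")] * 500 + [("South", "OK")] * 500

# Row-oriented stream interleaves the two columns...
row_stream = [x for row in rows for x in row]
# ...while column-oriented storage keeps each column contiguous.
regions = [r for r, _ in rows]
statuses = [s for _, s in rows]

print(len(rle(row_stream)))                    # interleaving breaks the runs
print(len(rle(regions)) + len(rle(statuses)))  # long runs per column
```

The interleaved stream yields thousands of one-element runs, while the column-wise encoding collapses to a handful of entries, which is the intuition behind the space savings described above.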
Cubes form the basis for the design of dashboards and are key players in the data warehouse setup.
Dashboards allow even non-expert users to explore the data structured in OLAP cubes. The goal is to enable any user to perform high value-added analysis without needing advanced analytical skills.
The main innovation proposed is the application of artificial intelligence algorithms in data analysis tools, through two main lines of application:
– To the analyses themselves, to generate predictive models based on datasets and time series;
– To the creation of more autonomous analysis tools.
In the latter case, the predictive power of the algorithms is harnessed to anticipate some of the BI design tasks, in a set of capabilities brought together under the term “Augmented Analytics,” i.e., a statistical data analysis approach based on Machine Learning and Natural Language Processing algorithms, aimed at automating the analysis processes normally performed by specialists.
The use of Machine Learning (ML) consists of developing algorithms, in the Python and R languages, that can improve themselves (learn) through experience and data.
This ensures the availability of “Automated Insights” features that, even with minimal user input, generate a detailed analytical picture by selecting variables and graphical tools automatically. To learn from experience, algorithms take into account the results they obtain: in the case of Power BI, after using these features the user can send feedback directly from the program on how much he or she appreciated the algorithm’s output.