Translation of technical results into business insights that support strategic and operational decisions.
Compared with the preceding phases, Gather and Manage, the Analyse phase focuses far more on business requirements and questions. The data scientist needs enough business knowledge and experience to engage with business leaders and their teams, and to encourage the brave but realistic business questions that will drive the analysis. Companies are also using data analysis to drive innovation.
There are frameworks that can help with formulating these questions. Some questions concern the reasons behind events, which have to be explained so that the events can be either repeated or avoided. Others concern insights that decision makers wish they had known a month ago, before events became reality. Lastly, there is the question that leaders find most difficult: what is my best next action?
Data scientists then have to translate these business questions into questions that data and science can answer.
Subject Matter Experts
In most complex businesses, interpreting the raw data requires subject matter experts (SMEs). Data scientists rely on these SMEs to give meaning to the data and to explain anomalies, which may be data that should be excluded or data that deserves closer attention. Experts can also describe the basic relationships within the data, so that a clear understanding is created of the following:
- Causation - identification of a chain of events in data where one observation is caused by another
- Correlation - identification of attributes that tend to change together in seemingly unrelated data, without one necessarily causing the other
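To make the idea of correlation concrete, the following is a minimal sketch that computes the Pearson correlation coefficient in plain Python. The function name `pearson` and the monthly figures are assumptions invented for illustration; they do not come from any real dataset. A value close to 1 or -1 shows only that two series move together. Deciding whether one actually causes the other still needs the judgement of a subject matter expert.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures, invented for illustration only.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 115, 140, 170, 195, 240]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A strong correlation here would be a prompt for a conversation with the SME, not a conclusion: both series could be driven by a third factor such as seasonality.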
Types of Analysis
As mentioned previously, the Analyse phase is a translation of business questions into analytical methods. These methods can be classified into four categories:
- Descriptive Analysis - understanding what happened
- Diagnostic Analysis - finding the reasons why something happened
- Predictive Analysis - using past events to suggest what might happen next
- Prescriptive Analysis - using past actions to suggest what is the best next action
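The four categories above can be sketched on a single toy dataset. The example below is an illustration under stated assumptions, not a recommended method: the regional sales figures, the `past_actions` uplift numbers and all function names are hypothetical, and each function uses the simplest technique that fits the category (summary statistics, a per-region breakdown, a least-squares trend line, and an average of past outcomes).

```python
# Hypothetical units sold per month, split by region (invented for illustration).
sales = {
    "north": [120, 125, 130, 128, 135, 140],
    "south": [200, 198, 190, 170, 150, 130],
}
months = len(sales["north"])
totals = [sum(series[m] for series in sales.values()) for m in range(months)]


def describe(values):
    """Descriptive: summarise what happened."""
    return {
        "total": sum(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def change_by_region(data):
    """Diagnostic: break the overall change down by region to see why it happened."""
    return {name: series[-1] - series[0] for name, series in data.items()}


def forecast_next(values):
    """Predictive: fit a least-squares line to past values and extend it one step."""
    n = len(values)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


# Hypothetical uplift in units observed after each past action (invented).
past_actions = {
    "discount": [5, 8, 6],
    "campaign": [12, 9, 15],
    "no_action": [0, -2, 1],
}


def best_action(outcomes):
    """Prescriptive: suggest the action with the best average past outcome."""
    return max(outcomes, key=lambda action: sum(outcomes[action]) / len(outcomes[action]))


print("descriptive :", describe(totals))
print("diagnostic  :", change_by_region(sales))
print("predictive  :", round(forecast_next(totals), 1))
print("prescriptive:", best_action(past_actions))
```

In practice each step would use richer data and more careful methods, but the progression is the same: from what happened, to why, to what is likely next, to what to do about it.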
Each of the types of analysis mentioned above can draw on many different algorithms. Finding the approach that gives the best results takes skill and experimentation. Some of these algorithms were invented decades ago and others only recently. Many scientists now work in this field, so it is constantly expanding.
The algorithms can be classified into the following groups:
- Clustering - identifies structure in the data
- Anomaly Detection - identifies elements in the data different from most others
- Regression - predicts values based on past observations
- Two-class Classification - classifies data into two categories based on past events
- Multi-class Classification - classifies data into multiple categories, also based on past events
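As a rough illustration of three of these groups, the sketch below implements deliberately simple stand-ins using only the Python standard library: a z-score rule for anomaly detection, one-dimensional k-means with two clusters for clustering, and a nearest-mean rule for classification. These are assumptions chosen for brevity, not the only or best algorithms in each group, and the function names, threshold and sample values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Anomaly detection: values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def two_means(values, iterations=20):
    """Clustering: 1-D k-means with k=2, seeded with the smallest and largest value."""
    centers = [min(values), max(values)]

    def assign(cs):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - cs[0]) <= abs(v - cs[1]) else 1
            groups[nearest].append(v)
        return groups

    for _ in range(iterations):
        groups = assign(centers)
        # Keep the old center if a group ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers)


def train_nearest_mean(labelled):
    """Classification: learn one mean per label from past (value, label) pairs."""
    by_label = {}
    for value, label in labelled:
        by_label.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in by_label.items()}


def classify(model, value):
    """Assign the label whose learned mean is closest to the new value."""
    return min(model, key=lambda label: abs(value - model[label]))


# Hypothetical order values, invented for illustration.
orders = [20, 22, 19, 21, 23, 20, 95]
print("anomalies:", zscore_anomalies(orders))
print("clusters :", two_means([1, 2, 3, 10, 11, 12]))

# Hypothetical past orders labelled by customer type.
history = [(5, "retail"), (7, "retail"), (6, "retail"),
           (40, "wholesale"), (45, "wholesale"), (50, "wholesale")]
model = train_nearest_mean(history)
print("new order of 38 units looks like:", classify(model, 38))
```

The same nearest-mean rule works unchanged with more than two labels, which is the step from two-class to multi-class classification. Regression is omitted here; it would predict a numeric value rather than a label.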