Analyse

Translation of technical results into business insights that support strategic and operational decisions.

Business Questions

Compared to its predecessors, Gather and Manage, the Analyse phase is much more focused on business requirements and questions. The data scientist needs business knowledge and experience to engage with business leaders and their teams, and to encourage brave and realistic business questions that will drive the analysis. Companies are also using data analysis to drive innovation.

There are frameworks that can help with the formulation of the questions. There are reasons behind events that have to be explained so that the events can be either repeated or avoided. There are insights that decision makers wish they had known a month ago about what is now reality. Lastly, there is the question that is most difficult for leaders: what is my best next action?

Data scientists have to translate these business questions into questions that data and science can answer.

Subject Matter Experts

In most complex businesses, translating raw data into meaning requires subject matter experts (SMEs). Data scientists rely on these SMEs to give meaning to the data and to explain anomalies (which can be data that should either be excluded or focused on). Experts can also describe the basic relationships between the data, so that a clear understanding is created of the following:

  • Causation - identification of a chain of events in data where one observation is caused by another
  • Correlation - identification of attributes that tend to vary together across seemingly unrelated data

These insights will help data scientists formulate hypotheses and models that can be tested using the data. This process is experimental and works by elimination. Some ideas will pan out and others will not, but each result is a step towards a discovery.
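As an illustration of testing one such hypothesis against the data, the sketch below computes a Pearson correlation coefficient in plain Python. The data set, the variable names (`ad_spend`, `sales`) and the helper `pearson` are assumptions made up for this example. A strong correlation only says the two series move together; whether one causes the other is still a question for the subject matter experts.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures, invented for illustration only.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 108, 118, 131, 139, 160]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value of r close to 1 or -1 supports the hypothesis that the two attributes are related, and a value near 0 is one way an idea gets eliminated.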

Types of Analysis

As mentioned previously, the Analyse phase is a translation of business questions into analytical methods. These methods can be classified into four categories:

  • Descriptive Analysis - understanding what happened
  • Diagnostic Analysis - finding the reasons why something happened
  • Predictive Analysis - using past events to suggest what might happen next
  • Prescriptive Analysis - using past actions to suggest the best next action

Data scientists use these categories to select the type of analysis most likely to answer the questions.
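To make the difference between two of these categories concrete, here is a minimal sketch that applies a descriptive and a predictive step to the same small series. The figures in `monthly_sales` and the helper `fit_line` are hypothetical, and a straight-line trend is only one simple choice of predictive method, assumed here for brevity.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = [100, 104, 108, 112, 116, 120]

# Descriptive: what happened?
average = mean(monthly_sales)
growth = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]


def fit_line(ys):
    """Least-squares straight line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = list(range(len(ys)))
    mean_x, mean_y = mean(xs), mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Predictive: what might happen next? Extend the fitted line by one month.
slope, intercept = fit_line(monthly_sales)
forecast = intercept + slope * len(monthly_sales)

print(f"average sales: {average}, month-on-month change: {growth}")
print(f"forecast for next month: {forecast:.1f}")
```

The descriptive step only summarises the past, while the predictive step assumes that the past trend continues, an assumption that has to be checked with the business.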

Algorithm Selection

Within each of the types of analysis mentioned above, multiple algorithms can be used. It takes skill and experimentation to find the process that gives the best results. Some of these algorithms were invented decades ago and others very recently. Many scientists now work in this field, so it is constantly expanding.

The algorithms can be classified into the following groups:

  • Clustering - identifies structure in the data
  • Anomaly Detection - identifies elements in the data different from most others
  • Regression - predicts values based on past observations
  • Two-class Classification - classifies data into two categories based on past events
  • Multi-class Classification - classifies data into multiple categories, also based on past events

By coupling these mathematical tools with the correct analytical methods, data scientists can produce mathematical results. These results are mathematically meaningful but do not yet answer the business questions. In the Insights phase we show how this translation works.
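As a small example of one of the algorithm groups above, the sketch below performs a basic form of anomaly detection using z-scores. The readings in `daily_orders`, the helper `find_anomalies` and the threshold of 2.0 are assumptions chosen for this illustration; real projects often use more robust methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - centre) / spread > threshold
    ]


# Hypothetical daily order counts, invented for illustration only.
daily_orders = [50, 52, 49, 51, 48, 50, 120, 51]

anomalies = find_anomalies(daily_orders)
print(f"anomalous observations (index, value): {anomalies}")
```

The output is a mathematical result only: it flags the observation that differs from most others, but it does not say whether that observation is a data error to exclude or a business event to focus on.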