Covariance / Correlation

Modified on Tue, 25 Feb, 2020 at 2:38 PM

In Brief

Function: The Covariance Correlation block computes the covariance and the correlation between each pair of selected fields. Covariance measures how two fields vary together: a positive value means they tend to rise and fall together, a negative value means one tends to rise as the other falls, and a value near zero indicates no linear relationship. The magnitude of a covariance depends on the units of the two fields. The correlation matrix carries the same information as the covariance matrix, except that each value is rescaled to lie between -1 and 1, which makes values comparable across pairs of fields.
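The relationship between the two measures can be illustrated with a minimal Python sketch. It is not the block's implementation: it assumes the sample covariance (n - 1 denominator) and the Pearson correlation, and the `age` and `balance` values are made up for illustration.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical values, not taken from the Banking demo data.
age = [25, 32, 47, 51, 62]
balance = [1200, 1500, 2300, 2100, 3000]

cov_ab = covariance(age, balance)
corr_ab = correlation(age, balance)

# Expressing the balance in cents changes the covariance by a factor of 100,
# while the correlation stays the same.
balance_cents = [b * 100 for b in balance]
cov_cents = covariance(age, balance_cents)
corr_cents = correlation(age, balance_cents)

print(f"covariance:  {cov_ab:.1f} vs {cov_cents:.1f}")
print(f"correlation: {corr_ab:.3f} vs {corr_cents:.3f}")
```

Because the correlation does not change when a field is rescaled, it is usually the easier of the two matrices to read when fields use very different units.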

Typical Scenario: A user has obtained a new dataset and wants to perform an exploratory analysis. Calculating covariances and correlations between fields in the dataset gives a quick overview of how the data is spread and whether there might be relationships between fields.

Case Study

A bank has data on a series of customers whom it is trying to subscribe to a bank term deposit. This data is available in the demo data under the name "Banking". An analyst at the bank wants to quickly learn more about how these customer characteristics interact and uses the Covariance Correlation block to do so. To obtain a quick overview of the spread of the data, the analyst performs a covariance analysis and sets up the following workflow:

Workflow

Example Covariance Correlation workflow

Input Data

The data set contains records for 8 thousand clients, with information on characteristics such as education, housing, marital status and employment.

Banking data set, describing current banking clients

Options

After the input data is connected to the Covariance Correlation block, the next step is to configure the block. Clicking on the block icon reveals the following options:

Options for Covariance Correlation block

All numeric fields: If this box is checked, all numeric fields are automatically selected in the "Fields to Use" option below.

Fields to Use: Select the fields for which pairwise covariance and correlation statistics should be calculated.
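The effect of the two options can be sketched in plain Python. This is only an illustration under stated assumptions, not the block's actual logic: the helper names (`numeric_fields`, `pairwise`), the field names and the values are hypothetical, and the sample covariance (n - 1 denominator) and Pearson correlation are assumed.

```python
from math import sqrt
from numbers import Real


def numeric_fields(rows):
    """Return the field names whose values are all numbers (booleans excluded)."""
    return [
        name
        for name in rows[0]
        if all(
            isinstance(r[name], Real) and not isinstance(r[name], bool)
            for r in rows
        )
    ]


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pairwise(rows, fields):
    """Build covariance and correlation tables keyed by (field_a, field_b)."""
    columns = {f: [r[f] for r in rows] for f in fields}
    cov = {
        (a, b): covariance(columns[a], columns[b])
        for a in fields
        for b in fields
    }
    corr = {
        (a, b): cov[a, b] / sqrt(cov[a, a] * cov[b, b])
        for a in fields
        for b in fields
    }
    return cov, corr


# Hypothetical records, not taken from the Banking demo data.
rows = [
    {"age": 25, "balance": 1200, "duration": 300, "marital": "single"},
    {"age": 32, "balance": 1500, "duration": 120, "marital": "married"},
    {"age": 47, "balance": 2300, "duration": 240, "marital": "married"},
    {"age": 51, "balance": 2100, "duration": 90, "marital": "divorced"},
    {"age": 62, "balance": 3000, "duration": 180, "marital": "married"},
]

# Mimics "All numeric fields": the text field "marital" is left out.
fields = numeric_fields(rows)

# Mimics "Fields to Use": any explicit subset could be passed instead,
# for example pairwise(rows, ["age", "balance"]).
cov, corr = pairwise(rows, fields)

for a in fields:
    print(a, [round(corr[a, b], 3) for b in fields])
```

Both tables are symmetric, and every field has a correlation of 1 with itself, so only the values on one side of the diagonal carry distinct information.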

Output

After the block has been executed, the following output data is generated:

Output data for covariance analysis on the Banking data set