Network Analysis

Modified on Wed, 26 Feb, 2020 at 2:10 PM

In Brief

Function: Given links which form a network of connected entities, the block reveals which entities in the network are important.

Typical Use Case: A company is analysing social network data such as tweets and wants to discover which people and companies are most influential in a particular area.

Case Study

We are interested in building a collection of important FIFA World Cup tweets for subsequent brand analysis. To make the best use of our data-gathering effort, we would like to know which users' tweets are worth downloading, but we do not know this in advance. We therefore start by downloading tweets that match World Cup-related keywords and extract a list of users from them. To improve our database of users from whom we will acquire more tweets, we use the Network Analysis block, which assigns a score to each user, giving more important users a higher score. After sorting the users by importance, we add the most important ones to our database, download their tweets, and repeat the cycle until we have acquired enough data.
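The selection step of each cycle (sort by score, then add the top users that are not yet in the database) can be sketched in Python. This is a hypothetical helper written for illustration, not part of the product; it assumes the scores are available as a mapping from user id to score and that `known_users` holds the users already in the database.

```python
from typing import Dict, List, Set


def next_users_to_fetch(scores: Dict[str, float],
                        known_users: Set[str],
                        k: int) -> List[str]:
    """Return up to k highest-scoring users that are not yet in the database.

    Ties are broken by user id so that the result is deterministic.
    """
    candidates = [(user, score) for user, score in scores.items()
                  if user not in known_users]
    # Sort by descending score, then by ascending user id for ties.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [user for user, _ in candidates[:k]]


# Hypothetical scores, as they might come out of the Scores output.
scores = {"fifa": 9.8, "alice": 4.2, "bob": 7.5, "carol": 7.5, "dave": 1.0}
known = {"fifa"}
print(next_users_to_fetch(scores, known, 2))  # ['bob', 'carol']
```

The returned users would then be added to the database and their tweets downloaded before the next round of scoring.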

Workflow

We construct a workflow by connecting the Twitter block to the De-Tokenise block, which in turn is connected to the Network Analysis block. The Twitter block is set up with the keyword "worldcup", and the De-Tokenise block is set up to split apart the comma-separated entries in the field "User mentions ids", so that each mentioned user appears in a separate record.
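The effect of the De-Tokenise step can be illustrated with a short Python sketch on plain dictionaries. The field name "User id" for the tweet author and the rule of dropping empty entries are assumptions made for this example, not documented behaviour of the block.

```python
from typing import Dict, List


def detokenise(records: List[Dict[str, str]], field: str,
               sep: str = ",") -> List[Dict[str, str]]:
    """Split a separator-delimited field into one record per entry.

    All other fields are copied unchanged; empty entries are dropped.
    """
    out = []
    for record in records:
        for token in record.get(field, "").split(sep):
            token = token.strip()
            if token:
                out.append({**record, field: token})
    return out


# "User id" is a hypothetical name for the field holding the tweet author.
tweets = [
    {"User id": "u1", "User mentions ids": "u2,u3"},
    {"User id": "u2", "User mentions ids": "u3"},
    {"User id": "u4", "User mentions ids": ""},
]
for row in detokenise(tweets, "User mentions ids"):
    # Each row is now one link: author (origin) -> mentioned user (target).
    print(row["User id"], "->", row["User mentions ids"])
```

A tweet without mentions produces no rows, so it contributes no links to the network.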

Example Network Analysis workflow

Input Data

The input data consists of at least two fields: one for the origin of a connection and one for its target. In our case, we form connections between users who mentioned other users in their tweets: the user who created the tweet is the origin, and the mentioned user is the target.

Worldcup connections data

Options

The next step is configuring the Network Analysis block. Clicking on the block reveals the following options:

Network Analysis options

Link from: Select the field that represents the origin of a link.

Link to: Select the field that represents the target of a link.

Link weight: Optionally, you can specify a weight for each link. Links with higher weights contribute more to the importance of their target entity.
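The article does not say which algorithm the block uses to compute importance. As an assumption, the sketch below uses weighted PageRank, a common choice in which each entity passes its score along its outgoing links in proportion to their weights, so a heavily weighted link gives more importance to its target. It is an illustration of the idea, not the block's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str, float]  # (link from, link to, link weight)


def weighted_pagerank(links: List[Link], damping: float = 0.85,
                      iterations: int = 100) -> Dict[str, float]:
    """Weighted PageRank computed by power iteration; the scores sum to 1.

    Assumes at least one link and non-negative weights. A node with no
    positively weighted outgoing links is treated as dangling and spreads
    its rank evenly over all nodes.
    """
    nodes = sorted({node for src, dst, _ in links for node in (src, dst)})
    n = len(nodes)
    out_weight: Dict[str, float] = defaultdict(float)
    for src, _, weight in links:
        out_weight[src] += weight
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        # Teleport term plus the evenly spread rank of dangling nodes.
        new = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for src, dst, weight in links:
            if out_weight[src] > 0:
                # Each link passes on a share of the source's rank that is
                # proportional to the link's weight.
                new[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new
    return rank


links = [("x", "y", 3.0), ("x", "z", 1.0), ("y", "x", 1.0), ("z", "x", 1.0)]
scores = weighted_pagerank(links)
# y receives three quarters of x's outgoing weight, z only one quarter.
print(sorted(scores, key=scores.get, reverse=True))  # ['x', 'y', 'z']
```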

Allow nodes to be part of multiple clusters: The block assigns each entity to one cluster. However, an entity may actually belong to several overlapping clusters. If this box is unchecked, you only get the entity's strongest cluster association. If it is checked, you get all of its associations, which means that an entity may appear multiple times in the output.
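The difference between the two settings of this checkbox can be sketched as a filter over (entity, cluster, likelihood) associations. The record layout and the function name are hypothetical; the sketch only mirrors the behaviour described above and assumes that "strongest" means highest likelihood.

```python
from typing import Dict, List, Tuple

Association = Tuple[str, str, float]  # (entity, cluster id, likelihood)


def cluster_output(associations: List[Association],
                   multiple: bool) -> List[Association]:
    """Keep all associations (box checked) or only each entity's strongest."""
    if multiple:
        return sorted(associations)
    best: Dict[str, Association] = {}
    for assoc in associations:
        entity, _, likelihood = assoc
        # On ties, the first association encountered is kept.
        if entity not in best or likelihood > best[entity][2]:
            best[entity] = assoc
    return sorted(best.values())


assocs = [("u1", "c1", 0.7), ("u1", "c2", 0.3), ("u2", "c2", 1.0)]
print(cluster_output(assocs, multiple=False))  # u1 appears once, in c1
print(cluster_output(assocs, multiple=True))   # u1 appears twice
```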

Output

Scores: The Scores output contains the entities from the field specified in "Link to", each with a score from 0 to 10 indicating its importance; the higher the score, the more important the entity. The number of incoming and outgoing links for each entity is also reported.
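The article states that scores range from 0 to 10 but not how they are scaled. The sketch below counts incoming and outgoing links per entity and assumes a simple linear min-max rescaling of raw importance values onto 0 to 10; both the scaling rule and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def link_counts(links: List[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {entity: (incoming links, outgoing links)}."""
    incoming = Counter(dst for _, dst in links)
    outgoing = Counter(src for src, _ in links)
    entities = set(incoming) | set(outgoing)
    # Counter returns 0 for entities it has not seen.
    return {e: (incoming[e], outgoing[e]) for e in entities}


def rescale_0_to_10(raw: Dict[str, float]) -> Dict[str, float]:
    """Linearly map raw scores onto [0, 10]; if all are equal, all map to 10."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {e: 10.0 for e in raw}
    return {e: 10.0 * (v - lo) / (hi - lo) for e, v in raw.items()}


links = [("u1", "u2"), ("u1", "u3"), ("u2", "u3")]
print(sorted(link_counts(links).items()))
# [('u1', (0, 2)), ('u2', (1, 1)), ('u3', (2, 0))]
print(rescale_0_to_10({"u1": 0.2, "u2": 0.3, "u3": 0.5}))
```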

Clusters: In addition to ranking all entities, the block performs a cluster analysis. This output consists of records listing the entities from the field specified in "Link to", each assigned a cluster id. Entities with the same cluster id share significantly more connections with each other than with entities from other clusters. The output also gives the likelihood that an entity actually belongs to its cluster, as well as an unnormalised raw weight.
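The clustering criterion above can be checked on any candidate assignment: in a good cluster, most links that touch its members stay inside it. The sketch below computes that share for each cluster. It is a diagnostic written for illustration, not the block's clustering algorithm (which the article does not describe), and it assumes one cluster per entity, as with the multiple-clusters option unchecked.

```python
from typing import Dict, List, Tuple


def internal_link_share(links: List[Tuple[str, str]],
                        cluster_of: Dict[str, str]) -> Dict[str, float]:
    """For each cluster, the fraction of links touching it that stay inside it.

    A link is counted for a cluster if either endpoint belongs to it.
    """
    internal: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for src, dst in links:
        a, b = cluster_of[src], cluster_of[dst]
        for c in {a, b}:
            total[c] = total.get(c, 0) + 1
        if a == b:
            internal[a] = internal.get(a, 0) + 1
    return {c: internal.get(c, 0) / total[c] for c in total}


# Two triangles joined by a single bridging link.
links = [("a1", "a2"), ("a2", "a3"), ("a3", "a1"),
         ("b1", "b2"), ("b2", "b3"), ("b3", "b1"),
         ("a1", "b1")]
cluster_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
print(internal_link_share(links, cluster_of))  # {'A': 0.75, 'B': 0.75}
```

Here three of the four links touching each triangle stay inside it, which is the pattern the cluster ids are meant to capture.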

Example output for Network Analysis