In Brief

Function: Rebalances the data so that a chosen value makes up a given proportion of the records.

Typical Use Case: A client wishes to process a data set that has a known bias. Using the biased raw data as-is would produce misleading results, so the data needs to be preprocessed to remove the bias.


Case study

Workflow

We create a simple workflow by connecting the bond prices demo data set to the oversampling block.

Oversampling Example Workflow


Input Data

Here we have a data set of bond prices. As the graph shows, it is heavily biased toward non-perpetual bonds.

Unbalanced Bond Prices

Options

We wish to remove this bias. To do so, we add an oversampling block and specify that perpetual bonds should make up 50% of the resulting distribution.
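As a rough guide to what a 50% setting implies, the relation below is a sketch under an assumption the source does not state: that the block keeps every non-perpetual record and only adds copies of perpetual ones. Here m is the number of records without the chosen value, k is the number of records with the chosen value in the output, and p is the requested proportion (all three symbols are introduced here for illustration).

```latex
\frac{k}{k + m} = p
\quad\Longrightarrow\quad
k = \frac{p\,m}{1 - p},
\qquad
p = 0.5 \;\Rightarrow\; k = m
```

Under that assumption, a 50% setting means the output holds as many perpetual records as non-perpetual ones.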

Oversampling options

The field to oversample: Select the field that contains the value to be oversampled.

Value in the field to oversample: Enter the value to be oversampled.

Percentage of the output to contain this value: Choose the percentage of all output records that should contain the value entered in the option above.
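To make the three options concrete, here is a minimal Python sketch of one common way to oversample: random duplication of matching records. It is not the block's actual implementation, which the documentation does not describe. The function name `oversample`, its parameters, and the `perpetual` field name are hypothetical and chosen only to mirror the options above.

```python
import math
import random


def oversample(records, field, value, target_fraction, seed=0):
    """Return a new list in which about `target_fraction` of the records
    have record[field] == value.

    Assumption: records with other values are all kept, and matching
    records are duplicated at random (sampling with replacement).
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")

    matching = [r for r in records if r[field] == value]
    n_others = len(records) - len(matching)
    if not matching:
        raise ValueError("no records contain the value to oversample")

    # Matching records needed so that needed / (needed + n_others)
    # reaches target_fraction.
    needed = math.ceil(target_fraction * n_others / (1 - target_fraction))
    extra = max(0, needed - len(matching))

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    duplicates = [rng.choice(matching) for _ in range(extra)]
    return list(records) + duplicates


# Hypothetical data: 90 non-perpetual and 10 perpetual bonds.
data = [{"perpetual": False}] * 90 + [{"perpetual": True}] * 10
balanced = oversample(data, field="perpetual", value=True, target_fraction=0.5)
share = sum(r["perpetual"] for r in balanced) / len(balanced)
print(len(balanced), share)  # 180 0.5
```

The sketch appends the duplicates to the end of the list, so a real pipeline would typically shuffle the result before further processing.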

Output

The resulting distribution is balanced, and the bias has been removed:

Balanced Bond Prices