Oversampling

Modified on Wed, 26 Feb 2020 at 11:05 AM

In Brief

Function: Rebalances a data set so that a chosen value of a chosen field makes up a given proportion of the output records.

Typical Use Case: A client wishes to process a data set that has a known bias. Using the biased raw data directly would produce misleading results, so the data needs to be preprocessed to remove the bias first.


Case study

Workflow

We create a simple workflow by connecting the bond prices demo data set to the oversampling block.

Oversampling Example Workflow

Input Data

Here we have a data set of bond prices. As the graph shows, it is heavily biased towards non-perpetual bonds.

Unbalanced Bond Prices

Options

We wish to remove this bias, so we configure the oversampling block so that 50% of the resulting distribution consists of perpetual bonds.
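The number of perpetual-bond records the output needs can be worked out in advance. This is a sketch under an assumption the article does not confirm: that the block keeps every input record and only adds copies of records containing the chosen value. Let $N$ be the number of input records, $k$ the number containing the chosen value, and $p$ the requested fraction ($0 < p < 1$). With the $N - k$ other records kept unchanged and $m$ records containing the chosen value in the output:

```latex
\frac{m}{m + (N - k)} = p
\quad\Longrightarrow\quad
m = \frac{p\,(N - k)}{1 - p},
\qquad \text{extra copies} = m - k .
```

For the 50% setting used here, $p = 0.5$ gives $m = N - k$: the output contains as many perpetual bonds as non-perpetual ones.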

Oversampling options

The field to oversample: the field whose values will be oversampled.

Value in the field to oversample: the value to oversample.

Percentage of the output to contain this value: the percentage of all output records that should contain the value selected in the option above.
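The three options above map naturally onto the parameters of a small function. The sketch below is a hypothetical illustration, not the block's actual implementation: it assumes records are Python dictionaries, that existing records are always kept, and that extra records are random copies (drawn with replacement) of records containing the chosen value. The function name `oversample` and the field name `bond_type` are made up for the example.

```python
import random
from typing import Any, Dict, List

Record = Dict[str, Any]


def oversample(records: List[Record], field: str, value: Any,
               percentage: float, seed: int = 0) -> List[Record]:
    """Hypothetical sketch: add copies of records where record[field] == value
    until they make up roughly `percentage` percent of the output."""
    if not 0 < percentage < 100:
        raise ValueError("percentage must be strictly between 0 and 100")
    p = percentage / 100.0
    matching = [r for r in records if r.get(field) == value]
    others_count = len(records) - len(matching)
    if not matching:
        raise ValueError("no records contain the value to oversample")
    # Matching records needed so that m / (m + others_count) is close to p.
    target = round(p * others_count / (1 - p))
    if target <= len(matching):
        # Already at or above the requested share; this sketch never drops rows.
        return list(records)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    extra = [dict(rng.choice(matching)) for _ in range(target - len(matching))]
    return list(records) + extra
```

For example, with 2 perpetual and 8 non-perpetual bonds, `oversample(bonds, "bond_type", "perpetual", 50)` keeps the original 10 records and appends 6 copies of perpetual bonds, giving 8 of each. Because the target count is rounded to a whole record, other percentages are only met approximately.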

Output

The resulting distribution is balanced, and the bias has been removed:

Balanced Bond Prices
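To check the balance numerically rather than from the graph, you can compute the share of each value in a field. This is a minimal, hypothetical helper (the name `value_shares` is illustrative, and it assumes records are Python dictionaries); it is not part of the product.

```python
from collections import Counter
from typing import Any, Dict, List


def value_shares(records: List[Dict[str, Any]], field: str) -> Dict[Any, float]:
    """Return the fraction of records holding each value of `field`."""
    counts = Counter(r.get(field) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}  # no records, so no shares to report
    return {value: count / total for value, count in counts.items()}
```

Applied to an output balanced at 50%, such as 8 perpetual and 8 non-perpetual records, it would report a share of 0.5 for each bond type.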
