Function: Predicts the effect of an A/B experiment.
Typical Use Case: The user has collected data on recurring events, such as weekly sales in a set of stores. They apply a change in some of the stores and want to analyse whether the change affects sales positively or negatively.
Description: The block analyses the control data and tries to find a good prediction of the treatment data (up to the point of the experiment) using the control data. This prediction is then extended for the treatment stores over the time period of the experiment, yielding a synthetic time-line of the treatment stores as if the experiment had not occurred. Now it is possible to compare the synthetic treatment with the real treatment to analyse whether the experiment was successful or not.
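The idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model the block actually uses is not documented here, so ordinary least squares stands in for it, and the function name `synthetic_treatment` and the simulated data are hypothetical.

```python
import numpy as np

def synthetic_treatment(control, treatment, start):
    """Fit treatment ~ control on the pre-experiment periods, then predict all periods.

    control:   array of shape (n_periods, n_control_stores)
    treatment: array of shape (n_periods,), e.g. mean sales of the treatment stores
    start:     index of the first period of the experiment
    """
    # Add an intercept column to the control-store matrix.
    X = np.column_stack([np.ones(len(control)), control])
    # Least-squares fit using only the periods before the experiment.
    coef, *_ = np.linalg.lstsq(X[:start], treatment[:start], rcond=None)
    # Apply the fitted weights to every period, including the experiment.
    return X @ coef

# Simulated data (assumption): 60 weeks, 3 control stores.
rng = np.random.default_rng(0)
control = rng.normal(100.0, 10.0, size=(60, 3))
# Hypothetical treatment series: a fixed mix of the control stores ...
treatment = 5.0 + control @ np.array([0.5, 0.3, 0.2])
# ... plus an uplift of 8 units from week 40 onwards (the "experiment").
treatment[40:] += 8.0

synthetic = synthetic_treatment(control, treatment, start=40)
# Real minus synthetic: close to 0 before the experiment, close to 8 during it.
effect = treatment - synthetic
```

Because the fit only sees periods before the experiment, the difference between the real and the synthetic series during the experiment can be read as the effect of the change.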
A sales manager for a supermarket chain consisting of around 100 supermarkets has collected weekly sales data over the last couple of years for each of the stores. To increase profit he wonders whether the layout of the stores should be changed so that it takes customers longer to traverse the stores from entrance to exit. He orders the change of layout in some of the stores (the treatment stores) and keeps the layout for most of the other stores the same (the control stores). He then continues to collect sales data during this experiment to identify if the change in layout affects profits positively or negatively.
We create a workflow by dragging the weekly sales demo data onto the workspace. The data consists of a list of sales per week, the store in which the sales were made, and an identifier indicating whether the store belongs to the control or the treatment set.
A/B testing workflow. Data is split into a control and treatment set and then analysed
Weekly sales data
The weekly sales data contains a field "Store Type" which contains the values CONTROL and EXPERIMENT. Using the record filter block, we can split the data into control and experiment sets.
Record filter to split data
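For readers who prefer to see the split outside the workflow editor, the same filter can be sketched with pandas. The "Store Type" field and its values CONTROL and EXPERIMENT come from the demo data described above; the other column names and all the figures are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the weekly sales data (column names other than
# "Store Type" are assumptions, and the numbers are invented).
sales = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Week": pd.to_datetime(["2020-01-06", "2020-01-13", "2020-01-06", "2020-01-13"]),
    "Gross Revenue": [100.0, 110.0, 90.0, 95.0],
    "Store Type": ["CONTROL", "CONTROL", "EXPERIMENT", "EXPERIMENT"],
})

# Equivalent of the record filter: one output per value of "Store Type".
control = sales[sales["Store Type"] == "CONTROL"]
experiment = sales[sales["Store Type"] == "EXPERIMENT"]
```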
The two sets are then connected to the inputs of the A/B testing block.
Once connected, we can configure the block:
Options for Prediction block
Response: Select the field with the values you are trying to improve with the A/B experiment. In our test case this is gross revenue.
Predictor Fields: Select the fields that can be used to predict the response field. The response field itself can be one of them. These fields will be used to create a prediction of the response of the treatment set up to the point of the experiment. If similar stores share similar patterns in, for example, revenue, then it is reasonable to predict treatment set revenue from control set revenue.
Store Identifier Field: Select the field used to distinguish between different stores.
Date Field: Select the field that contains the dates of the time-series.
Equidistant Dates: This will make the block assume that the time between consecutive dates is the same (necessary for accurate estimation of the 'synthetic treatment'). If not checked, the block will test whether the time series has an interval of a second, minute, hour, day, weekday, month, quarter or year.
Test period start: Select the date at which the A/B experiment started.
Experiment Has Specific End Date: If not selected, the block will assume that the experiment was performed from the start date until the last record in your data-set. If selected, you can specify an end date.
Missing value behaviour: Missing values might or might not represent problems with your data. If the Reduce Dimensions block encounters records with missing values, those records cannot be used for dimensionality reduction and are ignored by default. They will still be present in the output data, but their fields for reduced dimensions will be left empty. To remove these records from the output data entirely, check this option.
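To make the Equidistant Dates option above more concrete, here is a minimal sketch of what "the time between consecutive dates is the same" means. The helper `is_equidistant` is hypothetical and is not the block's own check; it only compares the gaps between sorted, distinct dates.

```python
from datetime import date

def is_equidistant(dates):
    """Return True if consecutive distinct dates are all the same distance apart."""
    ordered = sorted(set(dates))
    # Collect the distinct gaps between neighbouring dates.
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(gaps) <= 1

# Weekly data: every gap is exactly 7 days.
weekly = [date(2021, 1, 4), date(2021, 1, 11), date(2021, 1, 18)]
# Monthly data: gaps of 31 and 28 days, so not equidistant in days.
monthly = [date(2021, 1, 1), date(2021, 2, 1), date(2021, 3, 1)]

print(is_equidistant(weekly), is_equidistant(monthly))
```

Monthly data illustrates why a fixed-gap check is not enough on its own: the dates are regular in calendar terms but not in days, which is the kind of case a test for month, quarter or year intervals would cover.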
The output data consists of the treatment stores over the full time-range of the data. The treatment field contains the real observed data in the treatment stores, while the control field contains the synthetic (predicted) data derived from the control stores, including upper and lower bounds for this prediction. Point effect is the treatment field minus the control field and gives the effect of the experiment for each record. Cumulative Effect is the running sum of the point effects over the time-range of the experiment. Test period indicates whether a record falls during the experiment or outside it (before or after).
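The relationship between the output fields can be illustrated with a small pandas sketch. The numbers are invented, the column names only mirror the fields described above, and setting the cumulative effect to zero before the experiment is an assumption rather than documented behaviour of the block.

```python
import pandas as pd

# Invented example output: six weekly records for the treatment stores.
out = pd.DataFrame({
    "Date": pd.date_range("2021-01-04", periods=6, freq="W-MON"),
    "Treatment": [100.0, 102.0, 101.0, 110.0, 112.0, 111.0],  # observed
    "Control": [100.0, 101.0, 102.0, 103.0, 104.0, 103.0],    # synthetic
})
test_start = pd.Timestamp("2021-01-25")

# Flag records that fall within the experiment.
out["Test period"] = out["Date"] >= test_start
# Point effect: observed treatment minus synthetic prediction.
out["Point effect"] = out["Treatment"] - out["Control"]
# Cumulative effect: running sum of point effects inside the test period only.
out["Cumulative effect"] = (
    out["Point effect"].where(out["Test period"], 0.0).cumsum()
)

print(out)
```

In this example the point effects during the experiment are 7, 8 and 8, so the cumulative effect ends at 23.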