A/B Testing

Modified on Tue, 4 Jun, 2024 at 9:19 AM

In Brief

Function: Estimates the effect of an A/B experiment

Typical Use Case: The user has collected data on recurring events, such as weekly sales in a set of stores. In some of the stores they apply a change and would like to analyse whether the change affects sales positively or negatively.

Description: The block uses the control data to build a prediction of the treatment data for the period before the experiment. This prediction is then extended for the treatment stores over the time period of the experiment, yielding a synthetic timeline of the treatment stores as if the experiment had not occurred. Comparing the synthetic treatment with the real treatment then shows whether or not the experiment was successful.
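The idea in the description can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a single predictor (the average control revenue per week) and an ordinary least-squares line, which is not necessarily the model the block uses internally. The numbers and the names fit_line, control, treatment and start are made up for the example.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical weekly averages; the experiment starts at index 4.
control = [100.0, 110.0, 105.0, 120.0, 115.0, 125.0]
treatment = [205.0, 225.0, 215.0, 245.0, 260.0, 280.0]
start = 4

# Fit on the pre-experiment weeks only ...
intercept, slope = fit_line(control[:start], treatment[:start])

# ... then extend the prediction over the whole time range.
synthetic = [intercept + slope * c for c in control]

# Real treatment minus synthetic treatment during the experiment.
effect = [t - s for t, s in zip(treatment[start:], synthetic[start:])]
```

Before the experiment the synthetic series tracks the real treatment series; any gap that opens up afterwards is attributed to the experiment.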

Case Study

A sales manager for a supermarket chain of around 100 supermarkets has collected weekly sales data for each store over the last couple of years. To increase profit, he is considering changing the layout of the stores so that it takes customers longer to walk from entrance to exit. He orders the change of layout in some of the stores (the treatment stores) and keeps the layout the same in most of the other stores (the control stores). He then continues to collect sales data during this experiment to find out whether the change in layout affects profits positively or negatively.

Workflow

We create a workflow by dragging the weekly sales demo data onto the workspace. The data consists of a list of sales per week, the store in which the sales were made, and an identifier that states whether the store belonged to the control or the treatment set.

A/B testing workflow. Data is split into a control and treatment set and then analysed

Input data

Generally speaking, the data going into the A/B testing block has to consist of the following parts:

Two datasets: One for the CONTROL and the other for the EXPERIMENT.
1. The CONTROL dataset contains data from all the stores that did not participate in the experiment. This data serves as the baseline against which the participating stores are compared.
2. The EXPERIMENT dataset contains data from the stores that participated in the experiment. These stores are analysed by the block and form part of the block's output.
Both datasets need to contain 2-3 time periods:
1. A mandatory time period BEFORE the experiment started.
2. A mandatory time period in which the experiment was performed.
3. An optional time period after the experiment.

Please note that if your data does not contain a time period before the actual A/B experiment started, this block cannot be used.
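A quick way to check this requirement before running the block is sketched below. The function name has_required_periods and the example dates are hypothetical; the check simply confirms that a dataset has at least one record before the experiment start and at least one on or after it.

```python
from datetime import date

def has_required_periods(dates, test_start):
    """True if there are records both before and on/after the test start."""
    has_before = any(d < test_start for d in dates)
    has_during = any(d >= test_start for d in dates)
    return has_before and has_during

# Hypothetical weekly dates and experiment start.
weekly_dates = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]
usable = has_required_periods(weekly_dates, date(2024, 1, 8))
no_history = has_required_periods(weekly_dates, date(2024, 1, 1))
```

Here usable is True, while no_history is False because no record predates the experiment start.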

Weekly sales data

The weekly sales data contains a field "Store Type" which contains the values CONTROL and EXPERIMENT. Using the record filter block, we can split the data into control and experiment sets.

Record filter to split data

The respective sets are then connected to the inputs of the A/B testing block.
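Outside the workspace, the same split can be sketched in plain Python. The "Store Type" field and its two values come from the demo data described above; the other field names and all values are invented for illustration.

```python
# Hypothetical records; only the "Store Type" field is taken from the demo data.
records = [
    {"Store": "S1", "Store Type": "CONTROL", "Gross Revenue": 1200.0},
    {"Store": "S2", "Store Type": "EXPERIMENT", "Gross Revenue": 1350.0},
    {"Store": "S3", "Store Type": "CONTROL", "Gross Revenue": 1100.0},
]

# Equivalent of two record filters, one per store type.
control_set = [r for r in records if r["Store Type"] == "CONTROL"]
experiment_set = [r for r in records if r["Store Type"] == "EXPERIMENT"]
```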

Options

Once connected, we can configure the block:

Options for Prediction block

Response: Select the field with the values you are trying to improve with the A/B experiment. In our test case this is gross revenue.

Predictor Fields: Select the fields that can be used to predict the response field. The response field itself can be one of them. These fields will be used to create a prediction of the response of the treatment set up to the point of the experiment. If similar stores share similar patterns, e.g. in revenue, then it is reasonable to predict treatment set revenue from control set revenue.

Store Identifier Field: Select the field used to distinguish between different stores.

Date Field: Select the field that contains the dates of the time-series.

Equidistant Dates: This will make the block assume that the time between each date is the same (necessary for accurate estimation of the 'synthetic treatment'). If not checked, the block will test whether the time series has an interval of a second, minute, hour, day, weekdays, month, quarter or year.
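To see whether your own dates are equally spaced, a check along the following lines can help. It is only a sketch of the idea, not the block's own test; is_equidistant and the example dates are hypothetical.

```python
from datetime import date

def is_equidistant(dates):
    """True if all consecutive distinct dates are the same distance apart."""
    ordered = sorted(set(dates))
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(gaps) <= 1

# Hypothetical weekly series, once complete and once with a week missing.
complete = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 22)]
with_gap = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 22)]

complete_ok = is_equidistant(complete)
gap_ok = is_equidistant(with_gap)
```

Duplicate dates are collapsed first because several stores usually share the same dates.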

Test period start: Select the date at which the A/B experiment started.

Experiment Has Specific End Date: If not selected, the block will assume that the experiment was performed from the start date until the last record in your data-set. If selected, you can specify an end date.
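The two date options together decide which records count as part of the experiment. The sketch below shows that logic under the assumption that both the start and the end date are inclusive, which this article does not state; in_test_period is a hypothetical helper.

```python
from datetime import date

def in_test_period(d, start, end=None):
    """True if d falls in the experiment; with no end date the experiment
    runs until the last record. Inclusive bounds are an assumption."""
    return d >= start and (end is None or d <= end)

start = date(2024, 3, 4)
end = date(2024, 3, 18)

open_ended = in_test_period(date(2024, 4, 1), start)       # no end date set
after_end = in_test_period(date(2024, 4, 1), start, end)   # specific end date
before_start = in_test_period(date(2024, 2, 26), start, end)
```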

Missing value behaviour: Missing values might or might not represent problems with your data. If the block encounters records with missing values, they cannot be used for the analysis and are ignored by default. They will still be present in the output data, but their computed fields will be left empty. If you would like to entirely remove these records from the output data, checking this option will do so.

Output

The output data consists of the treatment stores over the full time range of the data. The treatment field contains the real observed data in the treatment stores, while the control field contains the synthetic (predicted) data derived from the control stores, including upper and lower bounds for this prediction. Point effect is the treatment field minus the control field and specifies, for each record, the effect of the experiment. Cumulative Effect is the running total of the point effects over the time range of the experiment. Test period indicates whether a record is from a date during the experiment or from a date before (or after) it.
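The relationship between these output fields can be written out as a short calculation. The values below are invented, and the sketch assumes that the cumulative effect stays at zero until the experiment starts.

```python
from itertools import accumulate

# Hypothetical output records for one treatment store.
treatment = [200.0, 210.0, 230.0, 240.0, 235.0]   # observed values
synthetic = [201.0, 209.0, 220.0, 225.0, 230.0]   # predicted "control" field
in_test = [False, False, True, True, True]        # test period flag

# Point effect: observed minus predicted, per record.
point_effect = [t - s for t, s in zip(treatment, synthetic)]

# Cumulative effect: running total of point effects during the experiment.
cumulative_effect = list(
    accumulate(p if flag else 0.0 for p, flag in zip(point_effect, in_test))
)
```

The last value of cumulative_effect is the total gain or loss over the experiment.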

Interpretation of the results

The block produces data that can power 3 informative views:

Treatment and Control
Point Effect
Cumulative Effect

Treatment and Control

This graph shows how differently treatment and control stores evolve over time. In our use case, the graph looks like this:

Before the start of the experiment (indicated by the vertical red line), the treatment and control stores behave very similarly and hover around the same average values. After the start of the experiment, however, the treatment line diverges from that of the control, indicating that the stores now sell more and that the treatment has a positive effect.

Point Effect

The point effect is the control subtracted from the treatment. It shows the difference between control and treatment stores in one line, including the confidence bounds, which indicate the possible range of the difference line.

In our test case, we can again see that before the start of the experiment the difference hovers around 0, meaning that there is no real difference between treatment and control. It then increases to just short of 10000, indicating that the treatment stores on average increased their weekly sales by that amount.

Cumulative Effect

The cumulative effect is the sum of the gains (or losses) incurred during the time of the experiment.

Here, we see a sharp cumulative increase from the start of the experiment up to the 1750000 mark. This is the overall gain achieved through the reorganisation of the stores by the end of the experiment.