Comparing Pure Geo Experiments to TBR Causal Effect Analysis

31 Oct 2019 - Tags: measurement

Recently I read a paper, “Estimating Ad Effectiveness using Geo Experiments in a Time-Based Regression Framework”, with the folks at Data Science Minneapolis. Essentially the paper is about how one can use both geographic information and a time-based regression (TBR) to estimate the causal effect of an ad campaign on your incremental return on ad spend (iROAS, the ratio of profit due to an ad to the money spent on that ad). Previous methods of estimating the effectiveness of ad campaigns included a purely geographical zone based regression (GBR).

The paper claims that using TBR is better than GBR when you’ve got few geographic areas (“geos”) to which you’re able to selectively target ads. This makes sense, since GBR draws its statistical power from the number of geos available. Apparently some people even suggest always preferring TBR to GBR. Unfortunately, in the paper they didn’t actually show any evidence that TBR is better than GBR when you have only a few geos. However, they created an R package, GeoexperimentsResearch, implementing the methods they developed in the paper, so let’s use it to see if TBR is more sensitive than GBR with a small number of geos!

Outline

Setup
Loading the Data
Purely Geo-based Regression Analysis
Time-Based Regression Analysis
Dependence on the Number of Geos
Conclusion
Original Computing Environment

Setup

Let’s install the GeoexperimentsResearch package,

devtools::install_github("google/GeoexperimentsResearch",
                         dependencies=TRUE)

And then set up our computing environment:

# Packages
library(tidyverse)
library(purrr)
library(tidyr)
library(GeoexperimentsResearch)

Data

First we’ll load the example dataset used in the paper (which comes with the GeoexperimentsResearch package!). The datset includes the group assignments, that is, which geo region corresponds to the control vs experimental group.

data(geoassignment)
head(geoassignment)

   geo geo.group
  1         2
 2         1
 3         1
 4         2
 5         1
 6         2

These groups are balanced (at least in terms of the number of geos per group):

geoassignment %>% 
  group_by(geo.group) %>%
  tally()

# A tibble: 2 x 2
  geo.group     n
      <int> <int>
1         1    50
2         2    50

The dataset also includes the product sales and ad costs for each geo over time.

data(salesandcost)
head(salesandcost)

        date geo   sales cost
2015-01-05   1 7227.32    0
2015-01-05  10 1827.21    0
2015-01-05 100   23.98    0
2015-01-05  11 1501.10    0
2015-01-05  12 1371.61    0
2015-01-05  13 1366.81    0

We can see a pretty strong weekly trend if we plot the log sales over time for each geo individually:

ggplot(salesandcost, aes(date, sales, color=geo)) +
  scale_y_continuous(trans='log10') +
  theme(legend.position = "none") +
  geom_line()

We’ll also need to define when the experiment started and ended, by creating an ExperimentPeriods object to store the pretest/test/cooldown periods.

periods = ExperimentPeriods(
  period.dates = c("2015-01-05", "2015-02-16", "2015-03-15", "2015-04-07"),
  period.names = c("PreTest", "Test", "Cooldown"))
periods

  Period     Name      Start        End Length
    0  PreTest 2015-01-05 2015-02-15     42
    1     Test 2015-02-16 2015-03-14     27
    2 Cooldown 2015-03-15 2015-04-07     24

To use the GeoexperimentsResearch package, we’ll combine all 3 data sources (the per-geo time series, pretest/test/cooldown periods, and the group assignments) into a GeoExperimentData object.

data = GeoExperimentData(
  GeoTimeseries(salesandcost, metrics=c("sales", "cost")),
  periods=periods,
  geo.assignment=GeoAssignment(geoassignment))
head(data)

        date geo period geo.group assignment   sales cost .weekday .weeknum .weekindex
2015-01-05   1      0         2         NA 7227.32    0        1        1     201501
2015-01-05  10      0         1         NA 1827.21    0        1        1     201501
2015-01-05 100      0         1         NA   23.98    0        1        1     201501
2015-01-05  11      0         1         NA 1501.10    0        1        1     201501
2015-01-05  12      0         2         NA 1371.61    0        1        1     201501
2015-01-05  13      0         1         NA 1366.81    0        1        1     201501

Purely Geo-Based Regression Analysis

We can perform a purely geo-based regression analysis using the DoGBRROASAnalysis function, which aggregates the data over pretest/test period, but not over geo group, and draws its statistical power from the number of geo regions available.

result = DoGBRROASAnalysis(data, response='sales', cost='cost',
                           pretest.period=0,
                           intervention.period=1,
                           cooldown.period=2,
                           control.group=1,
                           treatment.group=2)
result

      estimate precision    lower upper level incr.resp incr.cost thres prob model
iROAS  3.64077  0.605604 3.035166   Inf   0.9  182038.5     50000     0    1  gbr1

The prob is the posterior probability that the incremental return on ad spend (iROAS) is greater than some threshold (thres, 0 by default). Using all the geos, the model is extremely certain that the iROAS is >0:

summary(result)$prob

[1] 1.0

Time-Based Regression Analysis

The GeoexperimentsResearch package also includes a function to perform the Time-Based Regression (TBR) Causal Effect Analysis. This analysis uses the aggregate sales of the control group to predict the aggregate sales of the experimental group, computes the difference between the predicted and actual sales (the supposed causal effect of the ad campaign), and essentially integrates that difference to get the estimated cumulative effect of the campaign.

result = DoTBRAnalysis(data, response='sales',
                       model='tbr1',
                       pretest.period=0,
                       intervention.period=1,
                       cooldown.period=2,
                       control.group=1,
                       treatment.group=2)
summary(result)

            estimate precision    lower upper       se level thres prob model
incremental 143028.6  11348.74 131679.9   Inf 8709.188   0.9     0    1  tbr1

The plot.TBRAnalysisFitTbr1 function shows the actual aggregate sales of the experimental group (in the top panel in red), the predicted aggregate sales of the experimental group (as predicted from the aggregate sales of the control group, in blue), the difference between the actual and predicted sales (in the middle panel), and the estimated cumulative effect of the campaign on sales (in the bottom panel).

plot(result)

The package has a similar function which estimates the iROAS from this estimated cumulative effect.

result = DoTBRROASAnalysis(data, response='sales', cost='cost',
                           model='tbr1',
                           pretest.period=0,
                           intervention.period=1,
                           cooldown.period=2,
                           control.group=1,
                           treatment.group=2)
summary(result)

      estimate precision    lower upper level incr.resp incr.cost thres prob model
iROAS 2.860572 0.2269749 2.633598   Inf   0.9  143028.6     50000     0    1  tbr1

And as with the GBR model, this model fit includes the posterior probability that the iROAS is greater >0:

summary(result)$prob

[1] 1.0

Dependence on the Number of Geos

To simulate what each model’s estimate would look like as we have more or less geos to work with, we can subsample geos from the dataset. So, first we’ll create a function to create a dataset with subsampled (but still balanced) groups.

subsample_geos = function(groups, timeseries, periods, N) {

  # Subsample the geos
  sub_groups = groups %>%
    group_by(geo.group) %>%
    nest() %>%
    mutate(data=map(data, function(df) sample_n(df, N))) %>%
    unnest()
  
  # Subsample the corresponding geos from the time series
  sub_timeseries = timeseries %>%
    filter(geo %in% sub_groups$geo)
  
  # Return the subsampled experiment data
  GeoExperimentData(
    GeoTimeseries(sub_timeseries, metrics=c("sales", "cost")),
    periods=periods,
    geo.assignment=GeoAssignment(sub_groups))
}

Then, we can fit the models to subsampled datasets with different number of geos.

# Settings
Ngeos = seq(2, 20) #number of geos to test
Ns = 20 #number of random subsamples per #geo

# DF to store the data
df = tibble(num_geos = numeric(),
            type = character(),
            prob = numeric())

# For different possible # of geos,
for (iG in Ngeos) {
  
  # Subsample multiple times
  for (iS in 1:Ns) { 
    
    # Create dataset with subsampled geos
    data = subsample_geos(geoassignment, salesandcost, periods, iG)
    
    # Estimate the iROAS with GBR
    result = DoGBRROASAnalysis(data, response='sales', cost='cost',
                               pretest.period=0,
                               intervention.period=1,
                               cooldown.period=2,
                               control.group=1,
                               treatment.group=2)
    prob = summary(result)$prob
    df = add_row(df, num_geos=iG, type='GBR', prob=prob)
    
    # Estimate the iROAS with TBR
    result = DoTBRROASAnalysis(data, response='sales', cost='cost',
                               model='tbr1',
                               pretest.period=0,
                               intervention.period=1,
                               cooldown.period=2,
                               control.group=1,
                               treatment.group=2)
    prob = summary(result)$prob
    df = add_row(df, num_geos=iG, type='TBR', prob=prob)
  }
}

Then we can plot the models’ estimates of how likely it is that the iROAS is positive (which, for this data, it is) as a function of how many geos were used.

ggplot(df, aes(x=num_geos, y=prob, color=type)) +
  geom_jitter(height=0.0, width=0.2) +
  stat_smooth(aes(fill=type))

Conclusion

Indeed it looks like the time-based regression’s estimate of the iROAS is more sensitive when you have a low number of geos, especially with less than around 10. Since TBR doesn’t ever really seem to be worse than a pure geo-based regression (at least with this dataset and these experiments), the suggestion to always prefer TBR to GBR is probably good advice!

Original Computing Environment

writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))

CXXFLAGS=-O3 -Wno-unused-variable -Wno-unused-function

devtools::session_info()

Session info --------------------------------------
 
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, mingw32             
 ui       RTerm                       
 language (EN)                        
 collate  English_United States.1252  
 tz       America/Chicago             
 date     2019-10-31
 
Packages ------------------------------------------
 
 package                * version date      
 assertthat               0.2.1   2019-03-21
 backports                1.1.2   2017-12-13
 base                   * 3.5.1   2018-07-02
 bindr                    0.1.1   2018-03-13
 bindrcpp               * 0.2.2   2018-03-29
 broom                    0.5.0   2018-07-17
 cellranger               1.1.0   2016-07-27
 cli                      1.0.0   2017-11-05
 colorspace               1.3-2   2016-12-14
 compiler                 3.5.1   2018-07-02
 crayon                   1.3.4   2017-09-16
 datasets               * 3.5.1   2018-07-02
 devtools                 1.13.6  2018-06-27
 digest                   0.6.16  2018-08-22
 dplyr                  * 0.7.6   2018-06-29
 evaluate                 0.11    2018-07-17
 fansi                    0.3.0   2018-08-13
 forcats                * 0.3.0   2018-02-19
 GeoexperimentsResearch * 1.0.3   2019-10-31
 ggplot2                * 3.2.1   2019-08-10
 glue                     1.3.0   2018-07-17
 graphics               * 3.5.1   2018-07-02
 grDevices              * 3.5.1   2018-07-02
 grid                     3.5.1   2018-07-02
 gtable                   0.2.0   2016-02-26
 haven                    1.1.2   2018-06-27
 hms                      0.4.2   2018-03-10
 htmltools                0.3.6   2017-04-28
 httr                     1.3.1   2017-08-20
 jsonlite                 1.5     2017-06-01
 knitr                    1.25    2019-09-18
 labeling                 0.3     2014-08-23
 lattice                  0.20-35 2017-03-25
 lazyeval                 0.2.1   2017-10-29
 lubridate                1.7.4   2018-04-11
 magrittr                 1.5     2014-11-22
 MASS                     7.3-50  2018-04-30
 memoise                  1.1.0   2017-04-21
 methods                * 3.5.1   2018-07-02
 modelr                   0.1.2   2018-05-11
 munsell                  0.5.0   2018-06-12
 nlme                     3.1-137 2018-04-07
 pillar                   1.3.0   2018-07-14
 pkgconfig                2.0.2   2018-08-16
 plyr                     1.8.4   2016-06-08
 purrr                  * 0.2.5   2018-05-29
 R6                       2.2.2   2017-06-17
 Rcpp                     0.12.18 2018-07-23
 readr                  * 1.1.1   2017-05-16
 readxl                   1.1.0   2018-04-20
 reshape2                 1.4.3   2017-12-11
 rlang                    0.4.1   2019-10-24
 rmarkdown                1.10    2018-06-11
 rprojroot                1.3-2   2018-01-03
 rstudioapi               0.7     2017-09-07
 rvest                    0.3.2   2016-06-17
 scales                   1.0.0   2018-08-09
 stats                  * 3.5.1   2018-07-02
 stringi                  1.1.7   2018-03-12
 stringr                * 1.3.1   2018-05-10
 tibble                 * 1.4.2   2018-01-22
 tidyr                  * 0.8.1   2018-05-18
 tidyselect               0.2.4   2018-02-26
 tidyverse              * 1.2.1   2017-11-14
 tools                    3.5.1   2018-07-02
 utf8                     1.1.4   2018-05-24
 utils                  * 3.5.1   2018-07-02
 withr                    2.1.2   2018-03-15
 xfun                     0.3     2018-07-06
 xml2                     1.2.0   2018-01-24
 yaml                     2.2.0   2018-07-25

Brendan Hasz