Analyzing Google AdWords With R Part II

By Chip Oglesby

In part one, we walked through getting set up to pull data from Google AdWords using the RAdwords package. Today we’ll dive a bit deeper with some exploratory data analysis.

Let’s get started, shall we?

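library(RAdwords)

yesterday  <- gsub("-", "", format(Sys.Date() - 1, "%Y-%m-%d"))
thirtydays <- gsub("-", "", format(Sys.Date() - 29, "%Y-%m-%d"))

# Create statement
# NOTE: select, report and end were cut off in the original listing. They are reconstructed
# from the field list in the text below, so check them against the AdWords report documentation.
body <- statement(select = c("CampaignName", "AdGroupName", "Criteria", "KeywordMatchType",
                             "QualityScore", "Impressions", "Clicks", "Ctr",
                             "Conversions", "AverageCpc", "Cost", "Date"),
                  report = "KEYWORDS_PERFORMANCE_REPORT",
                  where = "AdNetworkType1 = SEARCH",
                  start = thirtydays,
                  end = yesterday)

# Pull data from AdWords (google_auth comes from the setup in part one)
data <- getData(clientCustomerId = 'xxx-xxx-xxxx', google_auth = google_auth, statement = body)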

First, let’s walk through what the code does. The first two variables, yesterday and thirtydays, take the current system date and strip the dashes so the dates are in the YYYYMMDD form used in the request. They give us yesterday’s date and the date 29 days before today, which together bound the reporting window.
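As a quick check of the date handling, here is a minimal sketch in base R. It pins today to a fixed date (an assumption, purely so the result is reproducible) and shows that format() with "%Y%m%d" gives the same string as the gsub() approach. It also counts the days in the window: Sys.Date() - 29 through Sys.Date() - 1 covers 29 days, so use - 30 if you want a full thirty.

```r
# Fixed reference date, chosen only for illustration
today <- as.Date("2017-03-15")

# format() can emit YYYYMMDD directly
end_date   <- format(today - 1,  "%Y%m%d")
start_date <- format(today - 29, "%Y%m%d")

# Same string as stripping the dashes from an ISO date
same_as_gsub <- identical(end_date,
                          gsub("-", "", format(today - 1, "%Y-%m-%d")))

# Days covered by the window, counting both endpoints
n_days <- as.integer((today - 1) - (today - 29)) + 1L
```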

Second, the variable body constructs the statement that we’ll use to request AdWords data. Here we’re asking for Keywords Performance Report data on the Search Network for roughly the past thirty days. We’ll get the following items in return:

  1. Campaign Name
  2. Ad Group Name
  3. Criteria (Keyword)
  4. Keyword Match Type (Broad, Phrase, Exact)
  5. Quality Score
  6. Impressions
  7. Clicks
  8. Click Through Rate (Clicks divided by Impressions)
  9. Conversions
  10. Average Cost Per Click (Cost divided by Clicks)
  11. Cost
  12. Date

The Date field comes last in the list, but it is still important. Because it is part of the request, the AdWords API returns a separate row for each day in the date range rather than a single total per keyword. That’s great!
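Because the report comes back with one row per day, you will often want to roll it up to one row per keyword before comparing keywords. The sketch below uses a small made-up data frame (the keywords and numbers are invented, and the column names are assumptions, so check names(data) for the real ones). It sums the raw counts first and then recomputes CTR and CPC from the totals, since averaging daily rates would weight a quiet day the same as a busy one.

```r
# Invented daily rows for two hypothetical keywords
daily <- data.frame(
  Keyword     = c("red shoes", "red shoes", "blue shoes", "blue shoes"),
  Date        = as.Date(c("2017-03-01", "2017-03-02", "2017-03-01", "2017-03-02")),
  Impressions = c(100, 300, 50, 150),
  Clicks      = c(10, 15, 5, 5),
  Cost        = c(5.00, 7.50, 4.00, 6.00)
)

# Sum the raw counts per keyword across all days
totals <- aggregate(cbind(Impressions, Clicks, Cost) ~ Keyword,
                    data = daily, FUN = sum)

# Recompute the rates from the totals
totals$CTR <- totals$Clicks / totals$Impressions  # clicks divided by impressions
totals$CPC <- totals$Cost / totals$Clicks         # cost divided by clicks

print(totals)
```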

Third, we are going to save the call with getData() to a data frame called data. Once the results have been returned, we can begin our analysis.

Quality Score

To begin with, let’s run summary(data), which gives us a summary of every column in the data frame. One of the most important attributes of a keyword is its Quality Score. In general, the higher your Quality Score, the lower your CPC.

Let’s break out the performance using dplyr, grouping by Quality Score and counting keywords:

library(dplyr)

data %>% 
  group_by(Qualityscore) %>% 
  summarize(keywords = n())  # n() counts the rows in each group

## Qualityscore keywords
##           10        1
##            2        2
##            3        6
##            4        4
##            5        9
##            6        3
##            7        9
##            8        8
##            9        7

Let’s also plot a histogram to visualize the same information:

library(ggplot2)

data %>% 
  ggplot(aes(Qualityscore)) + 
  geom_histogram(bins = 20) +
  labs(title = "Histogram of Quality Score", 
       x = "Quality Score", 
       y = "Count")

We can see that there are 21 keywords with a Quality Score lower than six. Depending on the goals and needs of your campaigns, you may want to save these for later to pause or rewrite the ads associated with the keywords. You can do that using:

keywordsToChange <- data %>% 
  filter(Qualityscore < 6)
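One thing worth checking before trusting that filter: the table above lists 10 ahead of 2, which can happen when a column is stored as text rather than as a number. That is only a guess about this data set, but the sketch below shows why it matters and rebuilds the counts from the table to confirm the figure of 21.

```r
# Text comparison is alphabetical, so "10" sorts before "6"
qs_text <- c("10", "2", "3", "7")
text_below_6 <- qs_text < 6

# Converting to numeric first gives the intended comparison
numeric_below_6 <- as.numeric(qs_text) < 6

# Rebuild one Quality Score per keyword from the counts in the table above
scores <- rep(c(10, 2, 3, 4, 5, 6, 7, 8, 9),
              times = c(1, 2, 6, 4, 9, 3, 9, 8, 7))
total_keywords <- length(scores)
below_six      <- sum(scores < 6)
```

If str(data$Qualityscore) reports chr rather than num or int, convert the column with as.numeric() before filtering or plotting.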

One of the quickest wins you can get in your account is to raise your Quality Score. Three main factors affect it:

  1. Expected Click-Through Rate
  2. Ad Relevance
  3. Landing Page Experience

You can also add the following fields to the select list in body if you’d like to see them: SearchPredictedCtr, CreativeQualityScore and PostClickQualityScore. They correspond, in order, to the three factors above.
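As a rough sketch, the extra fields can be appended to whatever vector you pass as select. The base field names here are assumptions based on the list earlier in this post, and the report name and end argument in the commented call are likewise assumptions, so check them against the Keywords Performance Report documentation. The statement() call is left as a comment because it needs the RAdwords package and the date variables defined earlier.

```r
# Fields assumed for a basic request (illustrative, not the full list)
base_fields <- c("Criteria", "QualityScore", "Impressions", "Clicks")

# The three Quality Score components named in the text
qs_fields <- c("SearchPredictedCtr", "CreativeQualityScore", "PostClickQualityScore")

# union() appends the new fields and drops any duplicates
select_fields <- union(base_fields, qs_fields)

# body <- statement(select = select_fields,
#                   report = "KEYWORDS_PERFORMANCE_REPORT",
#                   where  = "AdNetworkType1 = SEARCH",
#                   start  = thirtydays,
#                   end    = yesterday)
```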

Linear Regression

One of the most common assumptions people make about AdWords is that increasing your spend or your cost per click will automatically increase the number of conversions you receive for a keyword.

To test that assumption, let’s build a linear regression model with lm(data$Conversions ~ data$CPC). We can look at the full summary with summary(lm(data$Conversions ~ data$CPC)):

## Call:
## lm(formula = data$Conversions ~ data$CPC)
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.13118 -0.13118 -0.08722 -0.01500  1.87878 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.13118    0.06668   1.967   0.0551 .
## data$CPC    -0.03499    0.03193  -1.096   0.2787  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.3431 on 47 degrees of freedom
## Multiple R-squared:  0.02492,	Adjusted R-squared:  0.00417 
## F-statistic: 1.201 on 1 and 47 DF,  p-value: 0.2787

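Our toy model explains only about 2.5% of the variation in conversions (multiple R-squared 0.02492, adjusted R-squared 0.00417), and the CPC coefficient is not statistically significant (p = 0.2787). We may want to go back, look more closely at the outliers, and build another model.

With an R-squared this low, we would want to plot the data to check our assumptions.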
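The two R-squared figures in the output are related by a simple formula, which makes a handy sanity check. A minimal sketch using the numbers printed above, with n = 49 observations (47 residual degrees of freedom plus the two estimated coefficients) and one predictor:

```r
r2 <- 0.02492  # Multiple R-squared from the summary output
n  <- 49       # observations: 47 residual df + 2 coefficients
p  <- 1        # number of predictors (CPC)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(adj_r2, 5)
```

The result agrees with the adjusted R-squared of 0.00417 reported by summary().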

Once we have a model that fits the data better, we can use it to predict how many conversions we would get if we raised or lowered the cost per click.
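Here is a minimal sketch of that prediction step, using invented numbers rather than real account data. Note one change from the call above: the model is fitted with a data argument (Conversions ~ CPC, data = toy) instead of data$Conversions ~ data$CPC, because predict() only picks up new CPC values from newdata when the formula uses bare column names.

```r
# Invented keyword-level data, for illustration only
toy <- data.frame(
  CPC         = c(0.5, 1.0, 1.5, 2.0, 2.5),
  Conversions = c(1, 3, 2, 5, 4)
)

# Fit with bare column names and a data argument
fit <- lm(Conversions ~ CPC, data = toy)

# Hypothetical cost-per-click values to evaluate, inside the observed range
new_bids <- data.frame(CPC = c(1.0, 2.0))

# Expected conversions at each of the new CPC values
predicted <- predict(fit, newdata = new_bids)
print(predicted)
```

Predictions are only as good as the model behind them, so this step is worth doing only after the fit has improved.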

In the next post, we’ll look at more ways of working with AdWords data.