In my last post, I did an exploratory data analysis of my Google location history.
One of the things that I wanted to calculate and measure is the distance between
different coordinates. For example, how far is it between
48.1265044, 11.6593258 and
40.555608, -105.098397? It turns out that
answering that question is a bit harder than I originally guessed. As you know,
the earth isn’t flat (some people would have you believe otherwise), so measuring
the distance between two points given as latitude and longitude has to be done
using the Haversine formula.
I’m not going to pretend to understand the trigonometry that goes into calculating this, so the best way I can explain it is that it measures distance “as the crow flies.”
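In code form, the formula is compact. Here’s a minimal Python sketch (not from the original post), plugging in the two coordinates from the opening example and using 3,958.8 miles as the Earth’s mean radius:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance between two points, in miles."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Distance between the two coordinates from the example above
print(haversine_miles(48.1265044, 11.6593258, 40.555608, -105.098397))
```

That comes out to a little over 5,000 miles, which passes the sanity check for a Europe-to-Colorado hop.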
Okay, so how do we convert that to something usable in
SQL? Well, it turns out
that Felipe Hoffa has a solution for this on Stack Overflow.
```sql
SELECT lat, lon, name,
       (111.045 * DEGREES(ACOS(COS(RADIANS(40.73943)) * COS(RADIANS(lat))
         * COS(RADIANS(-73.99585) - RADIANS(lon))
         + SIN(RADIANS(40.73943)) * SIN(RADIANS(lat))))) AS distance
FROM [bigquery-public-data:noaa_gsod.stations]
HAVING distance > 0
ORDER BY distance
LIMIT 4
```
While this is good for calculating the distance from a fixed point, I want to
calculate the distance between each point and the previous one. For this, I’ll
use the LAG() window function and then rewrite the Haversine formula to properly
calculate the distance (I think). I also added a conversion from kilometers to
miles.
Here’s my example:
Since there are 1,770,882 observations, I have no freaking clue how long it
would take R to run through something like this, but with BigQuery the query
is done in 13.3 seconds. That’s blazing fast!
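The distance-between-consecutive-points step amounts to pairing each row with its predecessor, which is what SQL’s LAG() window function does. Here’s a minimal Python sketch of the same idea, with a hypothetical list of coordinates already sorted by timestamp:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))  # 3958.8 = Earth's mean radius in miles

# Hypothetical observations, already sorted by timestamp: (lat, lon)
points = [(40.7128, -74.0060), (40.7306, -73.9866), (40.7580, -73.9855)]

# Pair each point with its predecessor, like LAG() over the previous row
legs = [haversine_miles(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])]

total_miles = sum(legs)
```

Summing the per-leg distances then gives total mileage, the same aggregation the R analysis below does on the query output.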
Now that we have calculated distances between observations, let’s import it into
R and do some analysis.
- How many miles did I travel between 2011 and 2018?
- 654,572.32 miles. Is that even possible?
- How many miles did I travel by type?
Judging by the distance and the count of observations, let’s dive into “still,”
which comprises the largest number of observations.
Originally I thought the discrepancy could be from the GPS in my phone trying to locate itself by triangulation. It could be that Google’s algorithm wasn’t able to properly figure out what I was doing. Here’s an example from one night during one hour:
- What was the most popular activity by hour of day?
- How did my movements change over time?
- What days of the week were most popular for activities?
Time is of the essence
I’m also intrigued by the idea of “time” with all of this information. I think that it’s fascinating that Google is able to harness this information to do things like:
- Tell you when to leave for work and arrive on time based on traffic patterns
- Tell you the most popular visited times for restaurants.
- Tell you your average commute times at any given time.
- Suggest what you should listen to based on your current location.
Additionally, I’ve used some simple
CASE statements to augment the provided
information, giving us a season based on the date and a part of day based on the hour.
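As a rough Python equivalent of those CASE statements (the month and hour cutoffs here are my assumptions, not necessarily the ones the queries use):

```python
def season(month):
    """Meteorological season for a month number (1-12); cutoffs are an assumption."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"

def part_of_day(hour):
    """Bucket a 0-23 hour into a part of day; cutoffs are an assumption."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"

print(season(7), part_of_day(22))  # Summer Night
```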
Since BigQuery only gives us time in
UTC format, I’ll need to do some
additional work to clean up timestamps so they’re in the proper timezone.
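If the target timezone is already known, the conversion itself is simple. Here’s a small Python sketch using the standard library (the timezone name and timestamp are just examples):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# A UTC timestamp, the way BigQuery returns it (example value)
utc_ts = datetime(2017, 7, 4, 18, 30, tzinfo=timezone.utc)

# Re-express it in a local timezone; "America/Denver" is just an example
local_ts = utc_ts.astimezone(ZoneInfo("America/Denver"))
print(local_ts.isoformat())  # 2017-07-04T12:30:00-06:00
```

The harder part is knowing which timezone applies to each coordinate, which is presumably where the Google Maps APIs come in.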
I’m working on a way of doing this that uses multiple Google Maps APIs.