One of my favorite tools that I’ve been using for the past few months is the
Google BigQuery command line tool.
The bq tool lets you interact with Google BigQuery from the command line. It's part of the
Google Cloud SDK.
So, why would you want to use this tool?
Great question. With this tool, you can run queries like
bq query "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"
and it will return a result programmatically for you.
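Because the output can be captured like any other command's, it slots straight into scripts. Here's a minimal sketch: the `--format` global flag and the `mydataset.babynames` table come from BigQuery's docs and the example above, and the `command -v` guard is only there so the snippet degrades gracefully on a machine without the CLI.

```shell
# Sketch: bq's global --format flag emits machine-readable output.
# Assumes an installed, authenticated bq CLI and the mydataset.babynames
# table from the example query above.
query="SELECT name, count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

if command -v bq >/dev/null 2>&1; then
  # Global flags such as --format go before the command name.
  bq --format=json query "$query" > top_names.json
else
  echo "bq CLI not found; skipping"
fi
```

From there, a tool like jq can pick the JSON results apart in the rest of the pipeline.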
At my job, we use BigQuery as our data warehouse, and we have specific tables that need to be built
or rebuilt on a daily basis using the same queries. With that in mind, I set up a cron job that runs
on a given interval and automatically builds those datasets for me.
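As a concrete illustration, the crontab entry for such a job might look like the following. The 2:00 AM schedule, script path, and log path are all assumptions for the sketch, not details from my actual setup.

```shell
# Hypothetical crontab entry: run the rebuild script daily at 2:00 AM
# and append both stdout and stderr to a log file.
0 2 * * * /home/me/bin/rebuild_tables.sh >> /var/log/rebuild_tables.log 2>&1
```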
Here’s an example script that I’ll walk through.
#!/bin/bash

# Variables
token='token'
url='https://github.com'
table='table'

# Fetch the SQL file from the GitHub API, strip the JSON escaping,
# and decode the base64-encoded content
sql=$(curl -H "Content-Type: application/json" \
           -H "Authorization: token $token" \
           "$url" \
      | jq '.content' \
      | sed -e 's/\\n//g' -e 's/"//g' \
      | base64 --decode)

# Run the query and write the results to the destination table
bq query --quiet --destination_table "$table" --allow_large_results --replace "$sql"
In this script, I’m remotely fetching a large SQL file that’s stored on GitHub.
bq query then runs the query held in the
$sql variable and saves the output to the given table.
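The decode step works because the GitHub API returns file contents base64-encoded. Here's a minimal sketch of that round trip, using a stand-in string in place of a real API response:

```shell
# Stand-in for the base64-encoded content field the API would return
# (a real response would carry the SQL file, not this literal).
encoded=$(printf 'SELECT 1' | base64)

# Decode it back, as the script does before handing the SQL to bq.
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "$decoded"   # SELECT 1
```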
--allow_large_results lets the query return arbitrarily large result sets, and
--replace overwrites any existing data in the destination table.
If you’re interested in how the
sql variable is set up, you can check out
my post from yesterday.