Announcing dbterra: easily sync your jobs with dbt Cloud™️ ☁️

Eric Hagman
Instawork Engineering
5 min readApr 17, 2023

--

Making timely, data-informed decisions is critical at any company, and Instawork is no different. That’s why we invest in a data warehouse that’s kept up-to-date with a robust data pipeline.

We sync all of our data (app, sales, support, etc) into our data warehouse to get a glimpse into how we are doing as a business and what we can do to improve. Our analysts then create reports that guide our teams to deliver on Instawork’s vision.

dbt™️

At Instawork, we use dbt™️ to help transform a lot of of our raw data in our data warehouse into concrete, readable views so that querying for the things we need is fast and easy. Whether it be for sales reports on revenue generated or machine learning features, dbt™️ is an extremely powerful tool to help us empower others at Instawork to do their job better, faster, and with the confidence that the data they’re using is accurate.

What is dbt™️ really?

For those that don’t know what dbt™️ is, I highly recommend reading their blog post on the subject. The gist is that dbt Core™️ is a tool that allows you to commit SQL files that transform data from many sources in your data warehouse into views and other tables in your data warehouse that can be queried. This gives consumers of the data a more approachable way to query things and trust that what they are querying is accurate.

It allows analysts to create a gold standard of tables that are guaranteed to be up-to-date and accurate (including custom business logic) so that the people writing reports don’t have to worry about minor (but important) details.

A world without dbt™️ is a world where queries are copy and pasted, mistakes are propagated throughout reports, and eventually no one ends up trusting the data.

https://xkcd.com/2582/

A brief example of using dbt™️

For example, with a file called models/transactions.sql:

select vendors.name as vendor_name, transactions.price 
from transactions
join transactions on vendors where transactions.vendor_id = vendors.id

We can run:

$ dbt run --select transactions

And with some default settings (excluded here), we can easily create a view called analytics.transactions in our data warehouse that anyone can query from, without worrying about where the underlying data comes from.

select * from analytics.transactions where vendor_name = 'AWS'

While the SQL file above is an extremely simple example, you can imagine how powerful this can get when you add four more joins, some business logic to the mix, and even start building other views off of these tables.

Once you know your data needs, you can then schedule these commands to run as jobs so that you can provide accurate, fast, queryable tables and views without bogging down your data warehouse with the same, complicated query running in multiple reports.

For example, maybe every morning at 8AM, you want to run an expensive query used in your revenue reports because real time data here isn’t super critical but being able to run the query in seconds is! This is a great way to start organizing and optimizing your reporting needs.

dbt Cloud™️

While you can setup your own infrastructure to handle the creation of these views and tables, at Instawork we rely on dbt Cloud™️ to manage our projects, environments, and run our scheduled jobs so that we can focus on the important things (like writing reports!). While dbt Cloud™️ makes working with dbt™️ easy and convenient, as engineers we always want some way to make sure that our configuration is reproducible, easy to change, and has an audit trail in case something goes wrong or we need to debug an issue.

Terraform

Normally at Instawork, we use Terraform for all of our infrastructure-as-code needs. If you’re curious about some of the work we’re doing with Terraform, you can read our other blog post here about our journey writing a Terraform provider.

And while there are certainly Terraform providers that solve this issue (shout out to GtheSheep/dbt-cloud!), for this use case, we wanted something simple that could be used by anyone without any DevOps knowledge required. We want the analysts contributing to our dbt™️ repository to not have to worry about anything besides the SQL at hand… which is why we made dbterra!

Announcing dbterra 🎉

With dbterra, we can keep our scheduled jobs in the same repository as our dbt™️ transforms. The basic building block is the dbt_cloud.yml file in the root of your project. You can easily define your jobs in one place, have them updated via your CI/CD pipeline, and have the changes accompany your transforms in the same pull request so that things can be easily reviewed together. No more confusing Terraform HCL for the analysts, more transparency for the reviewers, and no split focus in your infrastructure pipeline and your data transforms.

Getting started with dbterra

First create your dbt_cloud.yml file in the root of your project. Here is an example dbt_cloud.yml file below.

account:
id: 123

projects:
example_project:
id: 123
jobs:
seed:
environment: bizops
target: production
threads: 4
steps:
- dbt seed
sales:
environment: bizops
target: production
steps:
- dbt run --select marts.sales
generate_docs: true
threads: 32
schedule:
cron: "0 9 * * *"

environments:
bizops:
id: 456

Next, make sure that you’ve set DBT_CLOUD_TOKEN somewhere in your environment. You can find your dbt Cloud™️ token on the profile page. While in CI you’d want to set these as secrets but to test locally you can use the following:

export DBT_CLOUD_TOKEN=abc123  # .bashrc, .zshrc, .profile, etc 
set -gx DBT_CLOUD_TOKEN abc123 # fish shell command

Assuming you’ve downloaded the release or built the binary yourself, you can run dbterra plan. This will show you what will be created/modified/deleted in dbt Cloud™️ just like with Terraform but doesn’t apply any changes yet.

Once you’re sure of the changes, you can run dbterra apply to run all of the commands to sync your dbt_cloud.yml to dbt Cloud™️. It’s as simple as that!

Example of `dbterra plan`

So…

If you use dbt™️ and dbt Cloud™️ and want to take control of your infrastructure and empower your analysts to think less about infrastructure and more about what matters, the data, give dbterra a try!

--

--