Advent of Code with dbt + Postgres (part 1)

It’s that time of year again for the Advent of Code! This year instead of taking the time to learn a new language, e.g. last year I enjoyed learning some Scala, I thought I would try to see how far I would be able to go by just using plain ol’ SQL.

Well… almost plain ol’ SQL. Since dbt has become the norm for data transformation I was going to see how far along I would be able to go with dbt and SQL.

The Setup

Besides what dbt init gives us the only personal additions I included were a docker compose file to spin up a postgres instance to use (yes yes, probably overkill for this)

version: '3.1'

services:
  aoc_db:
    image: postgres
    restart: always
    environment:
      POSTGRES_USER: johnmav
      POSTGRES_PASSWORD: test
    ports:
      - "5432:5432"

and a small makefile so it was easy to setup/teardown the postgres instance

start-postgres:
	docker compose -f stack.yml up --remove-orphans
stop-postgres:
	docker compose -f stack.yml down

Other than that we have what dbt generates for us in a project.

Puzzle 1 Part A

Day 1 of the Advent of Code Parts A/B can be found here

dbt made it easy to load the puzzle inputs by simply copying the data into a .csv file and placing it into the data/ directory, then updating dbt_project.yml with

seeds:
  advent_of_code:
    +enabled: true
    +schema: seed_data
    puzzle_1:
      +column_types:
        depth: int

We can now run dbt seed in our terminal and voilá! Our puzzle input data is readily available in our postgres instance ready to be messed around with. We’ll note that this new seeded data is not in our public schema but in our public_seed_data schema

Part A of Day 1’s puzzle boiled down to map-reduce type puzzle where the mapping is simply looking at if the prior steps measurements was smaller or larger than the current step.

So to model out whether each step was an INCREASE or not from the last step we can create a query in our models/ directory which I called models/puzzle_1/puzzle_1_part_a.sql

SELECT
  ROW_NUMBER() OVER () AS step,
  depth,
  CASE
    WHEN depth - LAG(depth) OVER () > 0
    THEN TRUE
    ELSE FALSE
  END AS increased
FROM {{ ref('puzzle_1') }}

A pretty simple query, for each step in our depth we check if the prior step was smaller, if so then we have an increase! The dbt magic is mostly in the {{ ref('puzzle_1') }} which simply just uses the reference to our seeded table.

Once we have that created in our models we can run dbt run and we end up with a pretty simple table that indicates the step, the current depth, and whether or not we increased in depth.

Finally to get our solution I added a new .sql file to our analysis directory analysis/puzzle_1_part_a.sql

SELECT
  COUNT(
    CASE 
      WHEN increased
      THEN step
      ELSE NULL
    END
  ) AS depth_increase_count
FROM {{ ref('puzzle_1_part_a') }}

Once we run dbt compile this converts our sql into a usable query that we could potentially use in a dashboarding tool, notebook, sql editor whatever. Our compiled sql looks like

SELECT
  COUNT(
    CASE 
      WHEN increased
      THEN step
      ELSE NULL
    END
  ) AS depth_increase_count
FROM "postgres"."public"."puzzle_1_part_a"

Puzzle 1 Part B

For Part B we are doing a similar approach but doing the same operation across a sliding window. We can simply create depth groups using a CTE and then use roughly the same query as we used in part A on the depth groups.

WITH depth_groups AS (
  SELECT 
    ROW_NUMBER() OVER () AS group_number,
    SUM(depth) OVER (ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS depth_sum
  FROM {{ ref('puzzle_1') }}
)

SELECT
  group_number,
  depth_sum,
  CASE
    WHEN depth_sum - LAG(depth_sum) OVER () > 0
    THEN TRUE
    ELSE FALSE
  END AS increased
FROM depth_groups

So this works. But the section outside of the CTE is pretty similar to what we did in Part A. In an effort to not repeat ourselves and utilize some more features of both Jinja and dbt we can create a reusable macro. In the macros/ directory lets create a puzzle_1_compute_increases.sql file that looks like

{% macro compute_increases(table_name, column_name) %}
  SELECT
    ROW_NUMBER() OVER () AS step,
    {{ column_name }},
    CASE
      WHEN {{ column_name }} - LAG({{ column_name }}) OVER () > 0
      THEN TRUE
      ELSE FALSE
    END AS increased
  FROM {{ table_name }}
{% endmacro %}

Simply our macro will take a table name and column name and perform the operations needed to compute whether a new depth is an increase over the last depth.

Now our solution to part A simply becomes:

{{ compute_increases(ref('puzzle_1'), 'depth') }}

and Part B becomes:

WITH depth_groups AS (
  SELECT 
    SUM(depth) OVER (ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS depth_sum
  FROM {{ ref('puzzle_1') }}
)

{{ compute_increases('depth_groups', 'depth_sum') }}

Entirely too much for this simple problem? Probably! Could we name things better, always. But you can imagine as things get more complicated these reusable components of sql code can be quite handy to have in our toolbox.


Author | John Mav

Currently a Senior Analytics Platform Engineer at Flexport