DBT – Use custom hashing function for your surrogate keys

In this post, I would not elaborate on already well defined and illustrated topic in DBT for generating surrogate keys in your project and its benefits.

{{ dbt_utils.generate_surrogate_key(['field_a', 'field_b'[,...]]) }}

Reference Articles:

Rather I was drawn to a challenge, where I wanted to use “SHA” for hashing rather than “MD5” while building out few transformed tables from the raw (bronze) data store.

This led me in search of few articles and varied combination of experiments to finally get it working.

So the key to achieve this functionality in DBT is to override its default hashing macro, with the following code snippet be added to your project as macro for the required outcome.

{% macro default__hash(field) -%}
    sha(cast({{ field }} as {{ api.Column.translate_type('string') }}))
{%- endmacro %}

Filename: hash.sql

So the before | after running the DBT commands output is as follows:

md5(cast(concat(coalesce(cast(Field1 as string), '_dbt_utils_surrogate_key_null_'), '-', coalesce(cast(Field2
sha(cast(concat(coalesce(cast(Field1 as string), '_dbt_utils_surrogate_key_null_'), '-', coalesce(cast(Field2

I hope the above helps.

Leave a Reply

Your email address will not be published. Required fields are marked *