In this post, I would not elaborate on already well defined and illustrated topic in DBT for generating surrogate keys in your project and its benefits.
{{ dbt_utils.generate_surrogate_key(['field_a', 'field_b'[,...]]) }}
Reference Articles:
Rather I was drawn to a challenge, where I wanted to use “SHA” for hashing rather than “MD5” while building out few transformed tables from the raw (bronze) data store.
This led me in search of few articles and varied combination of experiments to finally get it working.
So the key to achieve this functionality in DBT is to override its default hashing macro, with the following code snippet be added to your project as macro for the required outcome.
{% macro default__hash(field) -%}
sha(cast({{ field }} as {{ api.Column.translate_type('string') }}))
{%- endmacro %}
Filename: hash.sql
So the before | after running the DBT commands output is as follows:
Before:
md5(cast(concat(coalesce(cast(Field1 as string), '_dbt_utils_surrogate_key_null_'), '-', coalesce(cast(Field2
After:
sha(cast(concat(coalesce(cast(Field1 as string), '_dbt_utils_surrogate_key_null_'), '-', coalesce(cast(Field2
I hope the above helps.