Skip to Content

Macros

nhic_dbt ships a library of 30+ macros that handle the repetitive work of cleaning and transforming health data. All macros live under macros/ and are available to every model without any import.

Data cleaning macros

These macros enforce type safety and normalise nulls, empty strings, and malformed values. Each one wraps a source column expression in SQL and returns a cleaned value of the target type.

MacroPurpose
clean_booleanCast and normalise boolean-like values (1/0, true/false, Y/N)
clean_dateParse and validate date strings; return null for unparseable values
clean_doubleCast to double precision; coerce non-numeric to null
clean_integerCast to integer; coerce non-numeric to null
clean_integer_rangeCast to integer and enforce a min/max range; out-of-range returns null
clean_key_dateLike clean_date but intended for surrogate key columns — fails loudly on null
clean_key_integerLike clean_integer but for key columns — fails loudly on null
clean_key_regexpValidate a key column against a regular expression pattern
clean_key_textTrim and upper-case text key columns; null-safe
clean_key_timeParse and validate time strings for key columns
clean_month_dateParse partial dates in YYYY-MM format and truncate to month
clean_numberCast to numeric (arbitrary precision); coerce non-numeric to null
clean_regexpValidate a text column against a regular expression; non-matching returns null
clean_textTrim whitespace and normalise empty strings to null
clean_timeParse and validate time strings; return null for unparseable values
clean_timestampParse and validate timestamp strings; return null for unparseable values

Usage example

-- models/staging/cemr/stg_cemr__patients.sql select {{ clean_key_integer('patient_id') }} as patient_id, {{ clean_text('first_name') }} as first_name, {{ clean_text('last_name') }} as last_name, {{ clean_date('date_of_birth') }} as date_of_birth, {{ clean_key_text('national_id') }} as national_id, {{ clean_timestamp('created_at') }} as created_at, {{ clean_boolean('is_active') }} as is_active from {{ source('cemr', 'patients') }}

The clean_key_* variants signal intent — these columns are used in joins or as surrogate keys, so unexpected nulls are surfaced as warnings during dbt test.

Utility macros

MacroPurpose
array_castCast elements of a Postgres array from one type to another
date_utilCollection of date manipulation helpers (fiscal periods, age calculation, etc.)
hash_columnGenerate a deterministic MD5 hash key from one or more columns — used for deduplication and surrogate key generation
pivot_key_valuePivot a key-value pair table into wide columns — common for DHIS2 data elements
standardize_facility_nameNormalise health facility names across source systems using a reference lookup

hash_column example

-- Surrogate key from multiple columns select {{ hash_column(['patient_id', 'visit_date', 'facility_code']) }} as visit_sk, patient_id, visit_date, facility_code from {{ ref('stg_cemr__visits') }}

Domain subdirectories

For source-specific logic that doesn’t belong in a general macro, macros are grouped into subdirectories:

DirectoryContains
macros/crvs/Civil registration data transformations and CRVS-specific lookups
macros/efiche/eFICHE outpatient record parsing helpers
macros/hmis_tracker/DHIS2 tracked entity and program stage helpers
macros/incremental_load/Helpers for incremental materialisation strategies
macros/process_dhis2__org_units/DHIS2 organisation unit hierarchy processing
Last updated on