Macros
nhic_dbt ships a library of 30+ macros that handle the repetitive work of cleaning and transforming health data. All macros live under macros/ and are available to every model without any import.
Data cleaning macros
These macros enforce type safety and normalise nulls, empty strings, and malformed values. Each one wraps a source column expression in SQL and returns a cleaned value of the target type.
| Macro | Purpose |
|---|---|
clean_boolean | Cast and normalise boolean-like values (1/0, true/false, Y/N) |
clean_date | Parse and validate date strings; return null for unparseable values |
clean_double | Cast to double precision; coerce non-numeric to null |
clean_integer | Cast to integer; coerce non-numeric to null |
clean_integer_range | Cast to integer and enforce a min/max range; out-of-range returns null |
clean_key_date | Like clean_date but intended for surrogate key columns — fails loudly on null |
clean_key_integer | Like clean_integer but for key columns — fails loudly on null |
clean_key_regexp | Validate a key column against a regular expression pattern |
clean_key_text | Trim and upper-case text key columns; null-safe |
clean_key_time | Parse and validate time strings for key columns |
clean_month_date | Parse partial dates in YYYY-MM format and truncate to month |
clean_number | Cast to numeric (arbitrary precision); coerce non-numeric to null |
clean_regexp | Validate a text column against a regular expression; non-matching returns null |
clean_text | Trim whitespace and normalise empty strings to null |
clean_time | Parse and validate time strings; return null for unparseable values |
clean_timestamp | Parse and validate timestamp strings; return null for unparseable values |
Usage example
-- models/staging/cemr/stg_cemr__patients.sql
select
{{ clean_key_integer('patient_id') }} as patient_id,
{{ clean_text('first_name') }} as first_name,
{{ clean_text('last_name') }} as last_name,
{{ clean_date('date_of_birth') }} as date_of_birth,
{{ clean_key_text('national_id') }} as national_id,
{{ clean_timestamp('created_at') }} as created_at,
{{ clean_boolean('is_active') }} as is_active
from {{ source('cemr', 'patients') }}The clean_key_* variants signal intent — these columns are used in joins or as surrogate keys, so unexpected nulls are surfaced as warnings during dbt test.
Utility macros
| Macro | Purpose |
|---|---|
array_cast | Cast elements of a Postgres array from one type to another |
date_util | Collection of date manipulation helpers (fiscal periods, age calculation, etc.) |
hash_column | Generate a deterministic MD5 hash key from one or more columns — used for deduplication and surrogate key generation |
pivot_key_value | Pivot a key-value pair table into wide columns — common for DHIS2 data elements |
standardize_facility_name | Normalise health facility names across source systems using a reference lookup |
hash_column example
-- Surrogate key from multiple columns
select
{{ hash_column(['patient_id', 'visit_date', 'facility_code']) }} as visit_sk,
patient_id,
visit_date,
facility_code
from {{ ref('stg_cemr__visits') }}Domain subdirectories
For source-specific logic that doesn’t belong in a general macro, macros are grouped into subdirectories:
| Directory | Contains |
|---|---|
macros/crvs/ | Civil registration data transformations and CRVS-specific lookups |
macros/efiche/ | eFICHE outpatient record parsing helpers |
macros/hmis_tracker/ | DHIS2 tracked entity and program stage helpers |
macros/incremental_load/ | Helpers for incremental materialisation strategies |
macros/process_dhis2__org_units/ | DHIS2 organisation unit hierarchy processing |
Last updated on