Seeds & Reference Data
dbt seeds are static CSV files that dbt loads into the database as ordinary tables. nhic_dbt ships 40+ seed CSVs in the seeds/ directory covering facility mappings, administrative classifications, code lookups, and other reference data that changes infrequently.
What seeds are for
Seeds are the right choice when:
- The data is static or rarely changes (a new release is needed to update it)
- The data is small enough to commit to git as a CSV
- The data is authoritative reference data — code tables, classifications, approved facility lists
Seeds are the wrong choice when:
- The data is sourced from an external system and changes frequently — use a staging model instead
- The data is large (thousands of rows or more) — consider a staging model backed by a source table
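Deciding whether a seed is "small" is a judgment call, and a row count per file is a quick first check. The helper below is a minimal sketch, not part of nhic_dbt or dbt. It assumes bash, one record per line (no quoted fields containing newlines), and an illustrative 1,000-row threshold that is not a project rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the data-row count (header excluded) of each
# seed CSV and flag files above a threshold.
# Assumes one record per line; the default threshold is illustrative only.
seed_row_counts() {
  local dir="${1:-seeds}"        # directory to scan, defaults to seeds/
  local threshold="${2:-1000}"   # illustrative cutoff, not a project rule
  local f rows name
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue      # skip when the glob matches nothing
    # awk re-emits every line after the header with a newline, so a missing
    # trailing newline in the file does not drop the last row from the count
    rows=$(awk 'NR > 1' "$f" | wc -l | tr -d ' ')
    name=$(basename "$f" .csv)
    if [ "$rows" -gt "$threshold" ]; then
      printf '%s\t%s\tLARGE\n' "$name" "$rows"
    else
      printf '%s\t%s\n' "$name" "$rows"
    fi
  done
}
```

Run it from the project root as seed_row_counts seeds 1000; files marked LARGE are candidates for a source table plus staging model instead.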
Loading seeds
```shell
# Load all seeds
dbt seed

# Load a specific seed by name
dbt seed --select facility_mapping

# Full refresh: drop and recreate the seed tables instead of truncating and
# reloading them (needed when a seed's columns change)
dbt seed --full-refresh
```

Seeds are loaded into the staging schema by default (configured in dbt_project.yml).
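When a branch changes only a few seed CSVs, you can load just those by passing their names to dbt seed --select; dbt names each seed after its CSV file without the .csv extension. The helper below is a minimal sketch (seed_names_from_paths is hypothetical, not part of nhic_dbt or dbt), assuming bash and that the list of changed paths comes from git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read file paths (one per line on stdin) and print the
# seed name for each seed CSV, i.e. the file name without ".csv".
seed_names_from_paths() {
  local path
  # "|| [ -n ... ]" keeps a final line that has no trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      seeds/*.csv|*/seeds/*.csv) basename "$path" .csv ;;  # seed CSVs only
    esac
  done
}

# Example usage (assumes the branch is compared against one named main):
#   changed=$(git diff --name-only main -- seeds/ | seed_names_from_paths)
#   # $changed is left unquoted on purpose so each name becomes its own argument
#   [ -n "$changed" ] && dbt seed --select $changed
```

The empty-check in the usage lines matters: if no seeds changed, calling dbt seed --select with no names would fail rather than do nothing.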
Using seeds in models
Reference a seed in a model with {{ ref() }}, exactly as you would reference another model:

```sql
-- models/intermediate/int_visits__enriched.sql
select
    v.visit_id,
    v.facility_code,
    f.facility_name,
    f.district,
    f.province
from {{ ref('stg_cemr__visits') }} v
left join {{ ref('facility_mapping') }} f
    on v.facility_code = f.facility_code
```

dbt records this dependency in the DAG, so dbt build loads the seed before building any model that references it. dbt run does not load seeds, so run dbt seed first if you build models with dbt run.
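Because the join is a left join, a visit whose facility_code has no row in facility_mapping is kept, with facility_name, district and province left NULL. To spot such gaps, you can compare a list of facility codes against the seed CSV. This is a minimal sketch with a hypothetical helper (unmapped_codes is not part of nhic_dbt), assuming the codes have been exported one per line, the seed has a header row, facility_code is its first column, and no field contains a quoted comma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each code from stdin that has no matching key in
# the seed CSV given as $1. Each missing code is printed once.
# Assumes: header row, key in the first column, no quoted commas.
unmapped_codes() {
  local seed_csv="$1"
  awk -F',' '
    NR == FNR { if (FNR > 1) known[$1] = 1; next }   # first file: collect seed keys
    $1 != "" && !($1 in known) && !seen[$1]++         # stdin: unmapped, first time seen
  ' "$seed_csv" -
}

# Example usage (codes.txt is a hypothetical export of distinct facility codes):
#   unmapped_codes seeds/facility_mapping.csv < codes.txt
```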
Adding or updating a seed
- Add or edit the CSV file in seeds/.
- If adding a new seed, declare its column types in dbt_project.yml under seeds: to prevent dbt from inferring incorrect types.
- Run dbt seed --select <seed_name> locally to verify that it loads correctly.
- Open a PR: CI will validate the seed as part of dbt build.
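Before running dbt seed locally, a quick file-level check can catch rows with the wrong number of fields or duplicated keys. The function below is a minimal sketch of such a check; check_seed_csv is hypothetical (not part of nhic_dbt or dbt), and it assumes the first column is the key and that fields never contain quoted commas.

```shell
#!/usr/bin/env bash
# Hypothetical pre-PR check for a seed CSV. Reports rows whose field count
# differs from the header and keys (first column) that appear more than once.
# Exits non-zero if any problem is found. Naive comma splitting: quoted
# fields containing commas are not supported.
check_seed_csv() {
  local csv="$1"
  awk -F',' '
    NR == 1 { ncols = NF; next }              # header defines the expected width
    NF == 0 { next }                          # ignore blank lines
    NF != ncols {
      printf "line %d: expected %d fields, got %d\n", NR, ncols, NF
      bad = 1
    }
    seen[$1]++ == 1 {                         # true only on the second occurrence
      printf "duplicate key: %s\n", $1
      bad = 1
    }
    END { exit bad }
  ' "$csv"
}
```

A typical local sequence would be check_seed_csv seeds/facility_mapping.csv && dbt seed --select facility_mapping. The check needs neither dbt nor a database, so it fails fast on malformed files; dbt build in CI remains the real validation.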
Updating a seed CSV changes the committed data. For seeds that are authoritative (e.g., the approved facility list), treat the CSV as the source of truth and coordinate with the data team before modifying.