System Architecture

The HIC platform follows a linear ingestion-to-consumption pipeline. Source systems are polled by Prefect, raw data lands in PostgreSQL staging tables, dbt transforms it into the data warehouse, and BI tools query the warehouse directly.
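The flow above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not HIC code: an in-memory SQLite database stands in for PostgreSQL, plain functions stand in for the Prefect tasks and the dbt run, and the source records and the table names `stg_visits` and `dwh_monthly_visits` are hypothetical.

```python
"""Minimal sketch of the linear ingestion-to-consumption pipeline.

Assumptions (hypothetical, not the HIC code base):
- an in-memory sqlite3 database stands in for PostgreSQL;
- each plain function stands in for a Prefect task or a dbt run;
- the source system, its records and the table names are made up.
"""
import sqlite3


def poll_source() -> list[tuple[str, str, int]]:
    # Hypothetical source-system extract: (facility, month, visit_count).
    return [("F001", "2024-01", 120), ("F002", "2024-01", 95), ("F001", "2024-02", 130)]


def land_raw(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> None:
    # Raw data lands unchanged in a staging table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_visits (facility TEXT, month TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT INTO stg_visits VALUES (?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    # Stand-in for a dbt model: rebuild a warehouse table from staging.
    conn.execute("DROP TABLE IF EXISTS dwh_monthly_visits")
    conn.execute(
        "CREATE TABLE dwh_monthly_visits AS "
        "SELECT month, SUM(visits) AS total_visits FROM stg_visits GROUP BY month"
    )


def bi_query(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    # BI tools read from the warehouse table only, never from staging.
    return conn.execute(
        "SELECT month, total_visits FROM dwh_monthly_visits ORDER BY month"
    ).fetchall()


def run_pipeline(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    # Stages run strictly in order; an exception in one stage stops the rest.
    land_raw(conn, poll_source())
    transform(conn)
    conn.commit()
    return bi_query(conn)


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        print(run_pipeline(conn))
```

The point it illustrates is direction: each stage reads only from the stage before it, and the consumption step queries the warehouse table rather than staging.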

Data flow

Platform components

Component	Technology	Role
Data Warehouse	PostgreSQL (StackGres on K8s)	Central storage for all health data
Transformation	dbt (Python 3.11)	Raw → staging → DWH → reporting layers
Orchestration	Prefect 3.3.2	Schedule and monitor all pipelines
Distributed Processing	Apache Spark 3.5.1 + Livy	Large-scale analytical workloads
BI Platform	Apache Superset	Dashboards and ad-hoc queries
Web Application	GreenRiver (NestJS + React)	HIC portal for MoH staff
Interactive Analysis	JupyterHub	Data science notebooks
Object Storage	MinIO (S3-compatible)	Data lake files, backups, exports
Cache	Valkey (Redis-compatible)	Session and query caching
Authentication	AWS Cognito + OAuth2/Dex	SSO across all services
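The raw → staging → DWH → reporting layering in the Transformation row depends on dbt building models in dependency order: dbt derives a DAG from each model's `ref()` and `source()` calls and runs the models topologically. The sketch below imitates that ordering with Python's standard `graphlib`. The model names, the prefix convention (`raw_`, `stg_`, `dwh_`, `rpt_`) and the layering check are hypothetical illustrations, not taken from the HIC dbt project.

```python
"""Sketch: ordering dbt-style models across raw -> staging -> DWH -> reporting.

Hypothetical: model names and prefixes are made up, and raw_* entries stand in
for dbt sources. dbt itself derives these edges from ref()/source() calls,
which the dict below only imitates.
"""
from graphlib import TopologicalSorter

# Each model maps to the set of models it reads from (its upstream dependencies).
MODEL_DEPS: dict[str, set[str]] = {
    "raw_visits": set(),
    "raw_facilities": set(),
    "stg_visits": {"raw_visits"},
    "stg_facilities": {"raw_facilities"},
    "dwh_fact_visits": {"stg_visits", "stg_facilities"},
    "rpt_monthly_visits": {"dwh_fact_visits"},
}

# Hypothetical rank of each layer, derived from the model-name prefix.
LAYER_RANK = {"raw": 0, "stg": 1, "dwh": 2, "rpt": 3}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    # static_order() yields every model after all of its upstream models.
    return list(TopologicalSorter(deps).static_order())


def check_layering(deps: dict[str, set[str]]) -> None:
    # A model may read from its own layer or an earlier one, never a later one.
    for model, upstream in deps.items():
        for up in upstream:
            if LAYER_RANK[up.split("_")[0]] > LAYER_RANK[model.split("_")[0]]:
                raise ValueError(f"{model} reads from later-layer model {up}")


if __name__ == "__main__":
    check_layering(MODEL_DEPS)
    print(build_order(MODEL_DEPS))
```

Keeping the layer in the model name makes a wrong-direction dependency (for example a staging model reading a reporting model) easy to catch before it creates a cycle.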

Deployment

All workloads run on AWS EKS. Application lifecycle is managed through ArgoCD (GitOps), and the Helm charts it deploys are organized with Helmfile. The underlying infrastructure is provisioned with AWS CDK stacks written in TypeScript.

Cluster: AWS EKS
GitOps: ArgoCD at gitops.awseks.rhos.africa
Helm management: Helmfile (~20 charts)
Service mesh: Istio
Database operator: StackGres (PostgreSQL on K8s)
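Under GitOps, ArgoCD periodically compares the desired state committed to Git with the live state in the cluster and syncs any difference. A much simplified sketch of that comparison follows, assuming desired and live state can be reduced to a release-name → chart-version map (real ArgoCD diffs full Kubernetes manifests). The `reconcile` function, the release names and the versions are hypothetical.

```python
"""Simplified sketch of GitOps reconciliation (not ArgoCD's implementation).

Assumption: desired state (from Git, e.g. the Helmfile releases) and live state
(from the cluster) are both reduced to {release_name: chart_version}.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str  # "install", "upgrade" or "prune"
    release: str
    version: Optional[str]


def reconcile(
    desired: dict[str, str], live: dict[str, str], prune: bool = False
) -> list[Action]:
    actions: list[Action] = []
    # Releases declared in Git but missing or different in the cluster.
    for name in sorted(desired):
        if name not in live:
            actions.append(Action("install", name, desired[name]))
        elif live[name] != desired[name]:
            actions.append(Action("upgrade", name, desired[name]))
    # Releases running in the cluster but no longer declared in Git.
    if prune:
        for name in sorted(set(live) - set(desired)):
            actions.append(Action("prune", name, None))
    return actions


if __name__ == "__main__":
    # Hypothetical releases and versions.
    desired = {"superset": "1.1.0", "prefect-server": "1.0.0", "jupyterhub": "2.0.0"}
    live = {"superset": "1.0.0", "jupyterhub": "2.0.0", "legacy-exporter": "0.9.0"}
    for action in reconcile(desired, live, prune=True):
        print(action)
```

Pruning is opt-in here, mirroring ArgoCD's automated sync, which by default does not delete resources that have been removed from Git.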