
System Architecture

The HIC platform follows a linear ingestion-to-consumption pipeline: Prefect polls the source systems, raw data lands in PostgreSQL staging tables, dbt transforms it into the data warehouse, and BI tools query the warehouse directly.

Data flow
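
The four stages of the pipeline can be illustrated with a minimal, self-contained sketch. It is not the platform's actual code: the standard-library `sqlite3` module stands in for PostgreSQL, plain Python functions stand in for Prefect tasks and dbt models, and every table name, column name, and sample row (`stg_visits`, `dwh_facility_totals`, `facility-001`, and so on) is hypothetical.

```python
import sqlite3


def poll_source():
    """Hypothetical stand-in for a source system polled by the orchestrator."""
    return [
        ("facility-001", "2024-01-01", 12),
        ("facility-001", "2024-01-02", 15),
        ("facility-002", "2024-01-01", 7),
    ]


def land_in_staging(conn, rows):
    """Ingestion: write the raw rows to a staging table unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_visits "
        "(facility_id TEXT, visit_date TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT INTO stg_visits VALUES (?, ?, ?)", rows)


def transform_to_warehouse(conn):
    """Transformation: the kind of SELECT a dbt model might materialise."""
    conn.execute("DROP TABLE IF EXISTS dwh_facility_totals")
    conn.execute(
        """
        CREATE TABLE dwh_facility_totals AS
        SELECT facility_id, SUM(visits) AS total_visits
        FROM stg_visits
        GROUP BY facility_id
        """
    )


def query_warehouse(conn):
    """Consumption: a BI tool reads the warehouse table directly."""
    return conn.execute(
        "SELECT facility_id, total_visits "
        "FROM dwh_facility_totals ORDER BY facility_id"
    ).fetchall()


def run_pipeline():
    """Run the stages in order: poll, stage, transform, query."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for PostgreSQL
    try:
        land_in_staging(conn, poll_source())
        transform_to_warehouse(conn)
        return query_warehouse(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())
```

Each stage only reads what the previous stage wrote, which is what makes the pipeline linear: the consumption query never touches the staging table or the source system.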

Platform components

| Component | Technology | Role |
|---|---|---|
| Data Warehouse | PostgreSQL (StackGres on K8s) | Central storage for all health data |
| Transformation | dbt (Python 3.11) | Raw → staging → DWH → reporting layers |
| Orchestration | Prefect 3.3.2 | Schedule and monitor all pipelines |
| Distributed Processing | Apache Spark 3.5.1 + Livy | Large-scale analytical workloads |
| BI Platform | Apache Superset | Dashboards and ad-hoc queries |
| Web Application | GreenRiver (NestJS + React) | HIC portal for MoH staff |
| Interactive Analysis | JupyterHub | Data science notebooks |
| Object Storage | MinIO (S3-compatible) | Data lake files, backups, exports |
| Cache | Valkey (Redis-compatible) | Session and query caching |
| Authentication | AWS Cognito + OAuth2/Dex | SSO across all services |
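
The layered transformation (raw → staging → DWH → reporting) implies a build order in which each layer is built only after the layer it reads from. dbt derives this order itself from the references between models; the sketch below only illustrates the idea with Python's standard-library `graphlib`. The relation names are hypothetical and do not come from the platform's dbt project.

```python
from graphlib import TopologicalSorter

# Hypothetical relations, one per layer. Each key maps to the set of
# relations it reads from (its upstream dependencies).
MODEL_DEPENDENCIES = {
    "raw_visits": set(),
    "stg_visits": {"raw_visits"},
    "dwh_fact_visits": {"stg_visits"},
    "rpt_monthly_visits": {"dwh_fact_visits"},
}


def build_order(dependencies):
    """Return relation names ordered so every dependency is built first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for name in build_order(MODEL_DEPENDENCIES):
        print(name)
```

Because every layer here has exactly one upstream layer, only one valid order exists; with several models per layer, any order that respects the dependencies would be acceptable.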

Deployment

All workloads run on AWS EKS. Application lifecycle is managed through ArgoCD (GitOps), which deploys Helm charts managed with Helmfile. The infrastructure layer is provisioned with AWS CDK stacks written in TypeScript.

  • Cluster: AWS EKS
  • GitOps: ArgoCD at gitops.awseks.rhos.africa
  • Helm management: Helmfile (~20 charts)
  • Service mesh: Istio
  • Database operator: StackGres (PostgreSQL on K8s)