System Architecture
The HIC platform follows a linear ingestion-to-consumption pipeline: Prefect polls the source systems, raw data lands in PostgreSQL staging tables, dbt transforms it through the warehouse layers, and BI tools query the warehouse directly.
Data flow
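As a rough illustration of the four stages (poll, land in staging, transform, query), the following minimal sketch models the flow in plain Python. It is not the platform's actual code: the `Warehouse` class, the function names, and the sample rows are all hypothetical, and in-memory lists stand in for the PostgreSQL schemas, the dbt models, and the BI query.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """In-memory stand-in for the PostgreSQL staging and reporting schemas."""
    staging: list[dict] = field(default_factory=list)
    reporting: list[dict] = field(default_factory=list)


def poll_source() -> list[dict]:
    # Hypothetical source-system payload; on the platform, polling is scheduled by Prefect.
    return [
        {"facility": "A", "visits": "12"},
        {"facility": "B", "visits": "7"},
    ]


def load_staging(wh: Warehouse, rows: list[dict]) -> None:
    # Raw rows land unchanged in staging.
    wh.staging.extend(rows)


def transform(wh: Warehouse) -> None:
    # Stand-in for the dbt step: cast types and publish to the reporting layer.
    wh.reporting = [
        {"facility": r["facility"], "visits": int(r["visits"])}
        for r in wh.staging
    ]


def query_total_visits(wh: Warehouse) -> int:
    # Stand-in for a BI tool querying the warehouse directly.
    return sum(r["visits"] for r in wh.reporting)


def run_pipeline() -> int:
    wh = Warehouse()
    load_staging(wh, poll_source())
    transform(wh)
    return query_total_visits(wh)


if __name__ == "__main__":
    print(run_pipeline())
```

Each stage only reads what the previous stage wrote, which is what makes the pipeline linear: a failure at one stage leaves the earlier layers intact and re-runnable.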
Platform components
| Component | Technology | Role |
|---|---|---|
| Data Warehouse | PostgreSQL (StackGres on K8s) | Central storage for all health data |
| Transformation | dbt (Python 3.11) | Raw → staging → DWH → reporting layers |
| Orchestration | Prefect 3.3.2 | Schedule and monitor all pipelines |
| Distributed Processing | Apache Spark 3.5.1 + Livy | Large-scale analytical workloads |
| BI Platform | Apache Superset | Dashboards and ad-hoc queries |
| Web Application | GreenRiver (NestJS + React) | HIC portal for MoH staff |
| Interactive Analysis | JupyterHub | Data science notebooks |
| Object Storage | MinIO (S3-compatible) | Data lake files, backups, exports |
| Cache | Valkey (Redis-compatible) | Session and query caching |
| Authentication | AWS Cognito + OAuth2/Dex | SSO across all services |
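The layered transformation in the table (raw → staging → DWH → reporting) implies a build order in which each layer depends on the one before it. dbt derives that order from the dependencies declared between models; the sketch below mimics the idea with Python's standard `graphlib` module. The model names are hypothetical and chosen only to show one model per layer.

```python
from graphlib import TopologicalSorter

# Hypothetical model names, one per layer.
# Each key maps to the set of models it depends on.
dependencies = {
    "raw_visits": set(),
    "stg_visits": {"raw_visits"},
    "dwh_fact_visits": {"stg_visits"},
    "rpt_monthly_visits": {"dwh_fact_visits"},
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    # Return the models in an order where every dependency is built first.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for model in build_order(dependencies):
        print(model)
```

With a strictly linear chain like this one there is only one valid order, so the reporting model is always built last.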
Deployment
All workloads run on AWS EKS. Application lifecycle is managed through ArgoCD (GitOps) with Helmfile-managed Helm charts. The infrastructure layer is provisioned with AWS CDK stacks written in TypeScript.
- Cluster: AWS EKS
- GitOps: ArgoCD at gitops.awseks.rhos.africa
- Helm management: Helmfile (~20 charts)
- Service mesh: Istio
- Database operator: StackGres (PostgreSQL on K8s)