On-Premise Deployment
Patient Journey Intelligence can be deployed entirely on your on-premise infrastructure for organizations with strict data residency requirements or existing datacenter investments.
Architecture Overview
┌──────────────────────────────────────────────────────────────────┐
│ On-Premise Datacenter │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Kubernetes Cluster (On-Prem) │ │
│ │ │ │
│ │ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌────────────────┐ │ │
│ │ │ Master │ │ Master │ │ Master │ │ Worker Nodes │ │ │
│ │ │ Node 1 │ │ Node 2 │ │ Node 3 │ │ (10-50+) │ │ │
│ │ └─────────┘ └──────────┘ └──────────┘ └────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Application Workloads │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌───────────────────┐ │ │ │
│ │ │ │ Web UI │ │API Server│ │ NLP Pipeline │ │ │ │
│ │ │ └──────────┘ └──────────┘ └───────────────────┘ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌───────────────────┐ │ │ │
│ │ │ │Ingestion │ │ De-ID │ │ Terminology Svc │ │ │ │
│ │ │ └──────────┘ └──────────┘ └───────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ PostgreSQL │ │ NFS/SAN │ │ Redis Cluster │ │
│ │ HA Cluster │ │ Storage │ │ │ │
│ │ │ │ │ │ - Caching │ │
│ │ - OMOP CDM │ │ - Documents │ │ - Session Management │ │
│ │ - Metadata │ │ - Files │ │ │ │
│ │ - 3 nodes │ │ - Backups │ │ (3 node cluster) │ │
│ └──────────────┘ └──────────────┘ └────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ Load │ │ Monitoring │ │ Backup & Recovery │ │
│ │ Balancer │ │ │ │ │ │
│ │ (HAProxy/ │ │ - Prometheus │ │ - Velero (K8s) │ │
│ │ Nginx) │ │ - Grafana │ │ - pgBackRest (DB) │ │
│ │ │ │ - ELK Stack │ │ - File-level backups │ │
│ └──────────────┘ └──────────────┘ └────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
Infrastructure Requirements
Compute
Kubernetes Master Nodes (3 required for HA):
- CPU: 8 cores per node
- RAM: 32 GB per node
- Storage: 200 GB SSD per node
- OS: Ubuntu 22.04 LTS, RHEL 8/9, or Rocky Linux 9
Kubernetes Worker Nodes (minimum 3, recommended 10+):
- CPU: 16-32 cores per node
- RAM: 64-128 GB per node
- Storage: 500 GB SSD per node (for container images and local volumes)
Example Total (medium deployment: 3 masters + 15 workers):
- 18 servers
- ~384 CPU cores
- ~1.5 TB RAM
Storage
Shared Storage (NFS, CephFS, GlusterFS, or enterprise SAN):
- Capacity: 10 TB - 100 TB (depends on data volume)
- IOPS: 10,000+ for database workloads
- Latency: < 5ms
- Redundancy: RAID 10 or equivalent
Database Storage:
- Dedicated SSD/NVMe storage for PostgreSQL
- 5 TB - 50 TB depending on patient volume
- 20,000+ IOPS
Networking
- Internal Network: 10 Gbps between nodes
- Load Balancer: HAProxy or hardware load balancer
- Firewall: Access control for internal/external traffic
- VPN/Bastion: Secure remote access
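For the load balancer fronting the Kubernetes API, a minimal haproxy.cfg sketch might look like the following (the backend IP addresses are placeholders for your three master nodes):

```
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check
    server master3 10.0.0.13:6443 check
```

TCP-mode (layer 4) balancing is used here so that TLS terminates at the API servers themselves rather than at the proxy.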
Kubernetes Distribution
Supported distributions:
- Vanilla Kubernetes (kubeadm)
- Red Hat OpenShift
- Rancher Kubernetes Engine (RKE)
- VMware Tanzu
- SUSE Rancher
Software Stack
Core Components
| Component | Technology | Purpose |
|---|---|---|
| Container Orchestration | Kubernetes 1.26+ | Manage application containers |
| Database | PostgreSQL 14+ (HA cluster) | OMOP CDM storage |
| Caching | Redis 7+ (cluster mode) | Session and query caching |
| Storage | NFS/CephFS/GlusterFS | Shared file storage |
| Load Balancer | HAProxy/Nginx | Traffic distribution |
| Monitoring | Prometheus + Grafana | Metrics and dashboards |
| Logging | ELK Stack or Loki | Centralized logging |
| Backup | Velero + pgBackRest | Disaster recovery |
Container Registry
Options:
- Harbor (recommended for on-prem)
- JFrog Artifactory
- Nexus Repository
- Docker Trusted Registry
Installation Process
1. Infrastructure Provisioning (Week 1-2)
Physical/Virtual Server Setup:
- Provision servers according to specifications
- Configure networking and storage
- Install base OS
Kubernetes Cluster Deployment:
# Using kubeadm (example)
# Initialize the first control-plane node behind the API load balancer
kubeadm init --control-plane-endpoint="lb.example.com:6443" --upload-certs
# Join additional control-plane nodes (requires --control-plane and the certificate key)
kubeadm join lb.example.com:6443 --token <token> --discovery-token-ca-cert-hash <hash> \
  --control-plane --certificate-key <key>
Persistent Storage:
# Install NFS provisioner or Rook-Ceph
kubectl apply -f nfs-provisioner.yaml
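The provisioner manifest above is site-specific; for reference, a StorageClass for the NFS subdir provisioner might look like this sketch (the provisioner name and parameters are assumptions and must match your installer's deployment):

```yaml
# Illustrative StorageClass for an NFS subdir provisioner
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-shared
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "true"   # keep data on PVC deletion
reclaimPolicy: Retain
```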
2. Database Cluster Setup (Week 2)
PostgreSQL HA Cluster:
- Deploy Patroni or Stolon for PostgreSQL HA
- Configure streaming replication
- Set up automated backups
# Example: Deploy a PostgreSQL HA cluster (repmgr/Pgpool-based) with the Bitnami chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgresql bitnami/postgresql-ha \
  --set postgresql.replicaCount=3 \
  --set persistence.size=1Ti
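To pair the cluster with the backup tooling described later, a pgBackRest configuration stanza along these lines would support daily fulls plus continuous WAL archiving (the repository path, stanza name, and retention are illustrative):

```ini
; Illustrative pgbackrest.conf; adjust paths and retention to site policy
[global]
repo1-path=/backup/pgbackrest
repo1-retention-full=30     ; with daily fulls, roughly 30 days online
archive-async=y

[patient-journey]
pg1-path=/var/lib/postgresql/data
```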
3. Application Deployment (Week 3-4)
Load Patient Journey Intelligence container images from John Snow Labs:
# Pull images to private registry
docker pull jsl.ocir.io/patient-journey:5.2.0
docker tag jsl.ocir.io/patient-journey:5.2.0 registry.local/patient-journey:5.2.0
docker push registry.local/patient-journey:5.2.0
Deploy with Helm:
helm install patient-journey jsl/patient-journey-intelligence \
  --namespace patient-journey --create-namespace \
  --values on-prem-values.yaml
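The referenced `on-prem-values.yaml` is site-specific; the sketch below shows the kinds of overrides it might carry. All keys and values here are hypothetical and depend on the chart's actual schema:

```yaml
# Hypothetical on-prem-values.yaml; consult the chart documentation for real keys
image:
  registry: registry.local          # private registry populated in the pull/tag/push step
replicaCount: 3
ingress:
  enabled: true
  hostname: patient-journey.internal.example.com
postgresql:
  host: postgresql-ha-pgpool.database.svc
persistence:
  storageClass: nfs-client          # your shared-storage class
```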
4. Data Integration (Week 5-6)
Configure connections to on-premise EHR and clinical systems.
5. Testing & Go-Live (Week 7-8)
User acceptance testing and production cutover.
High Availability Configuration
Database HA
- Primary-Replica Setup: 1 primary + 2 replicas
- Automatic Failover: Patroni or Stolon
- Backup Strategy: Daily full + continuous WAL archiving
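If Patroni is chosen for failover, the behavior above maps to a bootstrap configuration roughly like this sketch (DCS endpoints and credentials are omitted; all values are illustrative, not the product's shipped configuration):

```yaml
# Illustrative Patroni settings for a 1 primary + 2 replica cluster
scope: patient-journey-pg
bootstrap:
  dcs:
    ttl: 30                          # leader lease; expiry triggers failover
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576 # bytes; replicas further behind are not promoted
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 5
```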
Application HA
- Multiple replicas for each service
- Pod anti-affinity to spread across nodes
- Readiness and liveness probes
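A minimal sketch of how these points translate into a Deployment spec; the container name, port, and probe paths are assumptions:

```yaml
# Illustrative Deployment fragment: spread replicas across nodes and
# gate traffic on health endpoints
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api-server
              topologyKey: kubernetes.io/hostname   # one pod per node
      containers:
        - name: api-server
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
          livenessProbe:
            httpGet: { path: /livez, port: 8080 }
            initialDelaySeconds: 30
```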
Storage HA
- RAID 10 or distributed storage (Ceph)
- Snapshots and replication
Resource Sizing Examples
Small Deployment (< 100K patients)
| Resource | Specification |
|---|---|
| Masters | 3 x (8 cores, 32 GB RAM) |
| Workers | 6 x (16 cores, 64 GB RAM) |
| PostgreSQL | 3 x (16 cores, 128 GB RAM, 2 TB SSD) |
| Shared Storage | 10 TB NFS |
| Network | 10 Gbps |
Medium Deployment (100K - 1M patients)
| Resource | Specification |
|---|---|
| Masters | 3 x (16 cores, 64 GB RAM) |
| Workers | 15 x (32 cores, 128 GB RAM) |
| PostgreSQL | 3 x (32 cores, 256 GB RAM, 10 TB SSD) |
| Shared Storage | 50 TB NFS/SAN |
| Network | 25 Gbps |
Large Deployment (> 1M patients)
| Resource | Specification |
|---|---|
| Masters | 3 x (16 cores, 64 GB RAM) |
| Workers | 30+ x (32 cores, 128 GB RAM) |
| PostgreSQL | 3 x (64 cores, 512 GB RAM, 50 TB SSD) |
| Shared Storage | 200 TB SAN |
| Network | 40 Gbps |
Security
Network Security
- Firewall rules (iptables/firewalld)
- Network segmentation (VLANs)
- TLS 1.2+ for all communications
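Inside the cluster, VLAN segmentation can be complemented with a Kubernetes NetworkPolicy; a minimal sketch, assuming a `patient-journey` namespace and an ingress tier labeled `role: ingress`:

```yaml
# Illustrative NetworkPolicy: only the ingress tier may reach application pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-app-ingress
  namespace: patient-journey
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: ingress
```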
Access Control
- LDAP/Active Directory integration
- RBAC for Kubernetes
- Database role-based access
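On the Kubernetes side, a directory-synced group can be bound to the built-in read-only `view` ClusterRole; a sketch, with the group name as an assumption:

```yaml
# Illustrative RBAC binding for an operations group from LDAP/AD
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ops-readonly
subjects:
  - kind: Group
    name: "pji-operations"     # mapped from the directory group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                   # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
```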
Data Protection
- Encryption at rest (LUKS, dm-crypt)
- Encrypted backups
- De-identification for secondary use
Compliance
- Audit logging
- Access tracking
- Regular security assessments
Monitoring & Operations
Metrics
- Kubernetes cluster health (Prometheus)
- Node resource utilization
- Application performance (APM)
- Database metrics
Dashboards
- Grafana for visualization
- Pre-built Patient Journey Intelligence dashboards
Alerting
- PagerDuty/Opsgenie integration
- Email/SMS notifications
- Escalation policies
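As one sketch of how such an alert might be wired, a Prometheus rule like the following (the job label and threshold are assumptions) fires to Alertmanager, which then handles PagerDuty/Opsgenie routing and escalation:

```yaml
# Illustrative Prometheus alerting rule
groups:
  - name: patient-journey
    rules:
      - alert: ApiServerDown
        expr: up{job="patient-journey-api"} == 0
        for: 5m
        labels:
          severity: critical   # routed by Alertmanager to the paging service
        annotations:
          summary: "Patient Journey API target {{ $labels.instance }} is down"
```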
Backup & Disaster Recovery
Backup Strategy
- Database: Daily full + hourly incrementals
- File Storage: Daily snapshots
- Kubernetes State: Velero backups
- Retention: 30 days online, 1 year archive
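The Kubernetes-state portion of this policy could be expressed as a Velero Schedule; a sketch, assuming the application runs in a `patient-journey` namespace:

```yaml
# Illustrative Velero Schedule: daily cluster-state backup, 30-day retention
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # 02:00 daily, cron syntax
  template:
    includedNamespaces: ["patient-journey"]
    ttl: 720h                  # 30 days online retention
```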
Recovery Procedures
- RTO: < 4 hours
- RPO: < 1 hour
- Regular DR testing quarterly
Advantages of On-Premise
- Data Sovereignty: Complete control over data location
- Network Performance: Low latency to on-prem EHR systems
- Compliance: Meet strict regulatory requirements
- Integration: Direct access to internal systems
- Cost Predictability: No cloud usage spikes
Challenges & Considerations
- Capital Investment: Upfront hardware costs
- Operational Overhead: Requires dedicated IT staff
- Scalability: Limited by physical infrastructure
- Disaster Recovery: Requires second datacenter
Support Model
John Snow Labs provides:
- Deployment automation scripts
- Container images and updates
- Technical support (24/7 available)
- Runbooks and operational guides
- Quarterly health checks
Next Steps
- Infrastructure Assessment: Review current datacenter capacity
- Architecture Planning: Design deployment with John Snow Labs
- Procurement: Order necessary hardware
- Pilot Deployment: Test with sample data