On-Premise Deployment
Patient Journey Intelligence can be deployed entirely on your on-premise infrastructure for organizations with strict data residency requirements or existing datacenter investments.
Architecture Overview
┌──────────────────────────────────────────────────────────────────┐
│ On-Premise Datacenter │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Kubernetes Cluster (On-Prem) │ │
│ │ │ │
│ │ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌────────────────┐ │ │
│ │ │ Master │ │ Master │ │ Master │ │ Worker Nodes │ │ │
│ │ │ Node 1 │ │ Node 2 │ │ Node 3 │ │ (10-50+) │ │ │
│ │ └─────────┘ └──────────┘ └──────────┘ └────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Application Workloads │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌───────────────────┐ │ │ │
│ │ │ │ Web UI │ │API Server│ │ NLP Pipeline │ │ │ │
│ │ │ └──────────┘ └──────────┘ └───────────────────┘ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌───────────────────┐ │ │ │
│ │ │ │Ingestion │ │ De-ID │ │ Terminology Svc │ │ │ │
│ │ │ └──────────┘ └──────────┘ └───────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ PostgreSQL │ │ NFS/SAN │ │ Redis Cluster │ │
│ │ HA Cluster │ │ Storage │ │ │ │
│ │ │ │ │ │ - Caching │ │
│ │ - OMOP CDM │ │ - Documents │ │ - Session Management │ │
│ │ - Metadata │ │ - Files │ │ │ │
│ │ - 3 nodes │ │ - Backups │ │ (3 node cluster) │ │
│ └──────────────┘ └──────────────┘ └────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ Load │ │ Monitoring │ │ Backup & Recovery │ │
│ │ Balancer │ │ │ │ │ │
│ │ (HAProxy/ │ │ - Prometheus │ │ - Velero (K8s) │ │
│ │ Nginx) │ │ - Grafana │ │ - pgBackRest (DB) │ │
│ │ │ │ - ELK Stack │ │ - File-level backups │ │
│ └──────────────┘ └──────────────┘ └────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
Infrastructure Requirements
Compute
Kubernetes Master Nodes (3 required for HA):
- CPU: 8 cores per node
- RAM: 32 GB per node
- Storage: 200 GB SSD per node
- OS: Ubuntu 22.04 LTS, RHEL 8/9, or Rocky Linux 9
Kubernetes Worker Nodes (minimum 3, recommended 10+):
- CPU: 16-32 cores per node
- RAM: 64-128 GB per node
- Storage: 500 GB SSD per node (for container images and local volumes)
Example Total (medium deployment: 3 masters + 15 workers):
- 18 servers
- ~384 CPU cores
- ~1.5 TB RAM
Storage
Shared Storage (NFS, CephFS, GlusterFS, or enterprise SAN):
- Capacity: 10 TB - 100 TB (depends on data volume)
- IOPS: 10,000+ for database workloads
- Latency: < 5ms
- Redundancy: RAID 10 or equivalent
Database Storage:
- Dedicated SSD/NVMe storage for PostgreSQL
- 5 TB - 50 TB depending on patient volume
- 20,000+ IOPS
Networking
- Internal Network: 10 Gbps between nodes
- Load Balancer: HAProxy or hardware load balancer
- Firewall: Access control for internal/external traffic
- VPN/Bastion: Secure remote access
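For the load balancer fronting the Kubernetes API, a minimal haproxy.cfg sketch might look like the following (the backend IP addresses are placeholders for your three master nodes):

```
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check
    server master3 10.0.0.13:6443 check
```

TCP-mode (layer 4) balancing is used here so that TLS terminates at the API servers themselves rather than at the proxy.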
Kubernetes Distribution
Supported distributions:
- Vanilla Kubernetes (kubeadm)
- Red Hat OpenShift
- Rancher Kubernetes Engine (RKE)
- VMware Tanzu
- SUSE Rancher
Software Stack
Core Components
| Component | Technology | Purpose |
|---|---|---|
| Container Orchestration | Kubernetes 1.26+ | Manage application containers |
| Database | PostgreSQL 14+ (HA cluster) | OMOP CDM storage |
| Caching | Redis 7+ (cluster mode) | Session and query caching |
| Storage | NFS/CephFS/GlusterFS | Shared file storage |
| Load Balancer | HAProxy/Nginx | Traffic distribution |
| Monitoring | Prometheus + Grafana | Metrics and dashboards |
| Logging | ELK Stack or Loki | Centralized logging |
| Backup | Velero + pgBackRest | Disaster recovery |
Container Registry
Options:
- Harbor (recommended for on-prem)
- JFrog Artifactory
- Nexus Repository
- Docker Trusted Registry
Installation Process
1. Infrastructure Provisioning (Week 1-2)
Physical/Virtual Server Setup:
- Provision servers according to specifications
- Configure networking and storage
- Install base OS
Kubernetes Cluster Deployment:
# Using kubeadm (example)
# Initialize the first control-plane node behind the API load balancer
kubeadm init --control-plane-endpoint="lb.example.com:6443" --upload-certs
# Join additional control-plane nodes (requires --control-plane and the certificate key)
kubeadm join lb.example.com:6443 --token <token> --discovery-token-ca-cert-hash <hash> \
  --control-plane --certificate-key <key>
Persistent Storage:
# Install NFS provisioner or Rook-Ceph
kubectl apply -f nfs-provisioner.yaml
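The provisioner manifest above is site-specific; for reference, a StorageClass for the NFS subdir provisioner might look like this sketch (the provisioner name and parameters are assumptions and must match your installer's deployment):

```yaml
# Illustrative StorageClass for an NFS subdir provisioner
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-shared
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "true"   # keep data on PVC deletion
reclaimPolicy: Retain
```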
2. Database Cluster Setup (Week 2)
PostgreSQL HA Cluster:
- Deploy Patroni or Stolon for PostgreSQL HA
- Configure streaming replication
- Set up automated backups
# Example: Deploy a PostgreSQL HA cluster (repmgr/Pgpool-based) with the Bitnami chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgresql bitnami/postgresql-ha \
  --set postgresql.replicaCount=3 \
  --set persistence.size=1Ti
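To pair the cluster with the backup tooling described later, a pgBackRest configuration stanza along these lines would support daily fulls plus continuous WAL archiving (the repository path, stanza name, and retention are illustrative):

```ini
; Illustrative pgbackrest.conf; adjust paths and retention to site policy
[global]
repo1-path=/backup/pgbackrest
repo1-retention-full=30     ; with daily fulls, roughly 30 days online
archive-async=y

[patient-journey]
pg1-path=/var/lib/postgresql/data
```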
3. Application Deployment (Week 3-4)
Load Patient Journey Intelligence container images from John Snow Labs:
# Pull images to private registry
docker pull jsl.ocir.io/patient-journey:5.2.0
docker tag jsl.ocir.io/patient-journey:5.2.0 registry.local/patient-journey:5.2.0
docker push registry.local/patient-journey:5.2.0
Deploy with Helm:
helm install patient-journey jsl/patient-journey-intelligence \
  --namespace patient-journey --create-namespace \
  --values on-prem-values.yaml
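The referenced `on-prem-values.yaml` is site-specific; the sketch below shows the kinds of overrides it might carry. All keys and values here are hypothetical and depend on the chart's actual schema:

```yaml
# Hypothetical on-prem-values.yaml; consult the chart documentation for real keys
image:
  registry: registry.local          # private registry populated in the pull/tag/push step
replicaCount: 3
ingress:
  enabled: true
  hostname: patient-journey.internal.example.com
postgresql:
  host: postgresql-ha-pgpool.database.svc
persistence:
  storageClass: nfs-client          # your shared-storage class
```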
4. Data Integration (Week 5-6)
Configure connections to on-premise EHR and clinical systems.
5. Testing & Go-Live (Week 7-8)
User acceptance testing and production cutover.
High Availability Configuration
Database HA
- Primary-Replica Setup: 1 primary + 2 replicas
- Automatic Failover: Patroni or Stolon
- Backup Strategy: Daily full + continuous WAL archiving
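If Patroni is chosen for failover, the behavior above maps to a bootstrap configuration roughly like this sketch (DCS endpoints and credentials are omitted; all values are illustrative, not the product's shipped configuration):

```yaml
# Illustrative Patroni settings for a 1 primary + 2 replica cluster
scope: patient-journey-pg
bootstrap:
  dcs:
    ttl: 30                          # leader lease; expiry triggers failover
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576 # bytes; replicas further behind are not promoted
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 5
```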
Application HA
- Multiple replicas for each service
- Pod anti-affinity to spread across nodes
- Readiness and liveness probes
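A minimal sketch of how these points translate into a Deployment spec; the container name, port, and probe paths are assumptions:

```yaml
# Illustrative Deployment fragment: spread replicas across nodes and
# gate traffic on health endpoints
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api-server
              topologyKey: kubernetes.io/hostname   # one pod per node
      containers:
        - name: api-server
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
          livenessProbe:
            httpGet: { path: /livez, port: 8080 }
            initialDelaySeconds: 30
```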
Storage HA
- RAID 10 or distributed storage (Ceph)
- Snapshots and replication
Resource Sizing Examples
Small Deployment (< 100K patients)
| Resource | Specification |
|---|---|
| Masters | 3 x (8 cores, 32 GB RAM) |
| Workers | 6 x (16 cores, 64 GB RAM) |
| PostgreSQL | 3 x (16 cores, 128 GB RAM, 2 TB SSD) |
| Shared Storage | 10 TB NFS |
| Network | 10 Gbps |
Medium Deployment (100K - 1M patients)
| Resource | Specification |
|---|---|
| Masters | 3 x (16 cores, 64 GB RAM) |
| Workers | 15 x (32 cores, 128 GB RAM) |
| PostgreSQL | 3 x (32 cores, 256 GB RAM, 10 TB SSD) |
| Shared Storage | 50 TB NFS/SAN |
| Network | 25 Gbps |
Large Deployment (> 1M patients)
| Resource | Specification |
|---|---|
| Masters | 3 x (16 cores, 64 GB RAM) |
| Workers | 30+ x (32 cores, 128 GB RAM) |
| PostgreSQL | 3 x (64 cores, 512 GB RAM, 50 TB SSD) |
| Shared Storage | 200 TB SAN |
| Network | 40 Gbps |
Security
Network Security
- Firewall rules (iptables/firewalld)
- Network segmentation (VLANs)
- TLS 1.2+ for all communications
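Inside the cluster, VLAN segmentation can be complemented with a Kubernetes NetworkPolicy; a minimal sketch, assuming a `patient-journey` namespace and an ingress tier labeled `role: ingress`:

```yaml
# Illustrative NetworkPolicy: only the ingress tier may reach application pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-app-ingress
  namespace: patient-journey
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: ingress
```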
Access Control
- LDAP/Active Directory integration
- RBAC for Kubernetes
- Database role-based access
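On the Kubernetes side, a directory-synced group can be bound to the built-in read-only `view` ClusterRole; a sketch, with the group name as an assumption:

```yaml
# Illustrative RBAC binding for an operations group from LDAP/AD
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ops-readonly
subjects:
  - kind: Group
    name: "pji-operations"     # mapped from the directory group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                   # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
```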
Data Protection
- Encryption at rest (LUKS, dm-crypt)
- Encrypted backups
- De-identification for secondary use
Compliance
- Audit logging
- Access tracking
- Regular security assessments
Monitoring & Operations
Metrics
- Kubernetes cluster health (Prometheus)
- Node resource utilization
- Application performance (APM)
- Database metrics
Dashboards
- Grafana for visualization
- Pre-built Patient Journey Intelligence dashboards
Alerting
- PagerDuty/Opsgenie integration
- Email/SMS notifications
- Escalation policies
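As one sketch of how such an alert might be wired, a Prometheus rule like the following (the job label and threshold are assumptions) fires to Alertmanager, which then handles PagerDuty/Opsgenie routing and escalation:

```yaml
# Illustrative Prometheus alerting rule
groups:
  - name: patient-journey
    rules:
      - alert: ApiServerDown
        expr: up{job="patient-journey-api"} == 0
        for: 5m
        labels:
          severity: critical   # routed by Alertmanager to the paging service
        annotations:
          summary: "Patient Journey API target {{ $labels.instance }} is down"
```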
Backup & Disaster Recovery
Backup Strategy
- Database: Daily full + hourly incrementals
- File Storage: Daily snapshots
- Kubernetes State: Velero backups
- Retention: 30 days online, 1 year archive
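The Kubernetes-state portion of this policy could be expressed as a Velero Schedule; a sketch, assuming the application runs in a `patient-journey` namespace:

```yaml
# Illustrative Velero Schedule: daily cluster-state backup, 30-day retention
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # 02:00 daily, cron syntax
  template:
    includedNamespaces: ["patient-journey"]
    ttl: 720h                  # 30 days online retention
```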
Recovery Procedures
- RTO: < 4 hours
- RPO: < 1 hour
- Regular DR testing quarterly
Advantages of On-Premise
- Data Sovereignty: Complete control over data location
- Network Performance: Low latency to on-prem EHR systems
- Compliance: Meet strict regulatory requirements
- Integration: Direct access to internal systems
- Cost Predictability: No cloud usage spikes
Challenges & Considerations
- Capital Investment: Upfront hardware costs
- Operational Overhead: Requires dedicated IT staff
- Scalability: Limited by physical infrastructure
- Disaster Recovery: Requires second datacenter
Support Model
John Snow Labs provides:
- Deployment automation scripts
- Container images and updates
- Technical support (24/7 available)
- Runbooks and operational guides
- Quarterly health checks
Next Steps
- Infrastructure Assessment: Review current datacenter capacity
- Architecture Planning: Design deployment with John Snow Labs
- Procurement: Order necessary hardware
- Pilot Deployment: Test with sample data