Deploying Patient Journey Intelligence on Snowflake
Patient Journey Intelligence integrates with Snowflake Data Cloud to store OMOP CDM data alongside your existing healthcare analytics infrastructure.
Architecture Overview
Patient Journey Intelligence uses Snowflake as the primary data warehouse for OMOP CDM tables, while compute-intensive NLP processing runs on separate infrastructure (AWS/Azure/GCP).
┌────────────────────────────────────────────────────────────────┐
│                      Snowflake Data Cloud                      │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                    OMOP CDM Database                     │ │
│  │                                                          │ │
│  │  ┌─────────────┐  ┌────────────┐  ┌──────────────────┐  │ │
│  │  │ Person      │  │ Condition  │  │ Drug_Exposure    │  │ │
│  │  │ Observation │  │ Procedure  │  │ Measurement      │  │ │
│  │  │ Visit       │  │ Note       │  │ Device_Exposure  │  │ │
│  │  └─────────────┘  └────────────┘  └──────────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                  Vocabulary & Metadata                   │ │
│  │                                                          │ │
│  │  - SNOMED CT Concepts                                    │ │
│  │  - RxNorm Drug Mappings                                  │ │
│  │  - LOINC Lab Tests                                       │ │
│  │  - Source-to-Concept Mappings                            │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│ Patient Journey Intelligence Processing Layer (AWS/Azure/GCP)  │
│                                                                │
│  ┌────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ NLP        │  │ De-ID        │  │ Terminology           │  │
│  │ Processing │  │ Engine       │  │ Normalization         │  │
│  └─────┬──────┘  └──────┬───────┘  └──────────┬────────────┘  │
│        │                │                     │               │
│        └────────────────┴─────────────────────┘               │
│                         │                                     │
│                         ▼                                     │
│              ┌──────────────────────┐                         │
│              │  Snowflake Connector │                         │
│              │  (Snowpark/JDBC)     │                         │
│              └──────────────────────┘                         │
│                                                                │
└────────────────────────────────────────────────────────────────┘
Deployment Model
Hybrid Architecture
- Snowflake: Stores OMOP CDM data, vocabularies, and metadata
- Kubernetes (AWS/Azure/GCP): Runs NLP processing, de-identification, and ingestion
- Snowpark (optional): Run Python-based transformations directly in Snowflake
- Snowflake Connector: Secure data exchange between layers
Data Flow
1. Raw data is ingested by the Patient Journey Intelligence processing layer
2. NLP extraction and de-identification are applied
3. Terminology is normalized to standard vocabularies
4. Results are written to the Snowflake OMOP tables via the Snowflake Connector
5. Analytics and queries run directly in Snowflake
Key Snowflake Features
Virtual Warehouses
Separate compute for different workloads:
- LOAD_WH: Data ingestion (X-Small to Small)
- ANALYTICS_WH: Cohort queries and analytics (Medium to Large)
- REPORTS_WH: Dashboards and reporting (Small)
Snowflake Secure Data Sharing
Share de-identified OMOP data, without copying it or building ETL pipelines, with:
- Research partners
- External collaborators
- Multi-site studies
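As a minimal sketch, a secure share is created, granted read access to the de-identified objects, and then attached to the consumer's account. The share name, shared table, and account locator below are placeholders for your environment:

```sql
-- Create a share for de-identified OMOP data (name is a placeholder)
CREATE SHARE deid_research_share;

-- Grant read-only access to the objects being shared
GRANT USAGE ON DATABASE omop_cdm TO SHARE deid_research_share;
GRANT USAGE ON SCHEMA omop_cdm.cdm_v5_4 TO SHARE deid_research_share;
GRANT SELECT ON TABLE omop_cdm.cdm_v5_4.condition_occurrence TO SHARE deid_research_share;

-- Attach the consuming Snowflake account (replace with the partner's account locator)
ALTER SHARE deid_research_share ADD ACCOUNTS = partner_account;
```

The consumer then creates a read-only database from the share; no data is copied and the provider retains full control over what is exposed.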
Time Travel & Zero-Copy Cloning
- Query historical data (up to 90 days)
- Create instant dev/test environments
- Rollback data errors
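These three capabilities map directly onto SQL; the database/table names and the restore timestamp below are illustrative:

```sql
-- Query the person table as it existed one hour ago (Time Travel)
SELECT COUNT(*) FROM omop_cdm.cdm_v5_4.person AT(OFFSET => -3600);

-- Create an instant dev/test environment without duplicating storage (zero-copy clone)
CREATE DATABASE omop_cdm_dev CLONE omop_cdm;

-- Roll back a bad load by cloning the table from a point in time
CREATE OR REPLACE TABLE omop_cdm.cdm_v5_4.person_restored
  CLONE omop_cdm.cdm_v5_4.person AT(TIMESTAMP => '2025-01-01 00:00:00'::TIMESTAMP_NTZ);
```

Note that the 90-day retention window requires Enterprise Edition and must be configured via DATA_RETENTION_TIME_IN_DAYS; the default is 1 day.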
Row-Level Security
Implement fine-grained access control:
CREATE ROW ACCESS POLICY patient_access_policy AS (person_id NUMBER)
RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN'
  OR person_id IN (
    SELECT person_id FROM authorized_patients WHERE user_name = CURRENT_USER()
  );

ALTER TABLE omop_cdm.cdm_v5_4.person
  ADD ROW ACCESS POLICY patient_access_policy ON (person_id);
Prerequisites
Snowflake Account
- Edition: Enterprise or higher
- Region: Choose based on data residency
- Features Required:
- Multi-cluster warehouses
- Snowpark (for Python UDFs)
- Row-level security
Cloud Storage
- AWS S3, Azure Blob, or GCS for staging files
- External stages configured
Processing Infrastructure
- Kubernetes cluster (AWS EKS, Azure AKS, or Google GKE)
- For NLP and de-identification workloads
Installation Process
1. Snowflake Setup (Week 1)
-- Create database and schemas
CREATE DATABASE omop_cdm;
CREATE SCHEMA omop_cdm.cdm_v5_4;
CREATE SCHEMA omop_cdm.vocabulary;
CREATE SCHEMA omop_cdm.metadata;
-- Create virtual warehouses
CREATE WAREHOUSE load_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE analytics_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  SCALING_POLICY = 'STANDARD'
  MAX_CLUSTER_COUNT = 4;

CREATE WAREHOUSE reports_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
-- Create roles and users
CREATE ROLE omop_reader;
CREATE ROLE omop_writer;
GRANT USAGE ON DATABASE omop_cdm TO ROLE omop_reader;
GRANT USAGE ON ALL SCHEMAS IN DATABASE omop_cdm TO ROLE omop_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_reader;
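An `ON ALL TABLES` grant covers only tables that already exist, so the OMOP tables created in step 2 would not inherit it. A future grant keeps the reader role current as tables are added:

```sql
-- Automatically grant SELECT on any table created in the schema from now on
GRANT SELECT ON FUTURE TABLES IN SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_reader;
```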
2. OMOP CDM Schema Creation (Week 1)
Deploy all OMOP CDM v5.4 tables:
-- Example: Person table
CREATE OR REPLACE TABLE omop_cdm.cdm_v5_4.person (
person_id NUMBER(38,0) NOT NULL,
gender_concept_id NUMBER(38,0) NOT NULL,
year_of_birth NUMBER(38,0) NOT NULL,
month_of_birth NUMBER(38,0),
day_of_birth NUMBER(38,0),
birth_datetime TIMESTAMP_NTZ,
race_concept_id NUMBER(38,0) NOT NULL,
ethnicity_concept_id NUMBER(38,0) NOT NULL,
location_id NUMBER(38,0),
provider_id NUMBER(38,0),
care_site_id NUMBER(38,0),
person_source_value VARCHAR(50),
gender_source_value VARCHAR(50),
gender_source_concept_id NUMBER(38,0),
race_source_value VARCHAR(50),
race_source_concept_id NUMBER(38,0),
ethnicity_source_value VARCHAR(50),
ethnicity_source_concept_id NUMBER(38,0),
PRIMARY KEY (person_id)
);
-- Create all OMOP tables...
3. Processing Layer Deployment (Weeks 2-4)
Deploy Patient Journey Intelligence on Kubernetes and configure Snowflake connector.
4. Data Integration (Weeks 5-6)
Configure ingestion pipelines to write to Snowflake.
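One common pattern is for the processing layer to land files in cloud storage, from which Snowflake bulk-loads them through an external stage. The bucket URL, storage integration, and file layout below are placeholders:

```sql
-- External stage over the bucket where the processing layer lands files
-- (URL and storage integration are placeholders for your environment)
CREATE STAGE omop_cdm.metadata.landing_stage
  URL = 's3://example-bucket/omop/'
  STORAGE_INTEGRATION = example_s3_integration;

-- Bulk-load staged Parquet files into an OMOP table
COPY INTO omop_cdm.cdm_v5_4.person
  FROM @omop_cdm.metadata.landing_stage/person/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

Alternatively, the Snowflake Connector (Snowpark/JDBC) can write directly from the Kubernetes workloads; the staged COPY path is simply easier to monitor and retry.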
Resource Sizing & Costs
Small Deployment (< 100K patients)
| Component | Specification | Monthly Cost |
|---|---|---|
| Snowflake Storage | 500 GB | $25 |
| Load Warehouse | X-Small, 2 hrs/day | $100 |
| Analytics Warehouse | Small, 8 hrs/day | $800 |
| Processing (K8s) | 3 x m5.xlarge | $500 |
| Total | | ~$1,500 |
Medium Deployment (100K - 1M patients)
| Component | Specification | Monthly Cost |
|---|---|---|
| Snowflake Storage | 5 TB | $250 |
| Load Warehouse | Small, 4 hrs/day | $400 |
| Analytics Warehouse | Medium, 12 hrs/day | $3,600 |
| Processing (K8s) | 6 x m5.2xlarge | $2,000 |
| Total | | ~$6,300 |
Large Deployment (> 1M patients)
| Component | Specification | Monthly Cost |
|---|---|---|
| Snowflake Storage | 50 TB | $2,500 |
| Load Warehouse | Medium, 8 hrs/day | $2,400 |
| Analytics Warehouse | Large, 24 hrs/day | $21,600 |
| Processing (K8s) | 12 x m5.4xlarge | $8,000 |
| Total | | ~$35,000 |
Performance Optimization
Clustering Keys
ALTER TABLE omop_cdm.cdm_v5_4.condition_occurrence
CLUSTER BY (person_id, condition_start_date);
ALTER TABLE omop_cdm.cdm_v5_4.drug_exposure
CLUSTER BY (person_id, drug_exposure_start_date);
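To verify that a clustering key is actually helping, Snowflake exposes clustering statistics via a system function:

```sql
-- Inspect clustering depth and overlap for the chosen key
SELECT SYSTEM$CLUSTERING_INFORMATION(
  'omop_cdm.cdm_v5_4.condition_occurrence',
  '(person_id, condition_start_date)'
);
```

A low average depth indicates well-clustered micro-partitions; monitor this over time, since automatic reclustering consumes credits.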
Materialized Views
-- Assumes the omop_cdm.analytics schema has been created
CREATE MATERIALIZED VIEW omop_cdm.analytics.patient_condition_summary AS
SELECT
    person_id,
    -- Snowflake materialized views do not support COUNT(DISTINCT ...);
    -- APPROX_COUNT_DISTINCT is the supported alternative
    APPROX_COUNT_DISTINCT(condition_concept_id) AS condition_count,
    MIN(condition_start_date) AS first_diagnosis_date,
    MAX(condition_start_date) AS last_diagnosis_date
FROM omop_cdm.cdm_v5_4.condition_occurrence
GROUP BY person_id;
Search Optimization
ALTER TABLE omop_cdm.cdm_v5_4.person
ADD SEARCH OPTIMIZATION ON EQUALITY(person_id, person_source_value);
Integration with BI Tools
Power BI
Connect via Snowflake connector for dashboards and reports.
Tableau
Native Snowflake integration for visual analytics.
Python/R
Use Snowpark for advanced analytics and ML within Snowflake.
Security
- Encryption: All data encrypted at rest and in transit
- Network Policies: Restrict access by IP address
- MFA: Required for privileged accounts
- Row-Level Security: Implemented via policies
- Column Masking: Protect sensitive fields
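The column-masking item can be implemented with a masking policy; a minimal sketch, with the privileged role names assumed:

```sql
-- Mask the source identifier for everyone except privileged roles
CREATE MASKING POLICY mask_source_value AS (val VARCHAR) RETURNS VARCHAR ->
  CASE
    WHEN CURRENT_ROLE() IN ('ADMIN', 'OMOP_WRITER') THEN val
    ELSE '***MASKED***'
  END;

ALTER TABLE omop_cdm.cdm_v5_4.person
  MODIFY COLUMN person_source_value SET MASKING POLICY mask_source_value;
```

The same policy can be reused across the other *_source_value columns in the CDM.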
Next Steps
- Schedule architecture review
- Provision Snowflake account
- Deploy processing infrastructure
- Pilot with sample dataset