Deploying Patient Journey Intelligence on Snowflake
Patient Journey Intelligence integrates with Snowflake Data Cloud to store OMOP CDM data alongside your existing healthcare analytics infrastructure.
Architecture Overview
Patient Journey Intelligence uses Snowflake as the primary data warehouse for OMOP CDM tables, while compute-intensive NLP processing runs on separate infrastructure (AWS/Azure/GCP).
┌────────────────────────────────────────────────────────────────┐
│                      Snowflake Data Cloud                      │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                    OMOP CDM Database                     │ │
│  │                                                          │ │
│  │  ┌─────────────┐  ┌────────────┐  ┌──────────────────┐  │ │
│  │  │ Person      │  │ Condition  │  │ Drug_Exposure    │  │ │
│  │  │ Observation │  │ Procedure  │  │ Measurement      │  │ │
│  │  │ Visit       │  │ Note       │  │ Device_Exposure  │  │ │
│  │  └─────────────┘  └────────────┘  └──────────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                  Vocabulary & Metadata                   │ │
│  │                                                          │ │
│  │  - SNOMED CT Concepts                                    │ │
│  │  - RxNorm Drug Mappings                                  │ │
│  │  - LOINC Lab Tests                                       │ │
│  │  - Source-to-Concept Mappings                            │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│ Patient Journey Intelligence Processing Layer (AWS/Azure/GCP)  │
│                                                                │
│  ┌────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ NLP        │  │ De-ID        │  │ Terminology           │  │
│  │ Processing │  │ Engine       │  │ Normalization         │  │
│  └─────┬──────┘  └──────┬───────┘  └──────────┬────────────┘  │
│        │                │                     │               │
│        └────────────────┴─────────────────────┘               │
│                         │                                     │
│                         ▼                                     │
│              ┌──────────────────────┐                         │
│              │  Snowflake Connector │                         │
│              │  (Snowpark/JDBC)     │                         │
│              └──────────────────────┘                         │
│                                                                │
└────────────────────────────────────────────────────────────────┘
Deployment Model
Hybrid Architecture
- Snowflake: Stores OMOP CDM data, vocabularies, and metadata
- Kubernetes (AWS/Azure/GCP): Runs NLP processing, de-identification, and ingestion
- Snowpark (optional): Run Python-based transformations directly in Snowflake
- Snowflake Connector: Secure data exchange between layers
Data Flow
1. Raw data is ingested by the Patient Journey Intelligence processing layer
2. NLP extraction and de-identification are applied
3. Terminology is normalized to standard vocabularies
4. Results are written to the Snowflake OMOP tables via the Snowflake Connector
5. Analytics and queries run directly in Snowflake
Key Snowflake Features
Virtual Warehouses
Separate compute for different workloads:
- LOAD_WH: Data ingestion (X-Small to Small)
- ANALYTICS_WH: Cohort queries and analytics (Medium to Large)
- REPORTS_WH: Dashboards and reporting (Small)
Snowflake Secure Data Sharing
Share de-identified OMOP data, without copying it or building ETL pipelines, with:
- Research partners
- External collaborators
- Multi-site studies
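As a minimal sketch, a secure share is created, granted read access to the de-identified objects, and then attached to the consumer's account. The share name, shared table, and account locator below are placeholders for your environment:

```sql
-- Create a share for de-identified OMOP data (name is a placeholder)
CREATE SHARE deid_research_share;

-- Grant read-only access to the objects being shared
GRANT USAGE ON DATABASE omop_cdm TO SHARE deid_research_share;
GRANT USAGE ON SCHEMA omop_cdm.cdm_v5_4 TO SHARE deid_research_share;
GRANT SELECT ON TABLE omop_cdm.cdm_v5_4.condition_occurrence TO SHARE deid_research_share;

-- Attach the consuming Snowflake account (replace with the partner's account locator)
ALTER SHARE deid_research_share ADD ACCOUNTS = partner_account;
```

The consumer then creates a read-only database from the share; no data is copied and the provider retains full control over what is exposed.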
Time Travel & Zero-Copy Cloning
- Query historical data (up to 90 days)
- Create instant dev/test environments
- Rollback data errors
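These three capabilities map directly onto SQL; the database/table names and the restore timestamp below are illustrative:

```sql
-- Query the person table as it existed one hour ago (Time Travel)
SELECT COUNT(*) FROM omop_cdm.cdm_v5_4.person AT(OFFSET => -3600);

-- Create an instant dev/test environment without duplicating storage (zero-copy clone)
CREATE DATABASE omop_cdm_dev CLONE omop_cdm;

-- Roll back a bad load by cloning the table from a point in time
CREATE OR REPLACE TABLE omop_cdm.cdm_v5_4.person_restored
  CLONE omop_cdm.cdm_v5_4.person AT(TIMESTAMP => '2025-01-01 00:00:00'::TIMESTAMP_NTZ);
```

Note that the 90-day retention window requires Enterprise Edition and must be configured via DATA_RETENTION_TIME_IN_DAYS; the default is 1 day.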
Row-Level Security
Implement fine-grained access control:
CREATE ROW ACCESS POLICY patient_access_policy AS (person_id NUMBER)
RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN'
  OR person_id IN (
    SELECT person_id FROM authorized_patients WHERE user_name = CURRENT_USER()
  );

ALTER TABLE omop_cdm.cdm_v5_4.person
  ADD ROW ACCESS POLICY patient_access_policy ON (person_id);
Prerequisites
Snowflake Account
- Edition: Enterprise or higher
- Region: Choose based on data residency
- Features Required:
- Multi-cluster warehouses
- Snowpark (for Python UDFs)
- Row-level security
Cloud Storage
- AWS S3, Azure Blob, or GCS for staging files
- External stages configured
Processing Infrastructure
- Kubernetes cluster (AWS EKS, Azure AKS, or Google GKE)
- For NLP and de-identification workloads
Installation Process
1. Snowflake Setup (Week 1)
-- Create database and schemas
CREATE DATABASE omop_cdm;
CREATE SCHEMA omop_cdm.cdm_v5_4;
CREATE SCHEMA omop_cdm.vocabulary;
CREATE SCHEMA omop_cdm.metadata;
-- Create virtual warehouses
CREATE WAREHOUSE load_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE analytics_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  SCALING_POLICY = 'STANDARD'
  MAX_CLUSTER_COUNT = 4;

CREATE WAREHOUSE reports_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
-- Create roles and users
CREATE ROLE omop_reader;
CREATE ROLE omop_writer;
GRANT USAGE ON DATABASE omop_cdm TO ROLE omop_reader;
GRANT USAGE ON ALL SCHEMAS IN DATABASE omop_cdm TO ROLE omop_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_reader;
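An `ON ALL TABLES` grant covers only tables that already exist, so the OMOP tables created in step 2 would not inherit it. A future grant keeps the reader role current as tables are added:

```sql
-- Automatically grant SELECT on any table created in the schema from now on
GRANT SELECT ON FUTURE TABLES IN SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_reader;
```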
2. OMOP CDM Schema Creation (Week 1)
Deploy all OMOP CDM v5.4 tables:
-- Example: Person table
CREATE OR REPLACE TABLE omop_cdm.cdm_v5_4.person (
person_id NUMBER(38,0) NOT NULL,
gender_concept_id NUMBER(38,0) NOT NULL,
year_of_birth NUMBER(38,0) NOT NULL,
month_of_birth NUMBER(38,0),
day_of_birth NUMBER(38,0),
birth_datetime TIMESTAMP_NTZ,
race_concept_id NUMBER(38,0) NOT NULL,
ethnicity_concept_id NUMBER(38,0) NOT NULL,
location_id NUMBER(38,0),
provider_id NUMBER(38,0),
care_site_id NUMBER(38,0),
person_source_value VARCHAR(50),
gender_source_value VARCHAR(50),
gender_source_concept_id NUMBER(38,0),
race_source_value VARCHAR(50),
race_source_concept_id NUMBER(38,0),
ethnicity_source_value VARCHAR(50),
ethnicity_source_concept_id NUMBER(38,0),
PRIMARY KEY (person_id)
);
-- Create all OMOP tables...
3. Processing Layer Deployment (Weeks 2-4)
Deploy Patient Journey Intelligence on Kubernetes and configure Snowflake connector.
4. Data Integration (Weeks 5-6)
Configure ingestion pipelines to write to Snowflake.
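One common pattern is for the processing layer to land files in cloud storage, from which Snowflake bulk-loads them through an external stage. The bucket URL, storage integration, and file layout below are placeholders:

```sql
-- External stage over the bucket where the processing layer lands files
-- (URL and storage integration are placeholders for your environment)
CREATE STAGE omop_cdm.metadata.landing_stage
  URL = 's3://example-bucket/omop/'
  STORAGE_INTEGRATION = example_s3_integration;

-- Bulk-load staged Parquet files into an OMOP table
COPY INTO omop_cdm.cdm_v5_4.person
  FROM @omop_cdm.metadata.landing_stage/person/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

Alternatively, the Snowflake Connector (Snowpark/JDBC) can write directly from the Kubernetes workloads; the staged COPY path is simply easier to monitor and retry.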
Resource Sizing & Costs
Small Deployment (< 100K patients)
| Component | Specification | Monthly Cost |
|---|---|---|
| Snowflake Storage | 500 GB | $25 |
| Load Warehouse | X-Small, 2 hrs/day | $100 |
| Analytics Warehouse | Small, 8 hrs/day | $800 |
| Processing (K8s) | 3 x m5.xlarge | $500 |
| Total | | ~$1,500 |
Medium Deployment (100K - 1M patients)
| Component | Specification | Monthly Cost |
|---|---|---|
| Snowflake Storage | 5 TB | $250 |
| Load Warehouse | Small, 4 hrs/day | $400 |
| Analytics Warehouse | Medium, 12 hrs/day | $3,600 |
| Processing (K8s) | 6 x m5.2xlarge | $2,000 |
| Total | | ~$6,300 |
Large Deployment (> 1M patients)
| Component | Specification | Monthly Cost |
|---|---|---|
| Snowflake Storage | 50 TB | $2,500 |
| Load Warehouse | Medium, 8 hrs/day | $2,400 |
| Analytics Warehouse | Large, 24 hrs/day | $21,600 |
| Processing (K8s) | 12 x m5.4xlarge | $8,000 |
| Total | | ~$35,000 |
Performance Optimization
Clustering Keys
ALTER TABLE omop_cdm.cdm_v5_4.condition_occurrence
CLUSTER BY (person_id, condition_start_date);
ALTER TABLE omop_cdm.cdm_v5_4.drug_exposure
CLUSTER BY (person_id, drug_exposure_start_date);
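To verify that a clustering key is actually helping, Snowflake exposes clustering statistics via a system function:

```sql
-- Inspect clustering depth and overlap for the chosen key
SELECT SYSTEM$CLUSTERING_INFORMATION(
  'omop_cdm.cdm_v5_4.condition_occurrence',
  '(person_id, condition_start_date)'
);
```

A low average depth indicates well-clustered micro-partitions; monitor this over time, since automatic reclustering consumes credits.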
Materialized Views
-- Assumes the omop_cdm.analytics schema has been created
CREATE MATERIALIZED VIEW omop_cdm.analytics.patient_condition_summary AS
SELECT
    person_id,
    -- Snowflake materialized views do not support COUNT(DISTINCT ...);
    -- APPROX_COUNT_DISTINCT is the supported alternative
    APPROX_COUNT_DISTINCT(condition_concept_id) AS condition_count,
    MIN(condition_start_date) AS first_diagnosis_date,
    MAX(condition_start_date) AS last_diagnosis_date
FROM omop_cdm.cdm_v5_4.condition_occurrence
GROUP BY person_id;
Search Optimization
ALTER TABLE omop_cdm.cdm_v5_4.person
ADD SEARCH OPTIMIZATION ON EQUALITY(person_id, person_source_value);
Integration with BI Tools
Power BI
Connect via Snowflake connector for dashboards and reports.
Tableau
Native Snowflake integration for visual analytics.
Python/R
Use Snowpark for advanced analytics and ML within Snowflake.
Security
- Encryption: All data encrypted at rest and in transit
- Network Policies: Restrict access by IP address
- MFA: Required for privileged accounts
- Row-Level Security: Implemented via policies
- Column Masking: Protect sensitive fields
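The column-masking item can be implemented with a masking policy; a minimal sketch, with the privileged role names assumed:

```sql
-- Mask the source identifier for everyone except privileged roles
CREATE MASKING POLICY mask_source_value AS (val VARCHAR) RETURNS VARCHAR ->
  CASE
    WHEN CURRENT_ROLE() IN ('ADMIN', 'OMOP_WRITER') THEN val
    ELSE '***MASKED***'
  END;

ALTER TABLE omop_cdm.cdm_v5_4.person
  MODIFY COLUMN person_source_value SET MASKING POLICY mask_source_value;
```

The same policy can be reused across the other *_source_value columns in the CDM.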
Next Steps
- Schedule architecture review
- Provision Snowflake account
- Deploy processing infrastructure
- Pilot with sample dataset