Deploying Patient Journey Intelligence on Snowflake

Patient Journey Intelligence integrates with Snowflake Data Cloud to store OMOP CDM data alongside your existing healthcare analytics infrastructure.

Architecture Overview

Patient Journey Intelligence uses Snowflake as the primary data warehouse for OMOP CDM tables, while compute-intensive NLP processing runs on separate infrastructure (AWS/Azure/GCP).

┌────────────────────────────────────────────────────────────────┐
│ Snowflake Data Cloud │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ OMOP CDM Database │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │ │
│ │ │ Person │ │ Condition │ │ Drug_Exposure │ │ │
│ │ │ Observation│ │ Procedure │ │ Measurement │ │ │
│ │ │ Visit │ │ Note │ │ Device_Exposure │ │ │
│ │ └────────────┘ └────────────┘ └────────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Vocabulary & Metadata │ │
│ │ │ │
│ │ - SNOMED CT Concepts │ │
│ │ - RxNorm Drug Mappings │ │
│ │ - LOINC Lab Tests │ │
│ │ - Source-to-Concept Mappings │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ Patient Journey Intelligence Processing Layer (AWS/Azure/GCP) │
│ │
│ ┌────────────┐ ┌──────────────┐ ┌───────────────────────┐ │
│ │ NLP │ │ De-ID │ │ Terminology │ │
│ │ Processing │ │ Engine │ │ Normalization │ │
│ └─────┬──────┘ └──────┬───────┘ └──────────┬────────────┘ │
│ │ │ │ │
│ └────────────────┴──────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Snowflake Connector │ │
│ │ (Snowpark/JDBC) │ │
│ └──────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘

Deployment Model

Hybrid Architecture

  • Snowflake: Stores OMOP CDM data, vocabularies, and metadata
  • Kubernetes (AWS/Azure/GCP): Runs NLP processing, de-identification, and ingestion
  • Snowpark (optional): Run Python-based transformations directly in Snowflake
  • Snowflake Connector: Secure data exchange between layers

Data Flow

  1. Raw data ingested by Patient Journey Intelligence processing layer
  2. NLP extraction and de-identification
  3. Terminology normalization
  4. Write to Snowflake OMOP tables via Snowflake Connector
  5. Analytics and queries run directly in Snowflake
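Step 4 above is typically a bulk load from staged files. A minimal sketch, assuming staged Parquet output and a stage named @omop_cdm.metadata.raw_stage (an illustrative name, not part of the product):

-- Bulk-load staged Parquet files into an OMOP table (stage name is illustrative)
COPY INTO omop_cdm.cdm_v5_4.condition_occurrence
FROM @omop_cdm.metadata.raw_stage/condition_occurrence/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

MATCH_BY_COLUMN_NAME maps Parquet column names onto the OMOP table columns, so the load tolerates column reordering in the staged files.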

Key Snowflake Features

Virtual Warehouses

Separate compute for different workloads:

  • LOAD_WH: Data ingestion (X-Small to Small)
  • ANALYTICS_WH: Cohort queries and analytics (Medium to Large)
  • REPORTS_WH: Dashboards and reporting (Small)
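Clients pick the warehouse per session, so ingestion, analytics, and reporting never compete for compute. A sketch of routing a cohort query to the analytics warehouse (201826 is the OMOP standard concept commonly used for type 2 diabetes mellitus):

-- Route a cohort query to the analytics warehouse
USE WAREHOUSE analytics_wh;

-- Count patients with a type 2 diabetes diagnosis
SELECT COUNT(DISTINCT person_id) AS patient_count
FROM omop_cdm.cdm_v5_4.condition_occurrence
WHERE condition_concept_id = 201826;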

Snowflake Secure Data Sharing

Share de-identified OMOP data with:

  • Research partners
  • External collaborators
  • Multi-site studies

Data is never copied, and no complex ETL is required.
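A minimal sketch of a secure share; the share name and consumer account identifier are placeholders:

-- Create a share and expose read-only access to de-identified OMOP data
CREATE SHARE deid_omop_share;
GRANT USAGE ON DATABASE omop_cdm TO SHARE deid_omop_share;
GRANT USAGE ON SCHEMA omop_cdm.cdm_v5_4 TO SHARE deid_omop_share;
GRANT SELECT ON TABLE omop_cdm.cdm_v5_4.person TO SHARE deid_omop_share;

-- Grant a partner account access (account identifier is a placeholder)
ALTER SHARE deid_omop_share ADD ACCOUNTS = partner_org;

The consumer queries the shared tables in place; storage is billed once, to the provider.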

Time Travel & Zero-Copy Cloning

  • Query historical data (up to 90 days)
  • Create instant dev/test environments
  • Rollback data errors
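All three capabilities are plain SQL. A sketch, with the query ID left as a placeholder:

-- Query the person table as it existed one hour ago
SELECT COUNT(*) FROM omop_cdm.cdm_v5_4.person AT(OFFSET => -3600);

-- Instant, storage-free clone for dev/test
CREATE DATABASE omop_cdm_dev CLONE omop_cdm;

-- Roll back a table to its state before a bad load
CREATE OR REPLACE TABLE omop_cdm.cdm_v5_4.person
  CLONE omop_cdm.cdm_v5_4.person BEFORE (STATEMENT => '<query_id_of_bad_load>');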

Row-Level Security

Implement fine-grained access control:

CREATE ROW ACCESS POLICY patient_access_policy AS (person_id NUMBER)
RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN'
  OR person_id IN (
    SELECT person_id
    FROM authorized_patients
    WHERE user_name = CURRENT_USER()
  );

ALTER TABLE omop_cdm.cdm_v5_4.person
  ADD ROW ACCESS POLICY patient_access_policy ON (person_id);

Prerequisites

Snowflake Account

  • Edition: Enterprise or higher
  • Region: Choose based on data residency
  • Features Required:
    • Multi-cluster warehouses
    • Snowpark (for Python UDFs)
    • Row-level security

Cloud Storage

  • AWS S3, Azure Blob, or GCS for staging files
  • External stages configured
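A sketch of an external stage over S3; the bucket and storage-integration names are placeholders, and the storage integration must be created by an account administrator beforehand:

-- External stage pointing at the staging bucket (names are illustrative)
CREATE STAGE omop_cdm.metadata.raw_stage
  URL = 's3://example-omop-staging/'
  STORAGE_INTEGRATION = s3_omop_int
  FILE_FORMAT = (TYPE = PARQUET);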

Processing Infrastructure

  • Kubernetes cluster (AWS EKS, Azure AKS, or Google GKE)
  • For NLP and de-identification workloads

Installation Process

1. Snowflake Setup (Week 1)

-- Create database and schemas
CREATE DATABASE omop_cdm;
CREATE SCHEMA omop_cdm.cdm_v5_4;
CREATE SCHEMA omop_cdm.vocabulary;
CREATE SCHEMA omop_cdm.metadata;
CREATE SCHEMA omop_cdm.analytics;  -- holds derived objects such as materialized views

-- Create virtual warehouses
CREATE WAREHOUSE load_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE analytics_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  SCALING_POLICY = 'STANDARD'
  MAX_CLUSTER_COUNT = 4;

-- Create roles
CREATE ROLE omop_reader;
CREATE ROLE omop_writer;

GRANT USAGE ON DATABASE omop_cdm TO ROLE omop_reader;
GRANT USAGE ON SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_reader;  -- schema usage is required before SELECT grants take effect
GRANT SELECT ON ALL TABLES IN SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_reader;

GRANT USAGE ON DATABASE omop_cdm TO ROLE omop_writer;
GRANT USAGE ON SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_writer;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA omop_cdm.cdm_v5_4 TO ROLE omop_writer;

2. OMOP CDM Schema Creation (Week 1)

Deploy all OMOP CDM v5.4 tables:

-- Example: Person table
CREATE OR REPLACE TABLE omop_cdm.cdm_v5_4.person (
  person_id NUMBER(38,0) NOT NULL,
  gender_concept_id NUMBER(38,0) NOT NULL,
  year_of_birth NUMBER(38,0) NOT NULL,
  month_of_birth NUMBER(38,0),
  day_of_birth NUMBER(38,0),
  birth_datetime TIMESTAMP_NTZ,
  race_concept_id NUMBER(38,0) NOT NULL,
  ethnicity_concept_id NUMBER(38,0) NOT NULL,
  location_id NUMBER(38,0),
  provider_id NUMBER(38,0),
  care_site_id NUMBER(38,0),
  person_source_value VARCHAR(50),
  gender_source_value VARCHAR(50),
  gender_source_concept_id NUMBER(38,0),
  race_source_value VARCHAR(50),
  race_source_concept_id NUMBER(38,0),
  ethnicity_source_value VARCHAR(50),
  ethnicity_source_concept_id NUMBER(38,0),
  PRIMARY KEY (person_id)  -- informational only; Snowflake does not enforce primary keys
);

-- Create all OMOP tables...

3. Processing Layer Deployment (Weeks 2-4)

Deploy Patient Journey Intelligence on Kubernetes and configure the Snowflake connector.

4. Data Integration (Weeks 5-6)

Configure the ingestion pipelines to write to Snowflake.
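For continuous rather than batch loading, the same COPY statement can be wrapped in a Snowpipe with auto-ingest; the pipe and stage names are illustrative:

-- Continuously load new staged files as they arrive (names are illustrative)
CREATE PIPE omop_cdm.metadata.condition_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO omop_cdm.cdm_v5_4.condition_occurrence
FROM @omop_cdm.metadata.raw_stage/condition_occurrence/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

With AUTO_INGEST, cloud storage event notifications trigger the load, so no warehouse needs to stay running for ingestion.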

Resource Sizing & Costs

Small Deployment (< 100K patients)

Component             Specification         Monthly Cost
-------------------   -------------------   ------------
Snowflake Storage     500 GB                $25
Load Warehouse        X-Small, 2 hrs/day    $100
Analytics Warehouse   Small, 8 hrs/day      $800
Processing (K8s)      3 x m5.xlarge         $500
Total                                       ~$1,500

Medium Deployment (100K - 1M patients)

Component             Specification         Monthly Cost
-------------------   -------------------   ------------
Snowflake Storage     5 TB                  $250
Load Warehouse        Small, 4 hrs/day      $400
Analytics Warehouse   Medium, 12 hrs/day    $3,600
Processing (K8s)      6 x m5.2xlarge        $2,000
Total                                       ~$6,300

Large Deployment (> 1M patients)

Component             Specification         Monthly Cost
-------------------   -------------------   ------------
Snowflake Storage     50 TB                 $2,500
Load Warehouse        Medium, 8 hrs/day     $2,400
Analytics Warehouse   Large, 24 hrs/day     $21,600
Processing (K8s)      12 x m5.4xlarge       $8,000
Total                                       ~$35,000

Performance Optimization

Clustering Keys

ALTER TABLE omop_cdm.cdm_v5_4.condition_occurrence
CLUSTER BY (person_id, condition_start_date);

ALTER TABLE omop_cdm.cdm_v5_4.drug_exposure
CLUSTER BY (person_id, drug_exposure_start_date);

Materialized Views

CREATE MATERIALIZED VIEW omop_cdm.analytics.patient_condition_summary AS
SELECT
  person_id,
  -- Snowflake materialized views do not support COUNT(DISTINCT ...);
  -- APPROX_COUNT_DISTINCT is the supported alternative
  APPROX_COUNT_DISTINCT(condition_concept_id) AS condition_count,
  MIN(condition_start_date) AS first_diagnosis_date,
  MAX(condition_start_date) AS last_diagnosis_date
FROM omop_cdm.cdm_v5_4.condition_occurrence
GROUP BY person_id;

Search Optimization

ALTER TABLE omop_cdm.cdm_v5_4.person
ADD SEARCH OPTIMIZATION ON EQUALITY(person_id, person_source_value);

Integration with BI Tools

Power BI

Connect via Snowflake connector for dashboards and reports.

Tableau

Native Snowflake integration for visual analytics.

Python/R

Use Snowpark for advanced analytics and ML within Snowflake.

Security

  • Encryption: All data encrypted at rest and in transit
  • Network Policies: Restrict access by IP address
  • MFA: Required for privileged accounts
  • Row-Level Security: Implemented via policies
  • Column Masking: Protect sensitive fields
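Column masking complements the row access policy shown earlier. A sketch that hides the source identifier from non-privileged roles (policy name and masked value are illustrative):

-- Mask person_source_value for everyone except ADMIN
CREATE MASKING POLICY mask_source_value AS (val VARCHAR)
RETURNS VARCHAR ->
  CASE WHEN CURRENT_ROLE() = 'ADMIN' THEN val ELSE '***MASKED***' END;

ALTER TABLE omop_cdm.cdm_v5_4.person
  MODIFY COLUMN person_source_value SET MASKING POLICY mask_source_value;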

Next Steps

  1. Schedule architecture review
  2. Provision Snowflake account
  3. Deploy processing infrastructure
  4. Pilot with sample dataset

Additional Resources