PJI Architecture Overview

The Patient Journey Intelligence (PJI) platform is built on a modern, modular, and cloud-native architecture designed to ingest, standardize, and analyze multimodal healthcare data at scale.

By combining AI-assisted processing, terminology normalization, OMOP standardization, and interactive applications, PJI enables organizations to transform raw clinical content into structured, explainable patient intelligence.

1. Data Sources & Ingestion Hub

PJI supports a wide range of source systems and clinical data types, including unstructured documents (clinical notes, PDFs, DOCX, HTML, FHIR bundles), imaging data (DICOM studies and radiology reports), and structured records from EHRs and clinical data warehouses.

Supported sources include Amazon S3 for unstructured documents and images; Amazon HealthLake and HealthImaging for FHIR and imaging-native services; EHRs such as Epic, Cerner, and OpenEHR; and Snowflake as a structured data lakehouse. These inputs are collected in a central ingestion hub, which acts as the gateway to downstream processing.
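As an illustration of how an ingestion hub might hand incoming files to the right downstream parser, here is a minimal sketch that routes a source URI by file extension. The mapping, the modality names, and the `route` function are hypothetical and are not part of PJI's documented interface; a real hub would likely also inspect content types and metadata.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a processing modality.
# The extensions mirror the data types listed above; the modality
# labels are illustrative only.
EXTENSION_TO_MODALITY = {
    ".pdf": "document",
    ".docx": "document",
    ".html": "document",
    ".txt": "document",
    ".dcm": "imaging",
    ".json": "fhir",  # assumption: FHIR bundles arrive as JSON files
}


def route(uri: str) -> str:
    """Return the modality for a source URI, or 'unknown' if unrecognized."""
    suffix = PurePosixPath(uri).suffix.lower()
    return EXTENSION_TO_MODALITY.get(suffix, "unknown")


if __name__ == "__main__":
    for uri in ("s3://example-bucket/notes/visit.PDF", "studies/0001.dcm"):
        print(uri, "->", route(uri))
```

Routing on extension alone is the simplest possible rule and is shown only to make the "gateway" role concrete.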

2. Ingestion & AI Processing Pipeline

Once data is ingested, it enters PJI's AI-powered multimodal pipeline, which extracts, transforms, and standardizes content in five stages.

Multimodal processing parses text, DICOM, and FHIR inputs using OCR, NLP, and vision models. De-identification optionally removes PHI from both text and image pixels, compliant with HIPAA, GDPR, and clinical trial protocols. Facts and knowledge enrichment performs entity recognition, captures temporal context and relationships, and applies ontology-based enrichment and normalization against SNOMED, RxNorm, ICD, and LOINC. OMOP CDM modeling transforms all facts into the OMOP Common Data Model, deduplicates them across modalities, and stores the result as a clean, analytics-ready Gold Layer. Clinical calculations execute standardized clinical formulas such as BMI, eGFR, and A-a gradient, and store the results alongside the OMOP facts.
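To make the clinical calculations stage concrete, the sketch below implements two of the formulas named above using their standard textbook definitions: BMI as weight divided by height squared, and the A-a gradient as alveolar minus arterial oxygen pressure, with alveolar pressure taken from the alveolar gas equation. The function names and default parameters (sea-level pressure, room air, respiratory quotient of 0.8) are assumptions for illustration, not PJI's actual implementation.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def aa_gradient(
    pao2_mmhg: float,
    paco2_mmhg: float,
    fio2: float = 0.21,        # fraction of inspired oxygen (room air)
    patm_mmhg: float = 760.0,  # atmospheric pressure at sea level
    ph2o_mmhg: float = 47.0,   # water vapor pressure at body temperature
    rq: float = 0.8,           # respiratory quotient
) -> float:
    """Alveolar-arterial oxygen gradient in mmHg."""
    # Alveolar gas equation: PAO2 = FiO2 * (Patm - PH2O) - PaCO2 / RQ
    alveolar_po2 = fio2 * (patm_mmhg - ph2o_mmhg) - paco2_mmhg / rq
    return alveolar_po2 - pao2_mmhg


if __name__ == "__main__":
    print(round(bmi(70.0, 1.75), 1))
    print(round(aa_gradient(pao2_mmhg=90.0, paco2_mmhg=40.0), 2))
```

Keeping each formula as a pure function with explicit units in the parameter names makes it straightforward to store the inputs and the result side by side.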

Each stage in the pipeline is modular, versioned, and traceable, supporting reproducibility and auditability.
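One way to picture modular, versioned, and traceable stages is a pipeline runner that records which stage and which version touched each payload. The following is a minimal sketch under that assumption; the `Stage` and `PipelineResult` classes, the version strings, and the toy stage functions are all hypothetical, and the de-identification step is a plain string replacement standing in for a real PHI model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    version: str
    run: Callable[[Dict], Dict]


@dataclass
class PipelineResult:
    payload: Dict
    # Ordered (stage name, stage version) pairs, one per executed stage.
    trace: List[Tuple[str, str]] = field(default_factory=list)


def run_pipeline(stages: List[Stage], payload: Dict) -> PipelineResult:
    """Run stages in order and record which version of each one ran."""
    result = PipelineResult(payload=dict(payload))
    for stage in stages:
        result.payload = stage.run(result.payload)
        result.trace.append((stage.name, stage.version))
    return result


# Toy stages: placeholders for the real processing steps.
def parse(payload: Dict) -> Dict:
    return {**payload, "text": payload["raw"].strip()}


def deidentify(payload: Dict) -> Dict:
    return {**payload, "text": payload["text"].replace("John Doe", "[PATIENT]")}


STAGES = [
    Stage("multimodal_processing", "1.0.0", parse),
    Stage("deidentification", "0.3.1", deidentify),
]

if __name__ == "__main__":
    result = run_pipeline(STAGES, {"raw": "  John Doe has hypertension. "})
    print(result.payload["text"])
    print(result.trace)
```

Because the trace is stored with the output, rerunning the same stage versions on the same input is enough to reproduce a result, which is the property that reproducibility and auditability depend on.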

3. External Knowledge Engines

To ensure semantic consistency, PJI integrates with dedicated terminology and reasoning services. The Terminology Server performs concept mapping and normalization across vocabularies (e.g., SNOMED, LOINC, RxNorm), while Medical LLMs/VLMs enhance NLP pipelines with summarization, quality inference, and registry support.
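Concept mapping can be illustrated with a small lookup that resolves different surface forms of a term to one normalized concept. This is only a sketch: the `Concept` type and `normalize` function are hypothetical, the in-memory dictionary stands in for a real terminology server, and the codes are deliberate placeholders rather than real SNOMED or RxNorm identifiers.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    vocabulary: str
    code: str
    label: str


# Placeholder codes: NOT real vocabulary identifiers.
HYPERTENSION = Concept("SNOMED", "EXAMPLE-001", "Hypertension")
METFORMIN = Concept("RxNorm", "EXAMPLE-002", "Metformin")

SYNONYMS = {
    "hypertension": HYPERTENSION,
    "high blood pressure": HYPERTENSION,
    "htn": HYPERTENSION,
    "metformin": METFORMIN,
}


def normalize(term: str) -> Optional[Concept]:
    """Map a free-text term to a concept, or None if it is not known."""
    key = " ".join(term.lower().split())  # lower-case, collapse whitespace
    return SYNONYMS.get(key)


if __name__ == "__main__":
    print(normalize("HTN"))
    print(normalize("  High  Blood Pressure "))
    print(normalize("unrecognized term"))
```

Returning `None` for unknown terms, instead of guessing, leaves the decision about unmapped text to the caller.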

4. OMOP Storage Layer

All structured and enriched data is stored in a robust, OMOP-compatible relational backend. PJI supports various infrastructure targets, including Amazon Redshift, PostgreSQL (RDS / self-hosted), and OHDSI-compliant CDM layers.
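Because the backend is OMOP-compatible, downstream tools can query it with ordinary SQL against OMOP tables. The sketch below uses an in-memory SQLite database purely for illustration (it is not one of the listed targets) and a reduced `condition_occurrence` table with only four of the columns defined by the OMOP CDM. The concept IDs are placeholders, and `persons_with_condition` is a hypothetical helper.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")

# Reduced version of the OMOP condition_occurrence table (subset of columns).
conn.execute(
    """
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id               INTEGER NOT NULL,
        condition_concept_id    INTEGER NOT NULL,
        condition_start_date    TEXT    NOT NULL
    )
    """
)

# Placeholder concept IDs, not real OMOP concepts.
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (1, 101, 9000001, "2023-03-14"),
        (2, 102, 9000001, "2024-01-02"),
        (3, 101, 9000002, "2024-02-20"),
    ],
)


def persons_with_condition(db: sqlite3.Connection, concept_id: int) -> List[int]:
    """Return the distinct person IDs that have the given condition concept."""
    rows = db.execute(
        "SELECT DISTINCT person_id FROM condition_occurrence "
        "WHERE condition_concept_id = ? ORDER BY person_id",
        (concept_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(persons_with_condition(conn, 9000001))
```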

Data is organized into three layers: Bronze (raw), Silver (normalized), and Gold (deduplicated and curated). Each fact is linked to provenance metadata, enabling traceability back to the source documents and the models used.
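The progression from raw to normalized to deduplicated data can be sketched as two small transformations. The row layout, the normalization rules, and the deduplication rule (keep the highest-confidence row per person, concept, and date) are assumptions chosen for illustration; the source does not specify how PJI resolves duplicates.

```python
from typing import Dict, List

# Bronze: raw extracted rows, possibly messy and duplicated across sources.
BRONZE = [
    {"person_id": "17", "concept": " Hypertension ", "date": "2024-01-05",
     "confidence": 0.82, "source": "note_001.txt"},
    {"person_id": "17", "concept": "hypertension", "date": "2024-01-05",
     "confidence": 0.95, "source": "claim_884.json"},
    {"person_id": "17", "concept": "Metformin", "date": "2024-01-05",
     "confidence": 0.90, "source": "note_001.txt"},
]


def to_silver(rows: List[Dict]) -> List[Dict]:
    """Normalize types and text so equivalent rows compare equal."""
    return [
        {**row,
         "person_id": int(row["person_id"]),
         "concept": row["concept"].strip().lower()}
        for row in rows
    ]


def to_gold(rows: List[Dict]) -> List[Dict]:
    """Deduplicate on (person, concept, date), keeping the most confident row."""
    best: Dict[tuple, Dict] = {}
    for row in rows:
        key = (row["person_id"], row["concept"], row["date"])
        if key not in best or row["confidence"] > best[key]["confidence"]:
            best[key] = row
    return list(best.values())


if __name__ == "__main__":
    for row in to_gold(to_silver(BRONZE)):
        print(row)
```

Each surviving Gold row still carries its `source` field, so the link back to the originating document is preserved through deduplication.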

5. Provenance & Traceability

Every output (e.g., a diagnosis, lab value, or staging result) includes a source file reference, the text snippet or image region used, the model version and timestamp, and a confidence score with processing metadata. This design enables full lifecycle traceability, which is essential for auditability, clinical validation, and trust in analytics.
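The provenance fields listed above map naturally onto a small record type attached to every fact. The following sketch assumes a particular shape for that record; the class names, field names, and validation rules are hypothetical and only mirror the items named in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    source_file: str
    snippet: Optional[str]                              # text evidence, if any
    image_region: Optional[Tuple[int, int, int, int]]   # x, y, width, height
    model_version: str
    processed_at: datetime
    confidence: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.snippet is None and self.image_region is None:
            raise ValueError("either a text snippet or an image region is required")


@dataclass(frozen=True)
class Fact:
    person_id: int
    kind: str          # e.g. "diagnosis", "lab_value", "staging"
    value: str
    provenance: Provenance


fact = Fact(
    person_id=101,
    kind="diagnosis",
    value="hypertension",
    provenance=Provenance(
        source_file="note_001.txt",
        snippet="BP 150/95, consistent with hypertension",
        image_region=None,
        model_version="example-ner-0.0.0",  # placeholder version string
        processed_at=datetime(2024, 1, 5, tzinfo=timezone.utc),
        confidence=0.93,
    ),
)

if __name__ == "__main__":
    print(fact.provenance.source_file, fact.provenance.confidence)
```

Making both records immutable means a fact's evidence cannot be silently altered after it is written, which suits an audit trail.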

6. Application & Interaction Layer

At the top layer, PJI exposes all processed data through a set of integrated user-facing applications. Insight Assistant provides a natural-language interface for querying data, generating SQL, and initiating cohort analyses. Patient Journey offers an interactive longitudinal timeline aggregating visits, medications, labs, imaging, and notes. Cohort Studio enables visual filtering, inclusion/exclusion logic, and export tools for patient set construction. Clinical Measures computes and visualizes standardized metrics for individuals or populations. Data Curation Automation supports ontology-driven abstraction and validation of structured clinical registries. Medical Ontologies provides configurable schema definitions aligned with domain standards (e.g., AJCC, mCODE, SEER).
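The inclusion/exclusion logic that Cohort Studio exposes visually can be thought of as set operations over each patient's recorded concepts. The sketch below is a simplified model of that idea, not the tool's actual implementation; `build_cohort`, the concept labels, and the sample patients are hypothetical.

```python
from typing import Dict, List, Set


def build_cohort(
    concepts_by_person: Dict[int, Set[str]],
    include: Set[str],
    exclude: Set[str],
) -> List[int]:
    """Return person IDs that have every included concept and no excluded one."""
    cohort = []
    for person_id, concepts in concepts_by_person.items():
        meets_inclusion = include <= concepts       # all inclusion criteria present
        hits_exclusion = bool(exclude & concepts)   # any exclusion criterion present
        if meets_inclusion and not hits_exclusion:
            cohort.append(person_id)
    return sorted(cohort)


PATIENTS = {
    101: {"hypertension", "metformin"},
    102: {"hypertension"},
    103: {"hypertension", "metformin", "pregnancy"},
}

cohort = build_cohort(
    PATIENTS,
    include={"hypertension", "metformin"},
    exclude={"pregnancy"},
)

if __name__ == "__main__":
    print(cohort)
```

Here patient 102 fails the inclusion criteria and patient 103 is removed by the exclusion criterion, leaving only patient 101.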

All modules operate over a shared OMOP data layer, ensuring consistency across tools.

7. Technology Stack

PJI is deployed on a scalable, healthcare-grade infrastructure stack including Docker & Kubernetes for container orchestration and scaling, Elasticsearch for high-performance search, PostgreSQL for OMOP-compatible structured storage, and AWS / Azure / GCP for fully cloud-agnostic deployment. This foundation ensures high availability, secure processing, and elastic scaling for healthcare workloads.

In summary, the PJI architecture brings together multimodal data ingestion and transformation; AI-driven extraction and standardization pipelines; ontology- and terminology-aware enrichment; OMOP-native, audit-ready storage; and end-user applications for research, care, and analytics.

This design empowers healthcare organizations to generate rich, structured, and explainable patient intelligence from fragmented raw data—securely, transparently, and at scale.
