Patient Journey Intelligence Platform
What is Patient Journey Intelligence?
Patient Journey Intelligence is a healthcare analytics approach that transforms clinical data into longitudinal patient pathways to support research, population health management, and evidence generation.
How It Works
Healthcare organizations collect vast amounts of clinical data during routine patient care, but most of it remains locked in formats unsuitable for secondary use. The Patient Journey Intelligence Platform solves this by continuously transforming raw, multimodal clinical data into standardized, longitudinal patient journeys, enabling research, AI development, quality measurement, and regulatory compliance from a single, reusable data foundation.
Data Integration
Connect to external data sources, ingest clinical documents, and transform raw data into structured, standardized OMOP format
Data Curation
Build and manage clinical ontologies, automate case finding, and maintain expert-reviewed registries with full audit trails
De-identified OMOP
Automatically maintain synchronized identified and de-identified OMOP datasets in parallel, ensuring research and operational data stay up-to-date together
Agents
Deploy pre-built AI assistants for patient co-pilot, journey timelines, and cohort building with conversational interfaces
Patient Registries
Automate cancer, research, and custom registries with AI-powered extraction, NAACCR compliance, and regulatory reporting
Governance
Ensure compliance with AI governance, security best practices, de-identification standards, and comprehensive audit logging
Deploy a Unified Data Platform for Secondary Use
Secondary use means repurposing clinical data originally collected during patient care for research studies, quality improvement programs, population health analytics, disease registries, AI model development, and regulatory reporting. Unlike operational systems designed for real-time clinical workflows, secondary use environments must work with years of historically accumulated data from multiple systems, using inconsistent formats and terminologies, mixing structured and unstructured content, and often containing gaps and conflicts that went unnoticed during original data entry. The promise is compelling: unlock insights from millions of patient encounters, identify at-risk populations, accelerate clinical trials, and train AI models that improve care. But realizing this promise depends on solving three fundamental challenges: data accuracy (does the data reflect what actually happened to patients?), data engineering (can fragmented sources be unified into a coherent foundation?), and AI readiness (can the data support production clinical AI agents with full provenance and compliance?).
The Challenge: Clinical Data Wasn't Built for Analytics
Clinical data is captured primarily for billing and documentation, not analytics or AI. This creates three fundamental gaps that Patient Journey Intelligence addresses:
Data Accuracy Gap
Up to 40% of critical diagnoses exist only in unstructured clinical notes, never coded into structured fields. Treatment rationale, disease progression, and clinical reasoning are documented as free text, invisible to traditional analytics. Without extracting this context, secondary use initiatives work with incomplete patient pictures.
Data Engineering Gap
Each new use case—cancer registry, clinical trial cohort, quality measure, or AI model—requires rebuilding similar data pipelines from scratch. Organizations spend 10+ FTE-years annually on redundant engineering. Patient information is scattered across EHR fields, clinical notes, scanned PDFs, imaging reports, and lab systems with no unified foundation.
AI Governance Gap
Models trained on research datasets fail in production because operational data uses inconsistent terminologies and different preprocessing. Maintaining separate identified and de-identified datasets doubles infrastructure costs. Audit trails, data lineage, provenance tracking, and PHI management require custom tooling that most teams lack.
Real-World Impact
Consider a health system developing a sepsis prediction model:
- Months of data engineering to extract vital signs from EHR tables, parse infection mentions from clinical notes, link lab results across systems
- Critical context missed because antibiotic administration notes were in free text, not discrete orders
- Model fails at deployment because research data used LOINC codes while production EHR uses proprietary lab codes
- New registry project starts from zero six months later, rebuilding similar pipelines for a different condition
This cycle repeats across every analytics initiative, preventing healthcare organizations from compounding their AI investments.
How Patient Journey Intelligence Solves the Secondary Use Bottleneck
The Patient Journey Intelligence Platform provides a single, reusable foundation that eliminates redundant pipeline development and ensures every downstream use case operates on the same curated, standardized patient data.
Unified Data Pipeline: Build Once, Use Everywhere
1. Ingest Any Data Source
Connect EHR systems, clinical notes, imaging (DICOM), PDFs, lab feeds, and external registries without extensive preprocessing. The platform handles data as-is, regardless of format or structure.
2. AI-Powered Extraction
John Snow Labs Medical Language Models extract structured facts from unstructured content, detect negations and temporal relationships, and resolve entities across documents. This automatically captures clinical context trapped in free text, where up to 40% of critical diagnoses are documented.
3. Standardize to OMOP CDM
All data is mapped to OMOP Common Data Model v5.4 with standard terminologies (SNOMED CT, RxNorm, LOINC, ICD-10). This creates a unified schema compatible with OHDSI analytics, BI tools, and AI frameworks.
4. Continuously Update Patient Journeys
Longitudinal timelines combine all data about each patient (visits, conditions, medications, procedures, and labs) into coherent narratives that are automatically updated as new clinical documents arrive.
Living, Governed, AI-Ready Data Assets
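The continuous-update step can be pictured with a small sketch. It is an illustration only, not the platform's implementation: the `JourneyEvent` and `PatientJourney` classes and their field names are hypothetical. It shows events from different sources being merged into one date-ordered timeline, with repeated deliveries of the same fact ignored.

```python
from bisect import insort
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True, order=True)
class JourneyEvent:
    """One dated clinical fact (hypothetical shape, for illustration)."""
    event_date: date
    domain: str   # e.g. "condition", "drug_exposure", "measurement"
    concept: str  # human-readable concept name
    source: str   # where the fact came from


@dataclass
class PatientJourney:
    person_id: int
    events: list = field(default_factory=list)

    def add(self, event: JourneyEvent) -> None:
        # Keep the timeline sorted by date; skip exact duplicates so that
        # re-ingesting the same document does not repeat an event.
        if event not in self.events:
            insort(self.events, event)


journey = PatientJourney(person_id=1)
journey.add(JourneyEvent(date(2023, 5, 2), "drug_exposure", "metformin", "ehr_orders"))
journey.add(JourneyEvent(date(2023, 3, 14), "condition", "type 2 diabetes mellitus", "clinical_note"))
journey.add(JourneyEvent(date(2023, 4, 1), "measurement", "hemoglobin A1c", "lab_feed"))
journey.add(JourneyEvent(date(2023, 4, 1), "measurement", "hemoglobin A1c", "lab_feed"))  # duplicate delivery

timeline = [(e.event_date.isoformat(), e.domain) for e in journey.events]
for entry in timeline:
    print(entry)
```

Events arrive out of order here, yet the timeline stays chronological because each insert preserves the sort.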
Unlike point-in-time data exports, the platform maintains living datasets that stay synchronized with your clinical systems. Every extracted fact includes:
- Provenance: Source documents, extraction timestamps, transformation lineage
- Confidence Scores: ML model certainty for quality control and expert review workflows
- Versioning: Time-stamped updates preserving historical states for reproducibility
- Clinical Context: Complete temporal sequences showing disease progression, treatment response, and care patterns
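As a rough sketch of how provenance, confidence, and versioning might travel with each fact, consider the record below. The `ExtractedFact` fields, the `REVIEW_THRESHOLD` value, and the `revise` helper are assumptions made for illustration; the platform's actual schema is not described here.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off for routing facts to expert review


@dataclass(frozen=True)
class ExtractedFact:
    """Hypothetical record shape: one clinical fact plus its lineage."""
    person_id: int
    concept_code: str
    vocabulary: str
    source_document: str    # provenance: where the fact was found
    extraction_method: str  # provenance: how it was produced
    confidence: float       # model certainty between 0 and 1
    extracted_at: datetime
    version: int = 1


def needs_expert_review(fact: ExtractedFact) -> bool:
    return fact.confidence < REVIEW_THRESHOLD


def revise(fact: ExtractedFact, **changes) -> ExtractedFact:
    # Records are immutable: a revision yields a new version and the caller
    # keeps the old one, so historical states remain reproducible.
    return replace(fact, version=fact.version + 1,
                   extracted_at=datetime.now(timezone.utc), **changes)


history = [ExtractedFact(
    person_id=1, concept_code="44054006", vocabulary="SNOMED CT",
    source_document="note-001", extraction_method="nlp",
    confidence=0.62, extracted_at=datetime.now(timezone.utc),
)]
if needs_expert_review(history[-1]):
    history.append(revise(history[-1], confidence=0.99,
                          extraction_method="expert_review"))

print([(f.version, f.extraction_method, f.confidence) for f in history])
```

Keeping every version in `history` rather than overwriting is what lets an earlier analysis be rerun against the state of the data it originally saw.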
Multiple Datasets in Parallel
The platform maintains multiple derivative datasets simultaneously from the same source data:
- Identified OMOP (Operational Dataset): Full PHI for clinical operations, care coordination, point-of-care AI, and internal quality improvement
- De-Identified OMOP (Research Dataset): HIPAA Safe Harbor compliant with consistent pseudonyms, date-shifting, and PHI removal for research, external collaborations, and AI model training
- Patient Registry: NAACCR-compliant cancer registries with out-of-the-box support for all cancer sites, automated abstraction, and full provenance information
- Custom Registries: Define your own registries by specifying the target fields to curate and the patient cohort to monitor, then let the platform continuously extract and maintain those data points from incoming clinical documentation.
All dataset types share the same underlying clinical facts, curated once and synchronized continuously. This means operational teams work with complete PHI for patient care, researchers access de-identified data for model training and external collaboration, registrars get automated abstraction for regulatory reporting, and custom use cases receive exactly the fields they need. Different governance, access controls, and transformations apply based on intended use, but the source of truth remains unified, eliminating the research-to-production gap.
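Consistent pseudonyms and date-shifting can be illustrated with a keyed hash. The sketch below is one common way to do this, not a description of the platform's de-identification pipeline, and it does not by itself establish HIPAA compliance. `SECRET_KEY`, `MAX_SHIFT_DAYS`, and all function names are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; keep in a secrets manager
MAX_SHIFT_DAYS = 365                           # hypothetical shift window


def _digest(person_id: str, purpose: str) -> bytes:
    # A keyed hash (HMAC-SHA256) gives the same output for the same patient
    # on every run, but cannot be recomputed without the key.
    message = f"{purpose}:{person_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def pseudonym(person_id: str) -> str:
    return _digest(person_id, "pseudonym").hex()[:16]


def date_offset(person_id: str) -> timedelta:
    # One offset per patient, in [-MAX_SHIFT_DAYS, +MAX_SHIFT_DAYS].
    # Note that an offset of zero days is possible in this simple scheme.
    n = int.from_bytes(_digest(person_id, "date-shift")[:4], "big")
    return timedelta(days=n % (2 * MAX_SHIFT_DAYS + 1) - MAX_SHIFT_DAYS)


def shift_date(person_id: str, d: date) -> date:
    return d + date_offset(person_id)


admit, discharge = date(2023, 3, 1), date(2023, 3, 9)
shifted_admit = shift_date("MRN-1001", admit)
shifted_discharge = shift_date("MRN-1001", discharge)

print(pseudonym("MRN-1001"), shifted_admit, shifted_discharge)
```

Because every date for a given patient moves by the same offset, intervals such as length of stay survive de-identification, which matters for longitudinal analysis.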
The Impact: Measurable Improvements Across Your Organization
40% More Complete Data
Capture critical diagnoses, treatment rationale, and clinical context from unstructured notes that structured EHR fields alone miss, eliminating blind spots in cohort identification and outcome measurement.
10+ FTE-Years Saved Annually
Stop rebuilding similar data pipelines for each registry, cohort, dashboard, or AI model. Operate on one shared data foundation that every downstream system can trust.
Weeks to Hours
Automate patient timeline creation, cohort queries, and feature engineering that previously required weeks of manual data engineering, shortening time-to-insight accordingly.
Built-In Compliance
De-identification, comprehensive audit trails, data lineage tracking, and governance workflows are included, with no separate tools or custom development required.
Use Cases Enabled
Clinical Research
- Identify trial-eligible patients across all data sources in minutes, not weeks
- Analyze real-world treatment effectiveness with complete medication and outcome timelines
- Federate multi-site studies using standardized OMOP cohorts
AI and Predictive Analytics
- Train models on de-identified research data, deploy on identified operational data with zero feature drift
- Build clinical decision support tools that compound across use cases rather than fragment
- Access pre-extracted features (diagnosis timelines, medication adherence, lab trends) without custom NLP pipelines
Quality Improvement and Population Health
- Measure outcomes against clinical guidelines using standardized terminologies
- Track chronic disease management across all touchpoints
- Detect care gaps and intervention opportunities at scale
Regulatory Reporting and Registries
- Automate cancer registry abstraction with NAACCR compliance
- Generate quality measure reports (HEDIS, CMS) without manual chart review
- Maintain public health surveillance feeds with full data lineage
Built on Open Standards
Patient Journey Intelligence is architected around open standards that ensure your data remains yours: portable, interoperable, and future-proof. This standards-first approach prevents vendor lock-in and enables seamless integration with the broader healthcare AI ecosystem.
OMOP Common Data Model v5.4
All patient data is standardized to OMOP CDM v5.4, the leading observational research standard maintained by the OHDSI community. By adopting OMOP, your data becomes immediately compatible with:
- OHDSI Ecosystem Tools: ATLAS for cohort definitions, ACHILLES for data characterization, CohortMethod for causal inference, and dozens of validated analytics packages
- Multi-Institutional Collaboration: Share study protocols and federated analytics without exchanging raw data. The results remain comparable because everyone speaks the same schema
- Reproducible Research: Published studies using OMOP cohorts can be replicated across institutions, accelerating evidence generation
- AI Model Portability: Train models on standardized features that work across any OMOP dataset, eliminating custom preprocessing for each deployment
Your data stays in your control. OMOP is an open specification with no licensing fees, proprietary formats, or cloud dependencies. If you ever choose to move away from Patient Journey Intelligence, your OMOP data remains fully accessible and usable with any OMOP-compatible tool.
Supported domains: Condition, Drug Exposure, Procedure, Measurement, Observation, Visit, Person, Provider, Device Exposure, Note, Specimen
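To make the shared schema concrete, here is a minimal cohort query over two OMOP tables, run against an in-memory SQLite database. The table and column names follow the OMOP CDM; the rows are invented for illustration, and the concept IDs should be verified against your own vocabulary tables before use.

```python
import sqlite3

# Concept IDs as commonly found in the OHDSI vocabularies; treat them as
# illustrative and confirm them in your own concept table.
T2DM_CONCEPT_ID = 201826        # Type 2 diabetes mellitus
METFORMIN_CONCEPT_ID = 1503297  # metformin (ingredient)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
CREATE TABLE drug_exposure (
    drug_exposure_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    drug_concept_id INTEGER,
    drug_exposure_start_date TEXT
);
""")

# Invented sample rows.
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", [
    (1, 101, 201826, "2022-01-10"),
    (2, 102, 201826, "2022-06-01"),
    (3, 103, 316866, "2021-03-03"),  # a different condition
])
conn.executemany("INSERT INTO drug_exposure VALUES (?, ?, ?, ?)", [
    (1, 101, 1503297, "2022-02-01"),
    (2, 102, 1503297, "2022-05-01"),  # started before the diagnosis date
    (3, 103, 1503297, "2021-04-01"),
])

# Cohort: patients with the condition who started the drug on or after the
# diagnosis date. ISO-8601 date strings compare correctly as text.
query = """
SELECT DISTINCT c.person_id
FROM condition_occurrence AS c
JOIN drug_exposure AS d ON d.person_id = c.person_id
WHERE c.condition_concept_id = ?
  AND d.drug_concept_id = ?
  AND d.drug_exposure_start_date >= c.condition_start_date
ORDER BY c.person_id
"""
cohort = [row[0] for row in conn.execute(query, (T2DM_CONCEPT_ID, METFORMIN_CONCEPT_ID))]
print(cohort)
```

The same SQL should run largely unchanged on any database holding these OMOP tables, which is the portability the standard is meant to provide.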
Standard Medical Terminologies
Clinical concepts are mapped to open, standardized medical vocabularies that enable semantic interoperability across systems. This means a diagnosis coded in your EHR can be automatically aligned with research cohorts, clinical guidelines, and AI models, without manual mapping.
Core Vocabularies:
- SNOMED CT: Comprehensive clinical terminology covering diagnoses, findings, procedures, and anatomical structures
- RxNorm: Standardized drug nomenclature linking brand names, generics, and ingredients
- LOINC: Universal codes for lab tests, clinical observations, and diagnostic studies
- ICD-10-CM: Diagnosis codes for billing and epidemiology, automatically mapped to SNOMED concepts
- HPO (Human Phenotype Ontology): Phenotypic abnormalities for rare disease and genetics research
- UMLS Metathesaurus: Cross-terminology mappings enabling translation between coding systems
Why This Matters: When your data uses standard terminologies, insights from one tool immediately transfer to another. A cohort defined in ATLAS can be directly queried in your BI tool. An AI model trained on SNOMED-coded features will work on any OMOP dataset. Clinical decision support rules written once apply everywhere.
This eliminates the "translation tax" where each new application requires custom data dictionaries, and ensures your AI investments compound rather than fragment.
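A small sketch of what terminology normalization looks like in practice: source ICD-10-CM codes are translated to standard SNOMED CT concepts through a lookup, and codes without a mapping are surfaced rather than silently dropped. The three mappings are hand-picked examples; a real deployment would read them from maintained vocabulary tables, and the `normalize` function is hypothetical.

```python
from typing import NamedTuple, Optional


class StandardConcept(NamedTuple):
    vocabulary: str
    code: str
    name: str


# Hand-picked example mappings (ICD-10-CM -> SNOMED CT), for illustration only.
ICD10CM_TO_SNOMED = {
    "E11.9": StandardConcept("SNOMED CT", "44054006", "Diabetes mellitus type 2"),
    "I10": StandardConcept("SNOMED CT", "38341003", "Hypertensive disorder"),
    "J18.9": StandardConcept("SNOMED CT", "233604007", "Pneumonia"),
}


def normalize(icd10cm_code: str) -> Optional[StandardConcept]:
    # Tolerate common formatting differences such as case and whitespace.
    return ICD10CM_TO_SNOMED.get(icd10cm_code.strip().upper())


source_codes = ["E11.9", " i10 ", "Z99.99"]
mapped = {code: normalize(code) for code in source_codes}
unmapped = [code for code, concept in mapped.items() if concept is None]

for code, concept in mapped.items():
    print(repr(code), "->", concept.code if concept else "UNMAPPED")
```

Tracking `unmapped` explicitly gives curators a worklist, so coverage gaps show up as reviewable items instead of missing patients in a cohort.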
Model Context Protocol (MCP)
All platform capabilities (such as data extraction, cohort queries, patient timelines, and registry abstraction) are exposed via Model Context Protocol (MCP), an open standard for AI agent interoperability developed by Anthropic.
What MCP Enables:
- Composable Workflows: AI agents can invoke platform tools (e.g., "find patients with diabetic retinopathy") and combine them with external capabilities (e.g., scheduling, EHR writes) in multi-step workflows.
- Tool Discovery: Agents automatically discover available functions, parameters, and data schemas. No hardcoded integrations are required.
- Ecosystem Integration: Any MCP-compatible agent can access your curated patient data, while platform agents can leverage external MCP tools for scheduling, notifications, or real-time data feeds.
- Custom Extensions: Build internal MCP tools that expose proprietary logic or institutional data, making them instantly available to all agents.
Example Use Case: A clinical research coordinator asks an AI agent to "identify eligible patients for the diabetes trial and draft recruitment letters." The agent uses MCP to query your OMOP cohort, retrieve patient summaries, and compose personalized outreach, all without custom API development.
By standardizing on MCP, Patient Journey Intelligence becomes a composable platform rather than a closed system, enabling you to build sophisticated agentic workflows that span clinical operations, research, and analytics.
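For readers unfamiliar with MCP, the sketch below shows the shape of the messages involved. MCP is built on JSON-RPC 2.0, and `tools/list` and `tools/call` are methods defined by the protocol. Everything else is assumed for illustration: the `find_cohort` tool, its schema, and the simulated server response are hypothetical and do not describe the platform's actual tool catalog.

```python
import json
from itertools import count

_ids = count(1)


def rpc_request(method: str, params: dict = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope, as used by MCP."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Step 1: tool discovery. The agent asks the server what it offers.
list_req = rpc_request("tools/list")

# Simulated result of tools/list. The tool below is hypothetical.
list_result = {"tools": [{
    "name": "find_cohort",
    "description": "Find patients matching a clinical condition.",
    "inputSchema": {
        "type": "object",
        "properties": {"condition": {"type": "string"}},
        "required": ["condition"],
    },
}]}


def call_tool_request(tools: list, name: str, arguments: dict) -> dict:
    # Check the call against the discovered schema before sending it.
    tool = next((t for t in tools if t["name"] == name), None)
    if tool is None:
        raise KeyError(f"unknown tool: {name}")
    required = tool["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return rpc_request("tools/call", {"name": name, "arguments": arguments})


# Step 2: invoke the discovered tool.
call_req = call_tool_request(list_result["tools"], "find_cohort",
                             {"condition": "diabetic retinopathy"})
print(json.dumps(call_req, indent=2))
```

Because the agent learns the tool's name and required arguments from the discovery response, nothing about `find_cohort` needs to be hardcoded in the agent itself.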
Get Started
Patient Journey Intelligence is deployed within your infrastructure; it is not offered as SaaS. Your clinical data never leaves your environment, ensuring complete control over security, compliance, and data governance. The platform runs on your chosen cloud provider (AWS, Azure, Google Cloud), data warehouse (Snowflake, Databricks), or on-premises Kubernetes cluster.
The John Snow Labs team deploys and configures the platform for you. We handle infrastructure setup, data source integration, clinical workflow configuration, and provide comprehensive team training. Initial deployment typically takes 12 weeks from kickoff to production-ready OMOP datasets. Contact us at sales@johnsnowlabs.com to get started.
Three steps to AI-ready clinical data
Assess Your Data Landscape
Evaluate your data sources, infrastructure, and governance needs.
Integrate & Curate
Integrate patient data from your private sources, extract relevant medical information, reason over and normalize data points, and translate them to OMOP.
Create Applications and Agents
Create cohorts, compute measures, visualize patient journeys, and extract features for AI.
FAQ
Patient Journey Intelligence is a platform by John Snow Labs that transforms raw, multimodal clinical data into standardized, longitudinal patient journeys using OMOP Common Data Model v5.4. It enables healthcare organizations to reuse clinical data collected during routine care for research, AI development, quality measurement, registry automation, and regulatory reporting — all from a single, continuously updated data foundation.
Patient Journey Intelligence is built for healthcare organizations that need to unlock the secondary use of clinical data. Primary users include clinical research teams, data science and AI groups, quality improvement departments, population health analysts, registry abstractors, and healthcare IT leaders responsible for data infrastructure and governance.
Patient Journey Intelligence is not designed for primary clinical care delivery, real-time clinical documentation, or EHR replacement. Organizations looking for a general-purpose EHR, a patient portal, or a billing system should look elsewhere. The platform focuses exclusively on secondary use of clinical data — research, analytics, AI, registries, and quality measurement.
Traditional data warehouses and ETL pipelines require rebuilding similar data engineering work for each new use case — a cancer registry, a clinical trial cohort, a quality measure, or an AI model. Patient Journey Intelligence eliminates this redundancy by creating a single, shared data foundation that all downstream applications build on. It also extracts structured facts from unstructured clinical notes using Medical Language Models, capturing up to 40% more clinical information than structured EHR fields alone.
The platform ingests data from structured EHR systems (via FHIR, HL7 v2), free-text clinical notes, scanned PDFs, imaging metadata (DICOM), lab feeds, claims and billing data, and external registries. It handles data as-is regardless of format, and continuously updates patient journeys as new clinical documents arrive.
OMOP Common Data Model v5.4 is an open standard for observational health research, maintained by the OHDSI community and adopted by over 400 institutions worldwide. Patient Journey Intelligence standardizes all patient data to OMOP, making it immediately compatible with OHDSI ecosystem tools (ATLAS, ACHILLES, CohortMethod), multi-institutional research collaboration, and AI model training with portable features. OMOP is an open specification with no licensing fees or proprietary formats.
Patient Journey Intelligence uses John Snow Labs Medical Language Models to extract structured clinical facts from unstructured text — including diagnoses, medications, procedures, lab interpretations, and clinical reasoning. The models detect negation ("no evidence of pneumonia"), temporal relationships, and assertion status, preserving clinical context that naive text search would miss.
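The difference between naive text search and negation-aware extraction can be shown with a toy example. The rule below is a deliberately simple cue-matching heuristic in the spirit of NegEx-style approaches. It is not how the John Snow Labs models work, and the cue list and function names are invented for illustration.

```python
import re

# A tiny, hand-written cue list; real systems use far richer context handling.
NEGATION_CUES = ("no evidence of", "negative for", "denies", "without", "no")


def naive_mentions(text: str, term: str) -> bool:
    """Plain substring search: ignores whether the finding is negated."""
    return term.lower() in text.lower()


def is_affirmed(text: str, term: str) -> bool:
    """True if the term appears in some sentence with no negation cue before it."""
    term = term.lower()
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        position = sentence.find(term)
        if position == -1:
            continue
        preceding = sentence[:position]
        negated = any(re.search(rf"\b{re.escape(cue)}\b", preceding)
                      for cue in NEGATION_CUES)
        if not negated:
            return True
    return False


note_a = "Chest X-ray shows no evidence of pneumonia."
note_b = "Findings consistent with pneumonia. No pleural effusion."

print(naive_mentions(note_a, "pneumonia"), is_affirmed(note_a, "pneumonia"))
print(is_affirmed(note_b, "pneumonia"), is_affirmed(note_b, "pleural effusion"))
```

A keyword search would count the first note as a pneumonia case. Even this crude rule avoids that error, which hints at why assertion status matters for cohort accuracy.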
The platform normalizes clinical concepts to SNOMED CT (conditions, procedures, findings), RxNorm (medications), LOINC (lab tests), ICD-10-CM (diagnosis codes), HPO (phenotypic abnormalities), and UMLS Metathesaurus for cross-terminology mapping. Over 40 additional specialized vocabularies are also supported, including NDC, HCPCS, MedDRA, ICD-O-3, ATC, and CVX.
The platform is deployed entirely within your infrastructure — on-premises, private cloud, or your chosen cloud provider. No PHI ever leaves your security perimeter. All medical LLMs run locally, with no data transmitted to third-party model providers. The platform includes encryption at rest (AES-256) and in transit (TLS 1.3), role-based access control, SSO integration, comprehensive audit logging, and automated de-identification pipelines supporting both HIPAA Safe Harbor and Expert Determination methods.
The platform automatically maintains two synchronized OMOP datasets from the same source data: an identified dataset for clinical operations and internal quality improvement, and a de-identified dataset (HIPAA Safe Harbor compliant) for research, external collaboration, and AI model training. Both datasets share identical feature definitions and terminology mappings, eliminating the research-to-production gap.
Patient Journey Intelligence includes four production-ready agents: the Patient Copilot for conversational queries about individual patients, the Patient Journey Viewer for interactive longitudinal timeline visualization, the Cohort Builder for no-code patient population definition, and the Patient Registry agent for automated cancer and custom registry abstraction with human-in-the-loop review.
Custom agents can be built on the platform. All platform capabilities are exposed through Model Context Protocol (MCP), REST APIs, and direct SQL access to OMOP datasets. Custom agents inherit the platform's AI-ready data foundation, security infrastructure, compliance controls, and provenance tracking without additional development. MCP enables agents to dynamically discover available tools and compose multi-step clinical workflows.
Patient Journey Intelligence provides pre-configured templates for cancer registries (NAACCR-compliant with automated case finding, AJCC staging, and state/national submissions) and de-identified OMOP research registries. Cardiovascular and rare disease registry templates are planned. Custom registries for institutional quality improvement, disease-specific research, or specialty-specific tracking can also be configured.
Every clinical fact in the platform includes full lineage: the original source document, extraction method (structured import or NLP with model version), terminology normalization steps, confidence scores, and timestamps. When AI agents use these facts, the reasoning chain is also captured. This allows any output to be traced back to its source documentation for audit, regulatory compliance, or clinical verification.
Initial deployment typically takes 12 weeks from kickoff to production-ready OMOP datasets. The John Snow Labs team handles infrastructure setup, data source integration, clinical workflow configuration, and team training. The platform runs on AWS, Azure, Google Cloud, Snowflake, Databricks, or on-premises Kubernetes clusters.