Question 1

What is secondary use of clinical data?

Accepted Answer

Secondary use means taking clinical data originally collected during patient care and repurposing it for research studies, quality improvement programs, population health analytics, disease registries, AI model development, public health surveillance, and healthcare operations optimization. Unlike primary use for direct treatment, secondary use works with historically accumulated data from multiple systems.

Question 2

What is unstructured EHR extraction and why does it matter?

Accepted Answer

Unstructured EHR extraction is the process of automatically identifying clinical information in free-text sources, such as physician notes, discharge summaries, pathology reports, radiology narratives, and operative records, and converting it into structured, queryable data. Approximately 80% of healthcare data is unstructured. Without automated extraction, analytics and AI systems can access only the minority of clinical information that clinicians entered into structured fields, and they systematically miss the diagnoses, medications, social determinants, biomarkers, and clinical context that exist only in narrative text.

Question 3

What is NLP medical entity extraction in healthcare?

Accepted Answer

NLP medical entity extraction is the use of language models trained on clinical text to identify and classify medical entities in unstructured clinical notes, including diagnoses, medications, procedures, findings, anatomical locations, and their attributes. Unlike general-purpose NLP, healthcare-specific models understand clinical nuance: negation ("no evidence of pneumonia"), assertion status (confirmed vs. ruled-out vs. family history), uncertainty, and temporal relationships. Patient Journey Intelligence uses NLP medical entity extraction to surface clinical facts that never appear in structured EHR fields. It achieves 85–95% precision on clinical extraction tasks, approximately 30% more accurate than general-purpose LLMs.

Question 4

What is the clinical data accuracy gap in healthcare analytics?

Accepted Answer

The clinical data accuracy gap refers to the systematic inaccuracy that occurs when secondary use projects operate on incomplete patient data. Studies show that 40% of diagnoses, 81% of suicidality mentions, 68% of cancer staging data, and over 90% of social determinants of health exist only in unstructured clinical notes and are missed by structured-only approaches.
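To make the gap concrete, here is a minimal Python sketch that measures how much of a cohort a structured-only query would miss. The patient IDs, the two sources, and the function name `structured_only_miss_rate` are all hypothetical, and the counts are toy data chosen for illustration, not results from any study.

```python
def structured_only_miss_rate(structured_ids: set[str], notes_ids: set[str]) -> float:
    """Fraction of the full cohort that appears only in notes.

    These are the patients a structured-only query cannot see.
    """
    full_cohort = structured_ids | notes_ids
    if not full_cohort:
        return 0.0
    notes_only = notes_ids - structured_ids
    return len(notes_only) / len(full_cohort)


# Toy data (assumption): 10 patients have the condition, 6 carry a coded diagnosis.
coded = {f"p{i}" for i in range(1, 7)}                # p1..p6
mentioned_in_notes = {f"p{i}" for i in range(4, 11)}  # p4..p10

miss_rate = structured_only_miss_rate(coded, mentioned_in_notes)
print(f"Missed by a structured-only query: {miss_rate:.0%}")
```

Any downstream rate computed on the coded subset alone inherits this blind spot, which is why the gap is systematic rather than random noise.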

Question 5

Why do structured EHR fields miss so much clinical information?

Accepted Answer

Structured EHR fields capture only a subset of clinically relevant information and frequently omit context such as certainty, negation, temporality, disease progression, adverse events, social factors, and clinician reasoning. Studies show that 87% of extracted clinical concepts exist solely in free-text narratives with no structured counterpart.

Question 6

What are the eight requirements for accurate secondary use of clinical data?

Accepted Answer

Patient Journey Intelligence addresses eight requirements: complete multimodal data integration, healthcare-specific NLP, terminology standardization (SNOMED CT, RxNorm, LOINC, ICD-10-CM), clinical reasoning and conflict resolution, longitudinal patient timelines, privacy and de-identification, provenance and auditability, and continuous updates with living datasets.

Question 7

How accurate is Patient Journey Intelligence healthcare-specific NLP compared to general-purpose LLMs?

Accepted Answer

Healthcare-specific NLP models achieve 85–95% precision on clinical extraction tasks, approximately 30% more accurate than general-purpose LLMs. These models understand clinical context including negation, uncertainty, assertion status (confirmed vs. ruled-out vs. family history), and temporal relationships that general AI tools miss.

Question 8

How does Patient Journey Intelligence handle negation in clinical notes?

Accepted Answer

Patient Journey Intelligence uses healthcare-specific NLP that detects negation ("no evidence of pneumonia"), uncertainty, and assertion status in clinical text. This prevents common errors such as treating a ruled-out condition as a confirmed diagnosis, a distinction that naive text search and general-purpose AI can miss.
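The difference between naive text search and assertion-aware extraction can be illustrated with a deliberately simple rule-based sketch in Python. This is not how Patient Journey Intelligence works internally (the source describes trained healthcare-specific models); the trigger lists and the function `assertion_status` are hypothetical, and the rules ignore scope, so a sentence such as "no fever, but pneumonia present" would be misclassified.

```python
import re

# Hypothetical, deliberately small trigger lists; real systems use far richer models.
NEGATION_TRIGGERS = ("no evidence of", "negative for", "denies", "ruled out", "without", "no")
FAMILY_TRIGGERS = ("family history of", "mother had", "father had")


def assertion_status(sentence: str, term: str) -> str:
    """Classify a term mention as 'absent', 'family_history', 'present', or 'not_mentioned'."""
    text = sentence.lower()
    idx = text.find(term.lower())
    if idx == -1:
        return "not_mentioned"
    before = text[:idx]
    after = text[idx + len(term):]
    if any(trigger in before for trigger in FAMILY_TRIGGERS):
        return "family_history"
    # Negation cue before the term, e.g. "no evidence of pneumonia".
    if any(re.search(rf"\b{re.escape(t)}\b", before) for t in NEGATION_TRIGGERS):
        return "absent"
    # Negation cue right after the term, e.g. "pneumonia was ruled out".
    if re.search(r"^\s*(was |is )?ruled out", after):
        return "absent"
    return "present"


negated = "No evidence of pneumonia on chest X-ray."
naive_hit = "pneumonia" in negated.lower()         # naive search reports a match
status = assertion_status(negated, "pneumonia")    # the rule-based check reports 'absent'
print(naive_hit, status)
```

A keyword search would add this patient to a pneumonia cohort, while the assertion-aware check would not.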

Question 9

What timeline completeness does Patient Journey Intelligence achieve?

Accepted Answer

Patient Journey Intelligence delivers 96%+ timeline completeness by extracting clinical facts from all data modalities: structured fields, clinical notes, scanned PDFs, imaging metadata, and claims data. By comparison, structured-only approaches that miss information in unstructured text reach 60% or less.
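As a rough illustration of what "timeline completeness" can mean, the Python sketch below merges events from several sources into one timeline and measures the share of reference facts it contains. The `Event` type, the source labels, the reference set, and the completeness definition are assumptions made for this example; the source does not state how the platform computes its figure, and the toy data is not real patient data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    when: date
    concept: str
    source: str  # illustrative labels such as "structured" or "note"


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from all sources, drop duplicate (date, concept) pairs, sort by date."""
    seen: set[tuple[date, str]] = set()
    merged: list[Event] = []
    for events in sources:
        for ev in events:
            key = (ev.when, ev.concept)
            if key not in seen:
                seen.add(key)
                merged.append(ev)
    return sorted(merged, key=lambda ev: ev.when)


def completeness(timeline: list[Event], reference: set[tuple[date, str]]) -> float:
    """Share of reference (gold-standard) facts that appear in the timeline."""
    if not reference:
        return 1.0
    found = {(ev.when, ev.concept) for ev in timeline}
    return len(found & reference) / len(reference)


structured = [
    Event(date(2023, 1, 5), "type 2 diabetes", "structured"),
    Event(date(2023, 3, 1), "metformin", "structured"),
    Event(date(2023, 6, 9), "hba1c test", "structured"),
]
notes = [
    Event(date(2023, 2, 14), "family history of diabetes", "note"),
    Event(date(2023, 3, 1), "metformin", "note"),  # duplicate of a structured fact
    Event(date(2023, 4, 20), "food insecurity", "note"),
]
# Toy gold standard (assumption): every distinct fact across both sources.
reference = {(ev.when, ev.concept) for ev in structured + notes}

structured_only = completeness(build_timeline(structured), reference)
multimodal = completeness(build_timeline(structured, notes), reference)
print(f"structured-only: {structured_only:.0%}, multimodal: {multimodal:.0%}")
```

In practice the reference set would come from manual chart abstraction, which is the expensive part of measuring completeness.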

Question 10

How does Patient Journey Intelligence accelerate patient timeline construction?

Accepted Answer

Patient timeline construction that previously took weeks of manual abstraction now completes in hours with Patient Journey Intelligence. Organizations analyze 6× more patients in the same timeframe while achieving higher completeness and consistency than manual review workflows.

Question 11

Who is Patient Journey Intelligence designed for?

Accepted Answer

Patient Journey Intelligence is designed for clinical research teams, quality improvement departments, population health analysts, registry abstractors, data science groups, and healthcare IT leaders who need accurate, complete patient data for secondary use. The platform is most valuable when accuracy matters and structured-only data is insufficient.

Question 12

Who should not use Patient Journey Intelligence?

Accepted Answer

Patient Journey Intelligence is not designed for primary clinical care delivery, real-time EHR documentation, or billing system replacement. Organizations that only need simple structured data extracts without NLP, terminology normalization, or multimodal integration may not require the platform's full capabilities.

Question 13

How is Patient Journey Intelligence different from a clinical data warehouse?

Accepted Answer

Traditional clinical data warehouses store structured EHR extracts but miss 40%+ of clinical information in unstructured notes. Patient Journey Intelligence extracts facts from all modalities using healthcare-specific NLP, normalizes them to standard vocabularies, resolves conflicts, maintains temporal relationships, and provides living datasets with full provenance. The result is a complete foundation for secondary use rather than just data storage.

Question 14

How does Patient Journey Intelligence support regulatory compliance?

Accepted Answer

Patient Journey Intelligence provides complete lineage tracking from source document through extraction to final OMOP representation. Every clinical fact includes confidence scores, extraction method, and direct links to supporting evidence. This provenance chain supports HIPAA, 21 CFR Part 11, IRB requirements, and audit trails for research reproducibility.
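One way to picture a fact that carries its own provenance is the Python sketch below. The class and field names (`ClinicalFact`, `Provenance`, `char_start`, and so on) are hypothetical and are not the platform's schema or the OMOP data model; the point is only that each fact keeps a confidence score, an extraction method, and a pointer back to the exact supporting text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    source_document_id: str
    char_start: int          # offset of the supporting text in the source document
    char_end: int
    extraction_method: str   # e.g. "nlp" or "structured_field" (illustrative labels)
    confidence: float


@dataclass(frozen=True)
class ClinicalFact:
    patient_id: str
    concept: str
    value: str
    provenance: Provenance

    def evidence(self, document_text: str) -> str:
        """Return the exact source text span that supports this fact."""
        p = self.provenance
        return document_text[p.char_start:p.char_end]


note = "Assessment: stage II breast cancer, ER positive."
start = note.index("stage II")
fact = ClinicalFact(
    patient_id="p1",
    concept="cancer_stage",
    value="II",
    provenance=Provenance("doc-42", start, start + len("stage II"), "nlp", 0.93),
)

audit_record = json.dumps(asdict(fact))  # serializable for an audit log
print(fact.evidence(note), audit_record)
```

Because the span offsets travel with the fact, a reviewer can jump from any derived value straight to the sentence it came from.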

Question 15

Can Patient Journey Intelligence maintain both identified and de-identified datasets?

Accepted Answer

Yes. Patient Journey Intelligence maintains parallel identified and de-identified datasets from the same source data, kept semantically synchronized with identical feature definitions. This enables seamless progression from research (de-identified) to production (identified) without pipeline rewrites. De-identification achieves 99%+ accuracy.
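The idea of keeping both datasets "semantically synchronized with identical feature definitions" can be sketched in Python as a single feature function applied to both copies. Everything here is an assumption for illustration: the record layout, the key, and the helper names are hypothetical, and real de-identification covers many more identifiers, including those inside free text.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-do-not-use"  # hypothetical; a real key belongs in a secrets manager


def pseudonymize(patient_id: str) -> str:
    """Stable keyed pseudonym, so the same patient maps to the same ID on every run."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    out = {k: v for k, v in record.items() if k not in {"name", "patient_id"}}
    out["pseudo_id"] = pseudonymize(record["patient_id"])
    return out


def has_diabetes(record: dict) -> bool:
    """A single feature definition shared by both datasets."""
    return "diabetes" in record["conditions"]


identified = [
    {"patient_id": "MRN001", "name": "Jane Doe", "conditions": ["diabetes", "asthma"]},
    {"patient_id": "MRN002", "name": "John Roe", "conditions": ["hypertension"]},
]
deidentified = [deidentify(r) for r in identified]

identified_features = [has_diabetes(r) for r in identified]
deidentified_features = [has_diabetes(r) for r in deidentified]
print(identified_features == deidentified_features)
```

Since the feature function never touches identifying fields, a model developed on the de-identified copy can run unchanged on the identified one.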

Question 16

What is a living dataset in Patient Journey Intelligence?

Accepted Answer

A living dataset is continuously updated as new clinical data arrives, rather than being a static snapshot that becomes stale. Patient Journey Intelligence keeps patient journeys current automatically, ensuring AI agents and analytics always operate on complete, up-to-date patient representations.
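A minimal Python sketch of the contrast with a static snapshot follows. The `LivingDataset` class, its methods, and the fact layout are hypothetical and held in memory only; the sketch shows one plausible property of such a store, namely that new facts update a patient journey in place and replayed facts are ignored.

```python
from collections import defaultdict


class LivingDataset:
    """Toy in-memory store that is updated in place as new facts arrive."""

    def __init__(self) -> None:
        self._facts: dict[str, dict[str, dict]] = defaultdict(dict)
        self.version = 0  # bumped on every real change

    def apply(self, patient_id: str, fact_id: str, fact: dict) -> bool:
        """Insert or update one fact; return True only if the dataset changed."""
        if self._facts[patient_id].get(fact_id) == fact:
            return False  # replayed message, nothing to do
        self._facts[patient_id][fact_id] = fact
        self.version += 1
        return True

    def journey(self, patient_id: str) -> list[dict]:
        """Current facts for one patient, ordered by ISO date string."""
        facts = self._facts.get(patient_id, {})
        return sorted(facts.values(), key=lambda f: f["date"])


dataset = LivingDataset()
dataset.apply("p1", "dx-1", {"date": "2024-01-10", "concept": "hypertension"})
dataset.apply("p1", "rx-1", {"date": "2024-01-12", "concept": "lisinopril started"})
dataset.apply("p1", "dx-1", {"date": "2024-01-10", "concept": "hypertension"})  # replay
print(dataset.version, [f["concept"] for f in dataset.journey("p1")])
```

The version counter gives downstream consumers a cheap way to tell whether they are reading stale data.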

Clinical Data Type	Structured EHR Capture	Unstructured Notes	Invisible to Structured Queries	Clinical Implication
Diagnoses	~60%	~40% found only in notes	40% missed	Cohort queries can miss roughly 40% of eligible patients; clinical trial recruitment and population health programs (which rely on ICD-10 and SNOMED CT codes) are systematically underpowered.
Family History	~5%	~59%	12x discrepancy	Risk stratification models built on structured data alone lose their most heritable predictive signal, producing unreliable genetic risk scores.
SDOH (Social Determinants)	2% (ICD-10 Z-codes)	93.8% (NLP on notes)	46x discrepancy	Readmission and outcome models ignore the social drivers that explain most variation, making interventions ineffective and equity analyses impossible.
Cancer Staging	<32%	>68% missing from structured fields	68%+ gap	Cancer registries and real-world evidence studies cannot accurately report stage distribution or treatment outcomes without pathology narrative extraction.
Medication Histories	30–40% accurate	60–70% contain errors	Majority inaccurate	Pharmacovigilance studies, adherence analyses, and drug safety signals derived from structured medication records are built on a foundation that is wrong more often than right.
Suicide / Self-harm Events	<19% coded	>81% only in notes	>81% missed	Mental health risk models and quality measures relying on ICD codes miss the overwhelming majority of at-risk patients, with direct patient safety consequences.
All Clinical Concepts	13% have structured counterparts	87% in free text only	87% gap	Any AI model trained on structured EHR fields alone operates on less than one-seventh of the available clinical signal, making high accuracy on complex clinical tasks structurally unachievable.

Key Statistics at a Glance: Structured EHR vs. Unstructured Clinical Notes

The Clinical Data Accuracy Challenge: Incomplete Patient Views Lead to Wrong Results

Incomplete Patient Data Creates Systematic Inaccuracy

⚠️ The Structured-Only Trap

The Research Evidence: How Much Clinical Information Do Structured-Only Systems Miss?

Clinical Diagnoses: The 40% Gap

Family History: The 12x Discrepancy

Social Determinants of Health: 93.8% vs. Minority

Oncology: The Structured Data Gap

Medication Reconciliation: Discrepancies Everywhere

Suicidality and Self-Harm: The Coding Gap

Clinical Prediction: The Unstructured Data Advantage

The Root Cause of Clinical Data Accuracy Gap: Lack of Continuous Multimodal Data Integration

✅ The Path Forward

What a Secondary Use Clinical Data Platform Requires

1. Multimodal Clinical Data Integration

2. Healthcare-Specific NLP

3. Terminology Standardization

4. Clinical Reasoning & Conflict Resolution

5. Longitudinal Patient Timelines

6. Privacy & De-Identification

7. Provenance & Auditability

8. Continuous Updates & Living Datasets

How Patient Journey Intelligence Solves the Clinical Data Accuracy Gap

Multimodal Clinical Data Integration

NLP Medical Entity Extraction

Terminology Standardization

Clinical Reasoning

Longitudinal Timelines

Privacy & De-Identification

Provenance & Auditability

Continuous Updates

What You Gain: Accuracy, Speed, and Reuse

Speed to Insight

Improved Completeness

Reuse at Scale

Embedded Governance

Regulatory Readiness

Future-Proof AI

The Result: A Repeatable Operating Model for Secondary Use of Clinical Data

💡 The Bottom Line

FAQ
