NAACCR-Compliant Cancer Registry Automation
Cancer registries are foundational to oncology surveillance, quality improvement, and clinical research, yet traditional registry abstraction remains one of healthcare's most labor-intensive processes, requiring certified tumor registrars to manually extract hundreds of data points from fragmented clinical documentation.
Patient Journey Intelligence transforms cancer registry operations from a manual burden into an automated, audit-ready workflow, using AI-powered case finding and clinical data extraction to reduce abstraction time by 80% while capturing more complete data faster than human abstractors working alone.
Key Capabilities
The Cancer Registry module provides six integrated capabilities that work together to automate the entire abstraction workflow, from initial case detection through regulatory submission.
Automated Case Finding
Multi-source detection: ICD-O-3 codes, pathology reports, radiology findings, treatment orders, death certificates
AJCC TNM Staging
Automated staging (AJCC 8th Edition), histology/grade extraction, biomarkers (ER, PR, HER2, PD-L1), lymph nodes, metastasis
Treatment Tracking
Surgical procedures, chemotherapy regimens, radiation therapy, immunotherapy, targeted therapy, clinical trials
Longitudinal Outcomes
Continuous monitoring: disease progression, recurrence, metastasis, survival status, quality of life indicators
Registrar Review Workflows
Evidence-based review interface, side-by-side source documents, confidence scoring, one-click corrections, audit trails
NAACCR Compliance
NAACCR v25 export, state registry submissions, SEER reporting, CoC accreditation, automated edit checks
The Cancer Registry Abstraction Challenge
Cancer registries are mandated infrastructure for accredited cancer programs, state public health surveillance, and national epidemiology through SEER and NAACCR. Yet maintaining these registries remains one of oncology's most resource-intensive operational challenges.
The Manual Abstraction Bottleneck
Cancer registry abstraction traditionally requires an average of 2 hours per case of manual chart review by certified tumor registrars, though complex cases can take significantly longer. For a health system with 2,000 new cancer cases annually, this represents 4,000 hours of specialized labor. At the national level, with approximately 2.1 million new cancer diagnoses projected in 2026, the United States requires roughly 4.2 million hours of manual cancer registry abstraction annually.
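The labor figures above follow from a single multiplication. A minimal sketch, using only the numbers quoted in this section (the function name is illustrative, not part of the product):

```python
def annual_abstraction_hours(cases_per_year: int, hours_per_case: float = 2.0) -> float:
    """Estimate manual abstraction labor for one year of new cases."""
    return cases_per_year * hours_per_case


# Figures quoted in the text: 2,000 cases for one health system,
# roughly 2.1 million projected diagnoses nationally.
health_system_hours = annual_abstraction_hours(2_000)
national_hours = annual_abstraction_hours(2_100_000)

print(f"Health system: {health_system_hours:,.0f} hours")
print(f"National:      {national_hours:,.0f} hours")
```

The 2-hour average is the only assumption driving both results, so complex cases that take longer push the real totals higher.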
This massive labor requirement creates persistent challenges:
Case Identification Delays: Relying on ICD-10 diagnosis codes means cases often aren't identified until weeks after diagnosis, when billing codes are finalized. Critical staging information from early pathology reports and imaging studies may be documented long before structured codes appear in the EHR.
Fragmented Documentation: Each NAACCR data item may require reviewing dozens of documents across multiple systems. A single breast cancer case might involve pathology reports for primary tumor characteristics, surgical operative notes for margin status, imaging reports for metastasis detection, oncology notes for treatment planning, pharmacy records for chemotherapy regimens, and radiation oncology documentation for radiation therapy details.
Specialized Expertise Shortage: Certified tumor registrars require extensive training in oncology, anatomy, staging systems (AJCC TNM), and coding standards (ICD-O-3). Many facilities struggle to recruit and retain qualified registrars, creating backlogs that delay reporting and limit registry capacity.
Timeliness Standards: Commission on Cancer (CoC) accreditation requires 90% of cases to be abstracted within six months of diagnosis. Meeting this standard while maintaining data quality consumes significant registrar capacity, leaving little time for quality improvement initiatives or research support.
Data Completeness Trade-offs: Under time pressure, registrars may prioritize mandatory fields for compliance reporting while leaving optional research-valuable fields (biomarkers, comorbidities, treatment response) incomplete or unvalidated.
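The timeliness measure described under Timeliness Standards can be tracked as a simple ratio. The sketch below is illustrative only: the case tuples, the function names, and the choice to count unabstracted cases as late are assumptions, not CoC definitions or product behavior.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def timeliness_rate(cases, months: int = 6) -> float:
    """Share of cases abstracted within `months` of diagnosis.

    `cases` is a list of (diagnosis_date, abstraction_date_or_None) tuples.
    Cases with no abstraction date are counted as not on time (assumption).
    """
    if not cases:
        return 0.0
    on_time = sum(
        1 for dx, done in cases
        if done is not None and done <= add_months(dx, months)
    )
    return on_time / len(cases)


cases = [
    (date(2025, 1, 15), date(2025, 5, 1)),    # within six months
    (date(2025, 2, 1), date(2025, 9, 10)),    # late
    (date(2025, 3, 20), None),                # still in the backlog
    (date(2025, 8, 31), date(2026, 2, 28)),   # deadline clamps to Feb 28
]
rate = timeliness_rate(cases)
meets_standard = rate >= 0.90
print(f"On-time rate: {rate:.0%}, meets 90% standard: {meets_standard}")
```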
The Compliance Burden
Beyond abstraction time, cancer registries must maintain strict compliance with evolving standards. NAACCR releases updated data standards annually, requiring registries to adapt abstraction protocols, update edit checks, and ensure historical data remains comparable. Clinical staging systems themselves evolve, with AJCC publishing new editions of the TNM staging manual every 6-8 years, each introducing changes to staging criteria, new biomarker requirements, and revised anatomic classifications that registrars must master. State registries have unique reporting requirements, submission formats, and timeliness deadlines. CoC accreditation demands comprehensive documentation of abstraction procedures, inter-rater reliability testing, and quality assurance processes.
This compliance overhead adds administrative burden to already resource-constrained registry teams, diverting effort from the core mission of capturing complete, accurate oncology data. Keeping registrars trained and certified on the latest AJCC editions, NAACCR standards, and ICD-O-3 coding updates requires continuous education, reference manual updates, and workflow adjustments that multiply the operational complexity of registry maintenance.
How Patient Journey Intelligence Solves Cancer Registry Challenges
The Cancer Registry module transforms abstraction from a manual burden into an automated, audit-ready workflow, enabling your registrar team to maintain more cases with higher data quality while meeting all compliance requirements.
Continuous Automated Case Finding
The system continuously monitors your electronic health records for new cancer cases across multiple detection pathways, ensuring no reportable cases are missed and cases are identified as early as possible:
Structured Code Monitoring
Automatically detects new ICD-O-3 diagnosis codes, ICD-10-CM cancer diagnoses, and oncology-related procedure codes the moment they appear in your EHR.
Unstructured Clinical Note Screening
Medical Language Models analyze clinical notes, pathology reports, and consultation documentation to identify cancer mentions before formal diagnosis coding occurs, often detecting cases weeks earlier than code-based methods alone.
Pathology Report Parsing
Every pathology report is automatically screened for cancer diagnoses, malignancy classifications, and suspicious findings, triggering case creation for biopsy-confirmed malignancies.
Radiology Findings Analysis
Imaging reports (CT, MRI, PET scans) are analyzed for primary tumor identification, staging information, and metastasis detection, linking radiologic findings to existing cases or creating new ones.
Treatment Indication Detection
Oncology treatment orders (chemotherapy, radiation therapy, immunotherapy) are monitored to identify cases where cancer treatment was initiated before formal registry abstraction, preventing gaps in case capture.
This multi-pathway case finding ensures comprehensive coverage while identifying cases earlier in their diagnostic journey, giving registrars more time to complete abstraction before timeliness deadlines.
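One way to picture how several detection pathways combine is to treat each hit as a signal and consolidate signals per patient, keeping the earliest detection date. This is a simplified sketch under stated assumptions: `CaseSignal`, `consolidate`, and the source labels are hypothetical names, and grouping by patient alone ignores patients with multiple primaries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseSignal:
    patient_id: str
    source: str          # e.g. "icd10", "pathology", "radiology", "treatment_order"
    detected_on: date


def consolidate(signals):
    """Group signals per patient, tracking earliest detection and contributing sources."""
    candidates = {}
    for s in signals:
        entry = candidates.setdefault(
            s.patient_id, {"first_detected": s.detected_on, "sources": set()}
        )
        entry["first_detected"] = min(entry["first_detected"], s.detected_on)
        entry["sources"].add(s.source)
    return candidates


signals = [
    CaseSignal("P001", "pathology", date(2025, 3, 3)),
    CaseSignal("P001", "icd10", date(2025, 3, 24)),        # code arrives three weeks later
    CaseSignal("P002", "treatment_order", date(2025, 4, 10)),
]
candidates = consolidate(signals)
for patient_id, info in candidates.items():
    print(patient_id, info["first_detected"], sorted(info["sources"]))
```

In this made-up data, the pathology signal for P001 surfaces the case 21 days before the diagnosis code does, which is the kind of lead time the text describes.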
AI-Powered NAACCR Data Extraction
Once a case is detected, the platform automatically extracts 400+ NAACCR-reportable data elements from clinical documentation, transforming hours of manual chart review into minutes of focused validation. Medical Language Models trained on oncology documentation extract structured facts from unstructured clinical notes, pathology reports, operative notes, imaging reports, and treatment plans.
Patient Demographics & Identification
Name, date of birth, sex, race, ethnicity, address, Social Security number, insurance status, and occupation are drawn from structured EHR fields and supplemented by unstructured documentation when necessary for completeness.
Primary Site and Histology
ICD-O-3 topography and morphology codes extracted from pathology reports, with laterality, behavior codes, and grade determination.
AJCC TNM Staging
Clinical and pathologic T, N, M categories extracted from pathology reports, imaging studies, and clinical documentation, with automatic stage group calculation following AJCC 8th Edition guidelines.
Tumor Characteristics
Grade, differentiation, tumor size, extension, lymph node involvement, and site-specific data items (SSDIs) extracted from pathology and surgical reports.
Biomarker Status
Estrogen receptor (ER), progesterone receptor (PR), HER2, Ki-67, PD-L1, and other prognostic markers extracted from immunohistochemistry and molecular testing reports.
Treatment Details
Surgical procedures with margins and scope, chemotherapy regimens with cycles and doses, radiation therapy with modality and anatomic sites, immunotherapy and targeted therapy agents, all extracted from operative notes, chemotherapy orders, and radiation planning documentation.
Longitudinal Outcomes
Disease progression, recurrence detection, metastasis identification, survival status, and cause of death tracked continuously as new clinical documentation arrives, keeping registry data current without manual follow-up searches.
Every extracted value includes provenance tracking: the source document, specific text supporting the extraction, AI confidence score, and extraction timestamp, creating a complete audit trail from clinical documentation to registry data.
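The provenance attached to each value can be thought of as a small record. The sketch below is one possible shape, not the platform's actual schema: the class name, field names, the 0 to 1 confidence scale, and the review threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractedValue:
    item_name: str            # data item label
    value: str                # extracted value
    source_document_id: str   # which document it came from
    supporting_text: str      # exact passage supporting the value
    confidence: float         # assumed to be on a 0.0-1.0 scale
    extracted_at: datetime    # extraction timestamp

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def needs_review(value: ExtractedValue, threshold: float = 0.85) -> bool:
    """Flag low-confidence values for closer registrar attention (hypothetical rule)."""
    return value.confidence < threshold


er_status = ExtractedValue(
    item_name="ER Status",
    value="Positive",
    source_document_id="path-report-0001",   # made-up identifier
    supporting_text="ER: Positive (90%, strong)",
    confidence=0.97,
    extracted_at=datetime(2025, 3, 3, 14, 30, tzinfo=timezone.utc),
)
print(er_status.item_name, er_status.value, needs_review(er_status))
```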
Certified Registrar Validation Workflow
Rather than replacing human expertise, the Cancer Registry module optimizes it through intelligent human-in-the-loop review. Certified tumor registrars focus their time on validating AI-extracted data and resolving ambiguous cases instead of manually searching through charts and transcribing information.
Structured Validation Interface
Registrars review a pre-populated NAACCR abstraction form with all data items already extracted, each field showing the AI-extracted value, confidence score, and direct link to source evidence.
Evidence-Based Review
Clicking any data item displays the source documentation (pathology report text, imaging findings, clinical note excerpts) with relevant passages highlighted, enabling registrars to verify accuracy without searching through the EHR manually.
One-Click Corrections
If the AI extraction is incorrect or incomplete, registrars can edit the value directly in the validation interface. The system logs who made the change, when it occurred, and what the original AI-extracted value was, maintaining complete audit trail transparency.
Quality Assurance Tools
Built-in NAACCR edit checks flag impossible or unlikely value combinations before submission. Inter-rater review workflows support dual abstraction for quality assurance, with discrepancy tracking and resolution.
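To make "impossible or unlikely value combinations" concrete, the sketch below implements two toy checks: a sex/site conflict and a date-sequence check. It is not the NAACCR edits metafile or the platform's rule engine; the dictionary keys and function name are hypothetical.

```python
from datetime import date


def run_edit_checks(case: dict) -> list:
    """Return human-readable edit failures for one abstracted case."""
    failures = []

    # Sex/site conflict: C61 is the ICD-O-3 topography for prostate.
    if case.get("sex") == "female" and case.get("primary_site", "").startswith("C61"):
        failures.append("Prostate primary site recorded for a female patient")

    # Date sequence: birth <= diagnosis <= death, for whichever dates are present.
    date_fields = ("date_of_birth", "date_of_diagnosis", "date_of_death")
    present = [(name, case[name]) for name in date_fields if case.get(name)]
    for (earlier_name, earlier), (later_name, later) in zip(present, present[1:]):
        if earlier > later:
            failures.append(f"{earlier_name} is after {later_name}")

    return failures


suspect_case = {
    "sex": "female",
    "primary_site": "C619",
    "date_of_birth": date(1960, 5, 1),
    "date_of_diagnosis": date(2025, 2, 10),
    "date_of_death": date(2024, 12, 1),
}
failures = run_edit_checks(suspect_case)
for message in failures:
    print("EDIT FAILURE:", message)
```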
This AI-registrar partnership reduces abstraction time from 2 hours per case to 20-30 minutes of focused validation, enabling your registrar team to maintain significantly more cases without sacrificing quality or falling behind on reporting deadlines.
NAACCR-Compliant Regulatory Reporting
When cases are ready for submission, the system generates NAACCR-compliant export files formatted for direct submission to state and national cancer registries. All required fields are populated and validated against registry specifications, with edit checks applied automatically before export.
The platform supports NAACCR v25 format, state-specific reporting requirements, SEER program submissions, and Commission on Cancer (CoC) accreditation needs, ensuring your registry data meets all compliance standards without manual file formatting or edit check scripting.
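For orientation, recent NAACCR versions exchange data as XML with a Patient/Tumor/Item hierarchy. The sketch below builds a tiny fragment in that shape with Python's standard library. It is a rough illustration only: it omits the namespace and required root attributes (such as the base dictionary reference), so it is not a valid submission file, and the item identifiers shown should be checked against the NAACCR data dictionary for the version you report in. The patient identifier is made up.

```python
import xml.etree.ElementTree as ET


def build_naaccr_fragment(patient_id: str, tumor_items: dict) -> ET.Element:
    """Build a minimal Patient/Tumor/Item tree (illustrative, not spec-complete)."""
    root = ET.Element("NaaccrData")
    patient = ET.SubElement(root, "Patient")
    pid = ET.SubElement(patient, "Item", naaccrId="patientIdNumber")
    pid.text = patient_id

    tumor = ET.SubElement(patient, "Tumor")
    for naaccr_id, value in tumor_items.items():
        item = ET.SubElement(tumor, "Item", naaccrId=naaccr_id)
        item.text = value
    return root


# Values loosely based on the left-breast pathology example in this document.
root = build_naaccr_fragment(
    "00000001",
    {
        "primarySite": "C509",
        "histologicTypeIcdO3": "8500",
        "behaviorCodeIcdO3": "3",
        "laterality": "2",   # assumed code for left-sided origin
    },
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```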
Automatic Extraction of all NAACCR Data Fields
400+ Automatically Extracted NAACCR Data Items
The Cancer Registry automatically extracts, codes, and validates all required NAACCR data items from your clinical documentation, covering patient identification, cancer characteristics, staging, treatment, and outcomes.
Supported NAACCR Record Layout Categories
The platform extracts data across all major sections of the NAACCR record layout specification, ensuring comprehensive coverage of required data items for state and national registry reporting.
Record ID
Demographic
Cancer Identification
Hospital-Specific
Stage/Prognostic Factors
Treatment-1st Course
Treatment-Subsequent & Other
Follow-up/Recurrence/Death
Edit Overrides/Conversion History
Patient Confidential
Hospital-Confidential
Other-Confidential
Text-Diagnosis
Text-Treatment
Text-Miscellaneous
Special Use
Pathology
Pathology Report Processing Example
Input: Free-text surgical pathology report
DIAGNOSIS: LEFT BREAST, LUMPECTOMY:
- Invasive ductal carcinoma, Grade 2
- Tumor size: 1.8 cm in greatest dimension
- Margins: All margins negative, closest margin 0.3 cm
- Lymphovascular invasion: Present
- ER: Positive (90%, strong)
- PR: Positive (70%, moderate)
- HER2: Negative (IHC 1+)
- Ki-67: 25%
Extracted Data:
- Primary Site: C50.9 (Breast, NOS)
- Histology: 8500/3 (Infiltrating duct carcinoma)
- Grade: 2 (Moderately differentiated)
- Tumor Size: 18 mm
- ER Status: Positive
- PR Status: Positive
- HER2 Status: Negative
- CS Extension: 310 (Confined to breast)
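The mapping from report text to structured fields can be illustrated with a few regular expressions. This is deliberately a stand-in: the platform is described as using Medical Language Models, not regexes, and the field names below are hypothetical. The sketch only shows the input-to-output shape for the sample report above, including the centimeter-to-millimeter conversion.

```python
import re

REPORT = """DIAGNOSIS: LEFT BREAST, LUMPECTOMY:
- Invasive ductal carcinoma, Grade 2
- Tumor size: 1.8 cm in greatest dimension
- Margins: All margins negative, closest margin 0.3 cm
- Lymphovascular invasion: Present
- ER: Positive (90%, strong)
- PR: Positive (70%, moderate)
- HER2: Negative (IHC 1+)
- Ki-67: 25%
"""


def parse_report(text: str) -> dict:
    """Pull a handful of fields out of a synoptic-style pathology report."""
    fields = {}

    match = re.search(r"Grade\s+(\d)", text)
    if match:
        fields["grade"] = int(match.group(1))

    match = re.search(r"Tumor size:\s*([\d.]+)\s*cm", text)
    if match:
        # Registry tumor size is shown in millimeters in the example output.
        fields["tumor_size_mm"] = round(float(match.group(1)) * 10)

    for marker in ("ER", "PR", "HER2"):
        match = re.search(rf"^- {marker}:\s*(Positive|Negative)", text, re.MULTILINE)
        if match:
            fields[f"{marker.lower()}_status"] = match.group(1)

    return fields


extracted = parse_report(REPORT)
print(extracted)
```

A pattern-based approach like this breaks as soon as report wording varies, which is the gap language-model extraction is meant to close.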
Radiology Report Processing Example
Input: PET/CT report
IMPRESSION:
1. Hypermetabolic right upper lobe mass (SUV 8.2), suspicious for primary lung malignancy
2. FDG-avid right hilar and mediastinal lymph nodes (SUV 4.5-6.1)
3. No evidence of distant metastatic disease
Extracted Data:
- Primary Site Confirmed: C34.1 (Upper lobe, lung)
- Regional Lymph Nodes: Positive (hilar + mediastinal)
- Distant Metastasis: M0 (No distant mets)
- Clinical Stage: At least Stage IIIA
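The "at least" wording reflects that the report establishes nodal and metastatic status but not the T category. A minimal sketch of that reasoning follows, using a small subset of lung stage groups for M0 disease as understood from the AJCC 8th Edition. The table is an assumption to verify against the staging manual, it ignores T subcategories, and the function name is hypothetical.

```python
# Subset of lung stage groups for M0 disease (assumed from AJCC 8th Edition).
STAGE_GROUPS_M0 = {
    ("T1", "N2"): "IIIA", ("T2", "N2"): "IIIA",
    ("T3", "N2"): "IIIB", ("T4", "N2"): "IIIB",
    ("T1", "N3"): "IIIB", ("T2", "N3"): "IIIB",
    ("T3", "N3"): "IIIC", ("T4", "N3"): "IIIC",
}
STAGE_ORDER = ["IIIA", "IIIB", "IIIC"]  # lowest to highest


def minimum_stage(n_category: str, t_candidates=("T1", "T2", "T3", "T4")) -> str:
    """Lowest stage group consistent with a known N category and an unknown T."""
    stages = [STAGE_GROUPS_M0[(t, n_category)] for t in t_candidates]
    return min(stages, key=STAGE_ORDER.index)


# Mediastinal node involvement implies at least N2; M0 per the report.
floor = minimum_stage("N2")
print(f"Clinical stage: at least Stage {floor}")
```

Once tumor size or invasion is documented, the T candidates narrow and the same lookup yields a definite stage group.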
Comprehensive Cancer Site Coverage
The Cancer Registry module supports all major cancer sites with site-specific data item extraction, AJCC staging rules, and NAACCR reporting requirements. Medical Language Models are trained on oncology documentation for each cancer type, ensuring accurate extraction of histology, grade, biomarkers, and staging variables specific to that anatomic site.
Breast
Thorax (Lung)
Lower GI Tract (Colon, Rectum)
Male Genital (Prostate)
Head and Neck
Urinary Tract (Bladder, Kidney)
Upper GI Tract (Esophagus, Stomach)
Female Reproductive (Ovary, Cervix, Uterus)
Hematologic (Leukemia, Lymphoma, Myeloma)
Central Nervous System
Hepatobiliary (Liver, Pancreas)
Skin (Melanoma)
Endocrine (Thyroid)
Soft Tissue Sarcoma
Bone
Neuroendocrine Tumors
Ophthalmic Sites
Registrar Validation & Review
Automated extraction provides the foundation, but cancer registry standards demand certified expertise. The Cancer Registry module optimizes human judgment through structured validation workflows where certified tumor registrars review AI-extracted data with supporting evidence already assembled, ensuring every case meets NAACCR standards and state reporting requirements.
Registry Project Workflow
Cancer registry work is organized into Registry Projects: focused collections of cases for a specific cancer site, reporting period, or facility. Each project has its own configuration, assigned registrar team, and validation workflow.
Project Assignment
When you create a registry project, you define which certified tumor registrars will validate the automated abstractions. Projects can be assigned based on cancer site expertise (e.g., breast cancer specialists handle breast cases), facility coverage (registrars cover specific hospitals), or workload distribution to balance case volume across your team.
The project dashboard shows each registrar which cases require their review, preventing duplicate work and ensuring complete coverage of all cases.
Patient Dashboard for Validation
Once automation completes, registrars see all patients in the project dashboard with key information at a glance: patient demographics, cancer site and histology, date of diagnosis, abstraction completion status, and fields flagged for review.
Registrars can filter by completion status, cancer type, or time period to prioritize their work and track progress through the validation queue.
Field-Level Validation with Evidence
For each patient case, registrars review a structured form presenting all NAACCR data items extracted by the AI. Each field shows:
Extracted Value: The data element extracted by AI (e.g., "pT3" for pathologic T stage)
Source Attribution: Whether the value came from AI extraction or manual entry, with confidence scores for AI-extracted values
Evidence Trail: Direct links to source documents (pathology reports, operative notes, clinical notes, radiology reports, or structured EHR data), with the specific text or data element that supports the extracted value highlighted
Validation Actions: Registrars can accept the AI-extracted value as correct, edit the value if the extraction was incorrect or incomplete, or flag the field for discussion if clinical judgment is needed
This evidence-based validation ensures registrars spend their time verifying accuracy rather than hunting through charts for information.
Source Documentation Access
Every extracted value includes direct access to its source evidence. When reviewing primary site, the system shows the pathology report text describing the tumor location. For staging, it displays the relevant portions of pathology, imaging, and clinical notes that establish T, N, and M categories. Treatment data links to operative notes, chemotherapy orders, and radiation planning documents.
This immediate evidence access accelerates validation as registrars don't search through the EHR manually; they review the AI's work with supporting documentation already presented.
Complete Audit Trail
Every interaction with the registry data is logged with full provenance:
- AI Extractions: When the value was extracted, its confidence score, and the source document reference
- Registrar Edits: Who modified which fields, what the original and new values were, when the change occurred, and optional notes explaining the rationale
- Validation Status: Which registrars reviewed which cases, validation timestamps, and approval status
- Source Attribution: Whether each data element came from structured EHR data, clinical notes (with NLP extraction), pathology reports, or manual registrar entry
This comprehensive audit trail satisfies regulatory requirements, supports quality audits, and provides complete transparency for how every data element was obtained and validated.
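The registrar-edit portion of the audit trail can be sketched as an append-only history attached to each field. The class and attribute names here are hypothetical and the storage is in-memory only; the sketch simply shows that an edit records who, when, the original value, the new value, and an optional note.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EditEvent:
    field_name: str
    original_value: str
    new_value: str
    edited_by: str
    edited_at: datetime
    note: str = ""


class AuditedField:
    """A data item whose value changes are logged, never overwritten silently."""

    def __init__(self, name: str, ai_value: str):
        self.name = name
        self.value = ai_value
        self.history = []   # list of EditEvent, oldest first

    def edit(self, new_value: str, registrar: str, note: str = "") -> EditEvent:
        event = EditEvent(
            field_name=self.name,
            original_value=self.value,
            new_value=new_value,
            edited_by=registrar,
            edited_at=datetime.now(timezone.utc),
            note=note,
        )
        self.history.append(event)
        self.value = new_value
        return event


path_t = AuditedField("Pathologic T", ai_value="pT3")
path_t.edit("pT2", registrar="registrar_01", note="Size re-measured on final report")
print(path_t.value, len(path_t.history), path_t.history[0].original_value)
```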
Quality Assurance Features
The Cancer Registry module provides tools to maintain data quality throughout the validation process:
Pre-built Edit Checks: NAACCR-compliant edit checks run automatically, flagging impossible or unlikely value combinations (e.g., prostate cancer in female patients, dates out of sequence) before submission
Inter-Rater Review: Cases can be assigned to multiple registrars for dual review, with the system tracking agreement rates and flagging discrepancies for discussion
Supervisor Review: Senior registrars can conduct final quality reviews before cases are marked complete, ensuring consistent interpretation across the registry
Completion Tracking: The dashboard shows validation progress at both individual registrar and overall project levels, helping you manage workload and meet reporting deadlines
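For the Inter-Rater Review item above, agreement between two registrars can be summarized as the share of fields on which their abstractions match. This is a simple percent-agreement sketch with hypothetical field names; it is not necessarily the metric the platform reports, and chance-corrected statistics may be preferred in formal quality programs.

```python
def compare_abstractions(first: dict, second: dict):
    """Return (agreement_rate, discrepancies) for two abstractions of one case."""
    fields = sorted(set(first) | set(second))
    discrepancies = {
        name: (first.get(name), second.get(name))
        for name in fields
        if first.get(name) != second.get(name)
    }
    agreement = 1 - len(discrepancies) / len(fields) if fields else 1.0
    return agreement, discrepancies


registrar_a = {"primary_site": "C50.9", "histology": "8500/3", "grade": "2", "path_t": "pT1c"}
registrar_b = {"primary_site": "C50.9", "histology": "8500/3", "grade": "2", "path_t": "pT2"}

agreement, discrepancies = compare_abstractions(registrar_a, registrar_b)
print(f"Agreement: {agreement:.0%}")
for name, (a_value, b_value) in discrepancies.items():
    print(f"Discuss {name}: {a_value} vs {b_value}")
```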
The AI-Registrar Partnership
The Cancer Registry workflow embodies the principle that AI and certified expertise work best together.
Automated extraction handles the time-consuming chart review, reading thousands of pages of clinical documentation to find relevant NAACCR data items.
Certified tumor registrars provide the clinical judgment by validating complex staging scenarios, resolving ambiguous documentation, ensuring coding accuracy, and applying registry standards consistently.
This partnership reduces abstraction time from 2 hours per case to 20-30 minutes of focused validation, letting your registrar team maintain more cases without sacrificing quality or falling behind on reporting deadlines.
Measurable Impact on Registry Operations
Automating cancer registry abstraction with AI-powered extraction and evidence-based validation delivers quantifiable improvements across registry efficiency, data quality, and compliance performance.
80% Reduction in Abstraction Time
Decrease per-case abstraction from 2 hours to 20-30 minutes of focused validation, enabling your registrar team to maintain significantly more cases without increasing FTEs or falling behind on timeliness standards.
Improved Data Quality and Consistency
Standardized AI extraction reduces coding variability and inter-rater discrepancies, while comprehensive audit trails document every data element's provenance and validation history for quality assurance and regulatory review.
100% Automated Case Finding
Multi-source case finding across diagnosis codes, pathology reports, imaging findings, and treatment orders ensures no reportable cancers are missed, while identifying cases earlier than code-based methods alone.
Meet CoC Timeliness Standards
Automated workflows and early case identification help you consistently achieve Commission on Cancer (CoC) requirements for 90% of cases abstracted within six months of diagnosis without manual escalation or overtime burdens.
More Complete Research Data
Automated extraction captures research-valuable fields like biomarkers, comorbidities, treatment response, and clinical trial participation that manual abstractors often skip under time pressure, enhancing registry value for quality improvement and outcomes research.
Operational Cost Savings
Reduce registrar FTE requirements for routine abstraction, enabling staff reallocation to quality improvement initiatives, special studies, and research support, or simply maintain growing case volumes without proportional staffing increases.
Setting Up Your Cancer Registry Project
Getting started with automated cancer registry abstraction follows a straightforward configuration process. The John Snow Labs team works with your registry leadership and IT team to establish data connections, configure case finding rules, and train your registrar team on the validation workflow.
Simply configure your registry and start the automatic abstraction
Connect Data Sources
Integrate your EHR systems, pathology databases, imaging repositories, and oncology documentation sources to enable comprehensive automated case finding and data extraction.
Configure Registry Project
Define cancer sites, reporting state, NAACCR version, and case finding criteria. Customize which data items to extract based on your registry's reporting requirements and research objectives.
Assign Registrar Team
Designate certified tumor registrars who will validate automated abstractions, ensuring clinical expertise guides data quality and coding accuracy throughout the workflow.
Launch Automated Abstraction
Start the AI extraction process. Cases appear in the registrar dashboard with pre-populated NAACCR data items, ready for validation and quality review.
Registrar Review & Validation
Certified tumor registrars review AI-extracted abstractions, validate data accuracy against source documents, correct any errors, and approve cases for submission with a complete audit trail.
Submit Registry Reports
Export validated cases as NAACCR-compliant files and submit directly to state cancer registries, SEER programs, or CoC accreditation bodies.
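The project settings named in the steps above (cancer sites, reporting state, NAACCR version, assigned registrars) could be captured in a small configuration object. Everything in this sketch is hypothetical, including the class, its fields, and the sample values; actual configuration is done with the John Snow Labs team and the product interface.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryProjectConfig:
    name: str
    cancer_sites: list
    reporting_state: str
    naaccr_version: str
    registrars: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of configuration problems; empty means ready to launch."""
        problems = []
        if not self.cancer_sites:
            problems.append("at least one cancer site is required")
        if not self.registrars:
            problems.append("at least one registrar must be assigned")
        return problems


project = RegistryProjectConfig(
    name="Breast 2025",
    cancer_sites=["Breast"],
    reporting_state="NY",       # made-up example value
    naaccr_version="25",
)
print(project.validate())       # registrar team not yet assigned

project.registrars.append("registrar_01")
ready = not project.validate()
print("Ready to launch:", ready)
```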