
NAACCR-Compliant Cancer Registry Automation

Cancer registries are foundational to oncology surveillance, quality improvement, and clinical research, yet traditional registry abstraction remains one of healthcare's most labor-intensive processes, requiring certified tumor registrars to manually extract hundreds of data points from fragmented clinical documentation.

Patient Journey Intelligence transforms cancer registry operations from a manual burden into an automated, audit-ready workflow. It uses AI-powered case finding and clinical data extraction to reduce abstraction time by 80% while capturing more complete data than human abstractors working alone.

Key Capabilities

The Cancer Registry module provides six integrated capabilities that work together to automate the entire abstraction workflow, from initial case detection through regulatory submission.

Automated Case Finding

Multi-source detection: ICD-O-3 codes, pathology reports, radiology findings, treatment orders, death certificates

AJCC TNM Staging

Automated staging (AJCC 8th Edition), histology/grade extraction, biomarkers (ER, PR, HER2, PD-L1), lymph nodes, metastasis

Treatment Tracking

Surgical procedures, chemotherapy regimens, radiation therapy, immunotherapy, targeted therapy, clinical trials

Longitudinal Outcomes

Continuous monitoring: disease progression, recurrence, metastasis, survival status, quality of life indicators

Registrar Review Workflows

Evidence-based review interface, side-by-side source documents, confidence scoring, one-click corrections, audit trails

NAACCR Compliance

NAACCR v25 export, state registry submissions, SEER reporting, CoC accreditation, automated edit checks


The Cancer Registry Abstraction Challenge

Cancer registries are mandated infrastructure for accredited cancer programs, state public health surveillance, and national epidemiology through SEER and NAACCR. Yet maintaining these registries remains one of oncology's most resource-intensive operational challenges.

The Manual Abstraction Bottleneck

Cancer registry abstraction traditionally requires an average of 2 hours of manual chart review per case by certified tumor registrars, and complex cases can take significantly longer. For a health system with 2,000 new cancer cases annually, this represents 4,000 hours of specialized labor. Nationally, with approximately 2.1 million new cancer diagnoses projected in 2026, the United States requires roughly 4.2 million hours of manual cancer registry abstraction each year.
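The labor estimate is a simple product of annual case volume and average abstraction time. A minimal Python sketch, assuming the 2-hour average applies uniformly to every case (complex cases would push the real total higher):

```python
def annual_abstraction_hours(cases_per_year: int, hours_per_case: float = 2.0) -> float:
    """Estimate annual registrar hours as case volume times average hours per case."""
    return cases_per_year * hours_per_case


# Health system from the example above: 2,000 new cancer cases per year.
health_system_hours = annual_abstraction_hours(2_000)

# National estimate: roughly 2.1 million projected new diagnoses.
national_hours = annual_abstraction_hours(2_100_000)

print(f"Health system: {health_system_hours:,.0f} hours")  # Health system: 4,000 hours
print(f"United States: {national_hours:,.0f} hours")       # United States: 4,200,000 hours
```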

This massive labor requirement creates persistent challenges:

Case Identification Delays: Relying on ICD-10 diagnosis codes means cases often aren't identified until weeks after diagnosis, when billing codes are finalized. Critical staging information from early pathology reports and imaging studies may be documented long before structured codes appear in the EHR.

Fragmented Documentation: Each NAACCR data item may require reviewing dozens of documents across multiple systems. A single breast cancer case might involve pathology reports for primary tumor characteristics, surgical operative notes for margin status, imaging reports for metastasis detection, oncology notes for treatment planning, pharmacy records for chemotherapy regimens, and radiation oncology documentation for radiation therapy details.

Specialized Expertise Shortage: Certified tumor registrars require extensive training in oncology, anatomy, staging systems (AJCC TNM), and coding standards (ICD-O-3). Many facilities struggle to recruit and retain qualified registrars, creating backlogs that delay reporting and limit registry capacity.

Timeliness Standards: Commission on Cancer (CoC) accreditation requires 90% of cases to be abstracted within six months of diagnosis. Meeting this standard while maintaining data quality consumes significant registrar capacity, leaving little time for quality improvement initiatives or research support.

Data Completeness Trade-offs: Under time pressure, registrars may prioritize mandatory fields for compliance reporting while leaving optional research-valuable fields (biomarkers, comorbidities, treatment response) incomplete or unvalidated.
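Tracking the CoC timeliness target described above is a straightforward proportion. The sketch below uses hypothetical case dates and assumes that "within six months" means on or before the same calendar day six months after diagnosis, clamped to the end of shorter months; the exact rule should be confirmed in the current CoC standards manual.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def timeliness_rate(cases, window_months: int = 6) -> float:
    """Fraction of (diagnosed, abstracted) date pairs abstracted within the window."""
    on_time = sum(
        1 for diagnosed, abstracted in cases
        if abstracted <= add_months(diagnosed, window_months)
    )
    return on_time / len(cases)


# Hypothetical (date of diagnosis, date abstraction completed) pairs.
cases = [
    (date(2025, 1, 15), date(2025, 5, 1)),    # on time (deadline Jul 15)
    (date(2025, 2, 10), date(2025, 8, 9)),    # on time (deadline Aug 10)
    (date(2025, 3, 31), date(2025, 10, 15)),  # late (deadline Sep 30)
    (date(2025, 4, 5), date(2025, 6, 20)),    # on time (deadline Oct 5)
]
rate = timeliness_rate(cases)
print(f"{rate:.0%} abstracted on time; meets 90% target: {rate >= 0.90}")
```

With one late case out of four, the rate is 75%, below the 90% threshold.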

The Compliance Burden

Beyond abstraction time, cancer registries must maintain strict compliance with evolving standards. NAACCR releases updated data standards annually, requiring registries to adapt abstraction protocols, update edit checks, and ensure historical data remains comparable. Clinical staging systems themselves evolve, with AJCC publishing new editions of the TNM staging manual every 6-8 years, each introducing changes to staging criteria, new biomarker requirements, and revised anatomic classifications that registrars must master. State registries have unique reporting requirements, submission formats, and timeliness deadlines. CoC accreditation demands comprehensive documentation of abstraction procedures, inter-rater reliability testing, and quality assurance processes.

This compliance overhead adds administrative burden to already resource-constrained registry teams, diverting effort from the core mission of capturing complete, accurate oncology data. Keeping registrars trained and certified on the latest AJCC editions, NAACCR standards, and ICD-O-3 coding updates requires continuous education, reference manual updates, and workflow adjustments that multiply the operational complexity of registry maintenance.


How Patient Journey Intelligence Solves Cancer Registry Challenges

The Cancer Registry module transforms abstraction from a manual burden into an automated, audit-ready workflow, enabling your registrar team to maintain more cases with higher data quality while meeting all compliance requirements.

Continuous Automated Case Finding

The system continuously monitors your electronic health records for new cancer cases across multiple detection pathways, ensuring no reportable cases are missed and cases are identified as early as possible:

Structured Code Monitoring

Automatically detects new ICD-O-3 topography and morphology codes, ICD-10-CM cancer diagnosis codes, and oncology-related procedure codes as soon as they appear in your EHR.
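Code-based screening can be pictured as a filter over incoming diagnosis codes. This is a minimal sketch using a hypothetical, deliberately simplified rule (ICD-10-CM malignant categories C00 to C96 plus benign CNS tumors D32 to D33). It is not the official SEER or NAACCR casefinding list, which adds further reportable codes and excludes some C codes, such as most basal and squamous cell skin cancers.

```python
import re

# Hypothetical, simplified screening rule for illustration only.
_ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})")


def is_candidate_cancer_code(icd10_cm: str) -> bool:
    """Flag ICD-10-CM codes in C00-C96 (malignant) or D32-D33 (benign CNS)."""
    match = _ICD10_PATTERN.match(icd10_cm.strip().upper())
    if not match:
        return False
    letter, category = match.group(1), int(match.group(2))
    if letter == "C":
        return 0 <= category <= 96
    if letter == "D":
        return category in (32, 33)
    return False


encounter_codes = ["E11.9", "C50.912", "D33.2", "I10", "C34.11"]
candidates = [c for c in encounter_codes if is_candidate_cancer_code(c)]
print(candidates)  # ['C50.912', 'D33.2', 'C34.11']
```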

Unstructured Clinical Note Screening

Medical Language Models analyze clinical notes, pathology reports, and consultation documentation to identify cancer mentions before formal diagnosis coding occurs, often detecting cases weeks earlier than code-based methods alone.

Pathology Report Parsing

Every pathology report is automatically screened for cancer diagnoses, malignancy classifications, and suspicious findings, triggering case creation for biopsy-confirmed malignancies.
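To make the screening idea concrete, here is a deliberately simple rule-based stand-in: keyword matching with a crude NegEx-style negation check, in place of the Medical Language Models the platform uses. The term lists are illustrative assumptions, and real reports need far richer handling of uncertainty and negation.

```python
import re

# Illustrative vocabularies only; not the platform's model or vocabulary.
MALIGNANCY_TERMS = re.compile(
    r"\b(carcinoma|adenocarcinoma|sarcoma|melanoma|lymphoma|leukemia|malignan\w*)\b",
    re.IGNORECASE,
)
NEGATION_CUES = re.compile(r"\b(negative for|no evidence of|without|free of)\b", re.IGNORECASE)


def screen_pathology_report(text: str) -> bool:
    """Return True if any sentence mentions a malignancy term not preceded by a negation cue."""
    for sentence in re.split(r"[.;\n]+", text):
        for match in MALIGNANCY_TERMS.finditer(sentence):
            if not NEGATION_CUES.search(sentence[: match.start()]):
                return True
    return False


print(screen_pathology_report("LEFT BREAST, LUMPECTOMY: Invasive ductal carcinoma, Grade 2."))  # True
print(screen_pathology_report("Lymph node, biopsy: Negative for metastatic carcinoma."))        # False
```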

Radiology Findings Analysis

Imaging reports (CT, MRI, PET scans) are analyzed for primary tumor identification, staging information, and metastasis detection, linking radiologic findings to existing cases or creating new ones.

Treatment Indication Detection

Oncology treatment orders (chemotherapy, radiation therapy, immunotherapy) are monitored to identify cases where cancer treatment was initiated before formal registry abstraction, preventing gaps in case capture.

This multi-pathway case finding ensures comprehensive coverage while identifying cases earlier in their diagnostic journey, giving registrars more time to complete abstraction before timeliness deadlines.
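Combining pathways means collapsing many detection events into candidate cases and remembering which pathway fired first. A minimal sketch with hypothetical class and pathway names; it assumes one candidate case per patient, whereas a real registry applies multiple-primary rules and may open several cases for one patient.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CaseFindingSignal:
    patient_id: str
    pathway: str       # e.g. "structured_code", "pathology", "radiology", "treatment_order"
    detected_on: date


@dataclass
class CandidateCase:
    patient_id: str
    first_detected: date
    pathways: set = field(default_factory=set)


def merge_signals(signals):
    """Collapse signals into one candidate case per patient, keeping the earliest detection date."""
    cases = {}
    for s in signals:
        case = cases.setdefault(s.patient_id, CandidateCase(s.patient_id, s.detected_on))
        case.first_detected = min(case.first_detected, s.detected_on)
        case.pathways.add(s.pathway)
    return cases


signals = [
    CaseFindingSignal("P001", "structured_code", date(2025, 3, 20)),
    CaseFindingSignal("P001", "pathology", date(2025, 2, 27)),
    CaseFindingSignal("P002", "radiology", date(2025, 3, 2)),
    CaseFindingSignal("P001", "treatment_order", date(2025, 3, 25)),
]
merged = merge_signals(signals)
# Days between the first (pathology) signal and the billing code for P001.
lead_days = (date(2025, 3, 20) - merged["P001"].first_detected).days
print(merged["P001"].first_detected, sorted(merged["P001"].pathways), lead_days)
```

In this made-up data, patient P001's pathology report arrives 21 days before the diagnosis code, which is the kind of lead time the multi-pathway approach aims to capture.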

AI-Powered NAACCR Data Extraction

Once a case is detected, the platform automatically extracts 400+ NAACCR-reportable data elements from clinical documentation, transforming hours of manual chart review into minutes of focused validation. Medical Language Models trained on oncology documentation extract structured facts from unstructured clinical notes, pathology reports, operative notes, imaging reports, and treatment plans.

Patient Demographics & Identification

Name, date of birth, sex, race, ethnicity, address, Social Security number, insurance status, and occupation are drawn from structured EHR fields and supplemented by unstructured documentation when necessary for completeness.

Primary Site and Histology

ICD-O-3 topography and morphology codes extracted from pathology reports, with laterality, behavior codes, and grade determination.

AJCC TNM Staging

Clinical and pathologic T, N, M categories extracted from pathology reports, imaging studies, and clinical documentation, with automatic stage group calculation following AJCC 8th Edition guidelines.

Tumor Characteristics

Grade, differentiation, tumor size, extension, lymph node involvement, and site-specific data items (SSDIs) extracted from pathology and surgical reports.

Biomarker Status

Estrogen receptor (ER), progesterone receptor (PR), HER2, Ki-67, PD-L1, and other prognostic markers extracted from immunohistochemistry and molecular testing reports.

Treatment Details

Surgical procedures with margins and scope, chemotherapy regimens with cycles and doses, radiation therapy with modality and anatomic sites, immunotherapy and targeted therapy agents, all extracted from operative notes, chemotherapy orders, and radiation planning documentation.

Longitudinal Outcomes

Disease progression, recurrence detection, metastasis identification, survival status, and cause of death tracked continuously as new clinical documentation arrives, keeping registry data current without manual follow-up searches.

Every extracted value includes provenance tracking: the source document, specific text supporting the extraction, AI confidence score, and extraction timestamp, creating a complete audit trail from clinical documentation to registry data.
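The provenance described above maps naturally onto an immutable record per extracted value. This is a minimal sketch of such a record; the class name, field names, and document ID are hypothetical and not the product's actual schema ("tumorSizeSummary" is used as an illustrative NAACCR item name, with size coded as three-digit millimeters).

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractedValue:
    """One extracted data item plus its provenance (hypothetical schema)."""
    naaccr_item: str          # illustrative NAACCR item name
    value: str
    source_document_id: str
    evidence_text: str        # the span of source text that supports the value
    confidence: float         # model confidence in [0, 1]
    extracted_at: datetime

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


tumor_size = ExtractedValue(
    naaccr_item="tumorSizeSummary",
    value="018",
    source_document_id="path-report-123",
    evidence_text="Tumor size: 1.8 cm in greatest dimension",
    confidence=0.97,
    extracted_at=datetime(2025, 3, 1, 14, 30, tzinfo=timezone.utc),
)
print(asdict(tumor_size)["evidence_text"])
```

Making the record frozen means a correction produces a new value plus an audit entry rather than silently overwriting the original extraction.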

Certified Registrar Validation Workflow

Rather than replacing human expertise, the Cancer Registry module optimizes it through intelligent human-in-the-loop review. Certified tumor registrars focus their time on validating AI-extracted data and resolving ambiguous cases instead of manually searching through charts and transcribing information.

Structured Validation Interface

Registrars review a pre-populated NAACCR abstraction form with all data items already extracted, each field showing the AI-extracted value, confidence score, and direct link to source evidence.

Evidence-Based Review

Clicking any data item displays its source documentation (pathology report text, imaging findings, or clinical note excerpts) with the relevant passages highlighted, enabling registrars to verify accuracy without searching through the EHR manually.

One-Click Corrections

If the AI extraction is incorrect or incomplete, registrars can edit the value directly in the validation interface. The system logs who made the change, when it occurred, and what the original AI-extracted value was, maintaining complete audit trail transparency.
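The correction logging described here can be sketched as an append-only log attached to each abstract. The class and method names below are hypothetical illustrations, not the platform's API; the first entry for a field preserves the AI-extracted value, and later entries chain from there.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One registrar correction: who changed which field, from what, to what, and when."""
    field_name: str
    original_value: str
    new_value: str
    edited_by: str
    edited_at: datetime
    note: str = ""


class AbstractRecord:
    """Hypothetical abstract whose field edits are kept in an append-only log."""

    def __init__(self, ai_values: dict):
        self._values = dict(ai_values)  # current values, initially the AI extraction
        self._log = []                  # entries are only ever appended, never rewritten

    def correct(self, field_name: str, new_value: str, registrar: str, note: str = "") -> AuditEntry:
        entry = AuditEntry(
            field_name=field_name,
            original_value=self._values[field_name],
            new_value=new_value,
            edited_by=registrar,
            edited_at=datetime.now(timezone.utc),
            note=note,
        )
        self._log.append(entry)
        self._values[field_name] = new_value
        return entry

    def value(self, field_name: str) -> str:
        return self._values[field_name]

    @property
    def audit_log(self) -> tuple:
        return tuple(self._log)  # callers get a read-only snapshot


record = AbstractRecord({"grade": "2", "laterality": "9"})  # 9 = laterality unknown
record.correct("laterality", "2", "registrar_jdoe", note="Left breast per pathology report")
print(record.value("laterality"), record.audit_log[0].original_value)  # 2 9
```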

Quality Assurance Tools

Built-in NAACCR edit checks flag impossible or unlikely value combinations before submission. Inter-rater review workflows support dual abstraction for quality assurance, with discrepancy tracking and resolution.
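Dual abstraction is usually summarized with raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. A minimal sketch on hypothetical grade codes from two registrars; the platform may use a different agreement statistic.

```python
from collections import Counter


def percent_agreement(a: list, b: list) -> float:
    """Share of items where both registrars assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_e sums, over every code, the product of each registrar's marginal
    frequency for that code. Undefined when p_e == 1.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical grade codes from two registrars dual-abstracting the same 10 cases.
registrar_1 = ["1", "2", "2", "3", "2", "1", "3", "2", "2", "3"]
registrar_2 = ["1", "2", "3", "3", "2", "1", "3", "2", "1", "3"]
print(f"agreement={percent_agreement(registrar_1, registrar_2):.2f}, "
      f"kappa={cohens_kappa(registrar_1, registrar_2):.2f}")
```

Here the registrars agree on 8 of 10 cases (0.80), but chance agreement is 0.33, so kappa is about 0.70.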

This AI-registrar partnership reduces abstraction time from 2 hours per case to 20-30 minutes of focused validation, enabling your registrar team to maintain significantly more cases without sacrificing quality or falling behind on reporting deadlines.

NAACCR-Compliant Regulatory Reporting

When cases are ready for submission, the system generates NAACCR-compliant export files formatted for direct submission to state and national cancer registries. All required fields are populated and validated against registry specifications, with edit checks applied automatically before export.

The platform supports NAACCR v25 format, state-specific reporting requirements, SEER program submissions, and Commission on Cancer (CoC) accreditation needs, ensuring your registry data meets all compliance standards without manual file formatting or edit check scripting.
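NAACCR defines an XML exchange format in which each Patient element holds Item elements keyed by naaccrId, with nested Tumor elements. The following is a minimal standard-library sketch of writing that shape. The dictionary URI and specification version are placeholder assumptions to be checked against the current NAACCR XML specification, and the platform's own exporter may work differently.

```python
import xml.etree.ElementTree as ET

NS = "http://naaccr.org/naaccrxml"
# Placeholder values (assumptions): confirm both against the NAACCR XML specification
# for the NAACCR version you are submitting.
BASE_DICTIONARY_URI = "http://naaccr.org/naaccrxml/naaccr-dictionary-250.xml"
SPECIFICATION_VERSION = "1.7"


def _add_items(parent: ET.Element, items: dict) -> None:
    """Append one <Item naaccrId="..."> child per data item."""
    for naaccr_id, value in items.items():
        item = ET.SubElement(parent, f"{{{NS}}}Item", {"naaccrId": naaccr_id})
        item.text = str(value)


def build_naaccr_xml(patients: list) -> str:
    """Serialize patients (patient-level items plus a list of tumors) to NAACCR XML text."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}NaaccrData", {
        "baseDictionaryUri": BASE_DICTIONARY_URI,
        "recordType": "I",  # incidence record
        "specificationVersion": SPECIFICATION_VERSION,
    })
    for patient in patients:
        patient_el = ET.SubElement(root, f"{{{NS}}}Patient")
        _add_items(patient_el, patient["items"])
        for tumor in patient["tumors"]:
            _add_items(ET.SubElement(patient_el, f"{{{NS}}}Tumor"), tumor)
    return ET.tostring(root, encoding="unicode")


xml_text = build_naaccr_xml([{
    "items": {"patientIdNumber": "00000001", "sex": "2"},
    "tumors": [{"primarySite": "C509", "laterality": "2",
                "histologicTypeIcdO3": "8500", "behaviorCodeIcdO3": "3"}],
}])
print(xml_text)
```

Note that NAACCR codes primary site without the decimal point (C509 rather than C50.9).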


400+ Automatically Extracted NAACCR Data Items

The Cancer Registry automatically extracts, codes, and validates all required NAACCR data items from your clinical documentation, covering patient identification, cancer characteristics, staging, treatment, and outcomes.

Supported NAACCR Record Layout Categories

The platform extracts data across all major sections of the NAACCR record layout specification, ensuring comprehensive coverage of required data items for state and national registry reporting.

Record ID

Demographic

Cancer Identification

Hospital-Specific

Stage/Prognostic Factors

Treatment-1st Course

Treatment-Subsequent & Other

Follow-up/Recurrence/Death

Edit Overrides/Conversion History

Patient-Confidential

Hospital-Confidential

Other-Confidential

Text-Diagnosis

Text-Treatment

Text-Miscellaneous

Special Use

Pathology

Pathology Report Processing Example

Input: Free-text surgical pathology report

DIAGNOSIS: LEFT BREAST, LUMPECTOMY:
- Invasive ductal carcinoma, Grade 2
- Tumor size: 1.8 cm in greatest dimension
- Margins: All margins negative, closest margin 0.3 cm
- Lymphovascular invasion: Present
- ER: Positive (90%, strong)
- PR: Positive (70%, moderate)
- HER2: Negative (IHC 1+)
- Ki-67: 25%

Extracted Data:

  • Primary Site: C50.9 (Breast, NOS)
  • Laterality: 2 (Left)
  • Histology: 8500/3 (Infiltrating duct carcinoma, NOS)
  • Grade: 2 (Moderately differentiated)
  • Tumor Size: 18 mm
  • Lymphovascular Invasion: Present
  • ER Status: Positive
  • PR Status: Positive
  • HER2 Status: Negative
  • Pathologic T: pT1c (tumor > 10 mm but ≤ 20 mm)
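The extraction step can be illustrated on the sample report with a rule-based stand-in: plain regular expressions rather than the Medical Language Models described earlier. This sketch only handles the exact phrasing of this one report and is not how the platform itself extracts data.

```python
import re

REPORT = """DIAGNOSIS: LEFT BREAST, LUMPECTOMY:
- Invasive ductal carcinoma, Grade 2
- Tumor size: 1.8 cm in greatest dimension
- Margins: All margins negative, closest margin 0.3 cm
- Lymphovascular invasion: Present
- ER: Positive (90%, strong)
- PR: Positive (70%, moderate)
- HER2: Negative (IHC 1+)
- Ki-67: 25%"""


def extract_fields(text: str) -> dict:
    """Pull a few structured fields out of a synoptic-style pathology report with regexes."""
    fields = {}
    if m := re.search(r"\b(LEFT|RIGHT)\s+BREAST\b", text, re.IGNORECASE):
        fields["laterality"] = m.group(1).capitalize()
    if m := re.search(r"\bGrade\s+(\d)\b", text):
        fields["grade"] = int(m.group(1))
    if m := re.search(r"Tumor size:\s*([\d.]+)\s*(cm|mm)", text, re.IGNORECASE):
        size = float(m.group(1))
        # Registries record tumor size in millimeters.
        fields["tumor_size_mm"] = round(size * 10) if m.group(2).lower() == "cm" else round(size)
    if m := re.search(r"Lymphovascular invasion:\s*(Present|Absent)", text, re.IGNORECASE):
        fields["lvi"] = m.group(1).capitalize()
    for marker in ("ER", "PR", "HER2"):
        # Anchor at the start of a bullet line so "ER" does not match inside "HER2".
        if m := re.search(rf"^- {marker}:\s*(Positive|Negative)", text, re.MULTILINE):
            fields[f"{marker.lower()}_status"] = m.group(1)
    if m := re.search(r"Ki-67:\s*(\d+)%", text):
        fields["ki67_percent"] = int(m.group(1))
    return fields


print(extract_fields(REPORT))
```

Hand-written patterns like these break as soon as a pathologist phrases a finding differently, which is the gap a trained language model is meant to close.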

Radiology Report Processing Example

Input: PET/CT report

IMPRESSION:
1. Hypermetabolic right upper lobe mass (SUV 8.2), suspicious for primary lung malignancy
2. FDG-avid right hilar and mediastinal lymph nodes (SUV 4.5-6.1)
3. No evidence of distant metastatic disease

Extracted Data:

  • Primary Site Confirmed: C34.1 (Upper lobe, lung)
  • Regional Lymph Nodes: Positive (hilar + mediastinal)
  • Distant Metastasis: cM0 (no distant metastasis)
  • Clinical Stage: At least Stage IIIA (mediastinal nodal disease with cM0; T category not yet assigned)
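Why "at least Stage IIIA"? The nodal and metastasis findings are known, but the T category is not, so the lowest compatible stage group is the minimum over every possible T. This sketch encodes the AJCC 8th Edition stage groups for non-small cell lung cancer as we understand them; it should be verified against the AJCC manual before any reuse. It omits Tis, T1mi, TX, and NX, and it assumes the mediastinal nodes are ipsilateral (N2). Contralateral nodes would be N3, which gives at least IIIB, so "at least IIIA" holds in either case.

```python
STAGE_ORDER = ["IA1", "IA2", "IA3", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IVA", "IVB"]
T_CATEGORIES = ["T1a", "T1b", "T1c", "T2a", "T2b", "T3", "T4"]

# With N0 the stage depends on the T subcategory; with N1-N3 only T1-T4 matters.
N0_STAGE = {"T1a": "IA1", "T1b": "IA2", "T1c": "IA3", "T2a": "IB",
            "T2b": "IIA", "T3": "IIB", "T4": "IIIA"}
NODAL_STAGE = {
    "N1": {"T1": "IIB", "T2": "IIB", "T3": "IIIA", "T4": "IIIA"},
    "N2": {"T1": "IIIA", "T2": "IIIA", "T3": "IIIB", "T4": "IIIB"},
    "N3": {"T1": "IIIB", "T2": "IIIB", "T3": "IIIC", "T4": "IIIC"},
}


def lung_stage_group(t: str, n: str, m: str) -> str:
    """Stage group from T, N, M categories (partial AJCC 8th Edition lung table)."""
    if m in ("M1a", "M1b"):
        return "IVA"
    if m == "M1c":
        return "IVB"
    if n == "N0":
        return N0_STAGE[t]
    return NODAL_STAGE[n][t[:2]]  # "T2a" -> "T2"


def minimum_stage(n: str, m: str) -> str:
    """Lowest stage group compatible with known N and M when T is still unassigned."""
    return min((lung_stage_group(t, n, m) for t in T_CATEGORIES), key=STAGE_ORDER.index)


print(minimum_stage("N2", "M0"))  # IIIA
```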

Comprehensive Cancer Site Coverage

The Cancer Registry module supports all major cancer sites with site-specific data item extraction, AJCC staging rules, and NAACCR reporting requirements. Medical Language Models are trained on oncology documentation for each cancer type, ensuring accurate extraction of histology, grade, biomarkers, and staging variables specific to that anatomic site.

Breast

Thorax (Lung)

Lower GI Tract (Colon, Rectum)

Male Genital (Prostate)

Head and Neck

Urinary Tract (Bladder, Kidney)

Upper GI Tract (Esophagus, Stomach)

Female Reproductive (Ovary, Cervix, Uterus)

Hematologic (Leukemia, Lymphoma, Myeloma)

Central Nervous System

Hepatobiliary (Liver, Pancreas)

Skin (Melanoma)

Endocrine (Thyroid)

Soft Tissue Sarcoma

Bone

Neuroendocrine Tumors

Ophthalmic Sites


Registrar Validation & Review

Automated extraction provides the foundation, but cancer registry standards demand certified expertise. The Cancer Registry module optimizes human judgment through structured validation workflows where certified tumor registrars review AI-extracted data with supporting evidence already assembled, ensuring every case meets NAACCR standards and state reporting requirements.

Registry Project Workflow

Cancer registry work is organized into Registry Projects, focused collections of cases for a specific cancer site, reporting period, or facility. Each project has its own configuration, assigned registrar team, and validation workflow.

Project Assignment

When you create a registry project, you define which certified tumor registrars will validate the automated abstractions. Projects can be assigned based on cancer site expertise (e.g., breast cancer specialists handle breast cases), facility coverage (registrars cover specific hospitals), or workload distribution to balance case volume across your team.

The project dashboard shows each registrar which cases require their review, preventing duplicate work and ensuring complete coverage of all cases.
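The assignment strategies above (site expertise, with workload balancing as a fallback) can be sketched as a simple routing rule. The function, registrar names, and site groups are hypothetical; a real project might also weigh facility coverage or case complexity.

```python
from collections import Counter


def assign_cases(cases, registrars):
    """Route each case to a site specialist if one exists, else to the least-loaded registrar.

    cases: list of (case_id, site_group) tuples
    registrars: dict mapping registrar name -> set of site groups they specialize in
    Ties on workload are broken alphabetically so results are deterministic.
    """
    load = Counter({name: 0 for name in registrars})
    assignments = {}
    for case_id, site in cases:
        specialists = [r for r, sites in registrars.items() if site in sites]
        pool = specialists or list(registrars)
        chosen = min(pool, key=lambda r: (load[r], r))
        assignments[case_id] = chosen
        load[chosen] += 1
    return assignments


registrars = {"alice": {"breast"}, "bob": {"lung", "colorectal"}, "carol": set()}
cases = [("c1", "breast"), ("c2", "lung"), ("c3", "prostate"),
         ("c4", "breast"), ("c5", "prostate")]
print(assign_cases(cases, registrars))
```

Breast and lung cases go to their specialists, while the prostate cases, which have no specialist, are spread to whoever currently has the lightest queue.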

Patient Dashboard for Validation

Once automation completes, registrars see all patients in the project dashboard with key information at a glance: patient demographics, cancer site and histology, date of diagnosis, abstraction completion status, and fields flagged for review.

Registrars can filter by completion status, cancer type, or time period to prioritize their work and track progress through the validation queue.

Field-Level Validation with Evidence

For each patient case, registrars review a structured form presenting all NAACCR data items extracted by the AI. Each field shows:

Extracted Value: The data element extracted by AI (e.g., "pT3" for pathologic T stage)

Source Attribution: Whether the value came from AI extraction or manual entry, with confidence scores for AI-extracted values

Evidence Trail: Direct links to the source documents (pathology reports, operative notes, clinical notes, or radiology reports) or structured EHR data, with the specific text or data element that supports the extracted value highlighted

Validation Actions: Registrars can accept the AI-extracted value as correct, edit the value if the extraction was incorrect or incomplete, or flag the field for discussion if clinical judgment is needed

This evidence-based validation ensures registrars spend their time verifying accuracy rather than hunting through charts for information.
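Confidence scores can also decide where a registrar looks first. This is a minimal sketch in which every field is still reviewed, but low-confidence fields are surfaced ahead of the rest. The 0.85 threshold and the field names are hypothetical and would need tuning against local validation data.

```python
REVIEW_THRESHOLD = 0.85  # hypothetical cut-off; would need tuning on local validation data


def triage_fields(fields: dict, threshold: float = REVIEW_THRESHOLD):
    """Order extracted fields so the least certain ones are reviewed first.

    fields maps a field name to a (value, confidence) pair. Returns two lists of
    field names, each in ascending confidence order: those below the threshold
    (review closely) and those at or above it (likely a quick accept).
    """
    ordered = sorted(fields.items(), key=lambda item: item[1][1])
    needs_review = [name for name, (_, conf) in ordered if conf < threshold]
    likely_accept = [name for name, (_, conf) in ordered if conf >= threshold]
    return needs_review, likely_accept


extracted = {
    "primary_site": ("C50.9", 0.98),
    "grade": ("2", 0.91),
    "laterality": ("Left", 0.62),
    "pathologic_t": ("pT1c", 0.80),
}
review_first, quick_accept = triage_fields(extracted)
print(review_first)   # ['laterality', 'pathologic_t']
print(quick_accept)   # ['grade', 'primary_site']
```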

Source Documentation Access

Every extracted value includes direct access to its source evidence. When reviewing primary site, the system shows the pathology report text describing the tumor location. For staging, it displays the relevant portions of pathology, imaging, and clinical notes that establish T, N, and M categories. Treatment data links to operative notes, chemotherapy orders, and radiation planning documents.

This immediate access to evidence accelerates validation: registrars no longer search through the EHR manually, but review the AI's work with the supporting documentation already presented.

Complete Audit Trail

Every interaction with the registry data is logged with full provenance:

  • AI Extractions: When each value was extracted, its confidence score, and the source document reference
  • Registrar Edits: Who modified which fields, what the original and new values were, when the change occurred, and optional notes explaining the rationale
  • Validation Status: Which registrars reviewed which cases, validation timestamps, and approval status
  • Source Attribution: Whether each data element came from structured EHR data, clinical notes (with NLP extraction), pathology reports, or manual registrar entry

This comprehensive audit trail satisfies regulatory requirements, supports quality audits, and provides complete transparency for how every data element was obtained and validated.

Quality Assurance Features

The Cancer Registry module provides tools to maintain data quality throughout the validation process:

Pre-built Edit Checks: NAACCR-compliant edit checks run automatically, flagging impossible or unlikely value combinations (e.g., prostate cancer in female patients, dates out of sequence) before submission

Inter-Rater Review: Cases can be assigned to multiple registrars for dual review, with the system tracking agreement rates and flagging discrepancies for discussion

Supervisor Review: Senior registrars can conduct final quality reviews before cases are marked complete, ensuring consistent interpretation across the registry

Completion Tracking: The dashboard shows validation progress at both individual registrar and overall project levels, helping you manage workload and meet reporting deadlines
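The two kinds of edit checks named above (sex/site consistency and date sequence) can be sketched as plain rule functions over an abstract. These are hypothetical, highly simplified stand-ins with illustrative field names; the official NAACCR edit metafile contains far more rules with precise logic.

```python
from datetime import date

# Simplified stand-in for a sex/site edit: ICD-O-3 topography C61 (prostate gland).
MALE_ONLY_SITES = ("C61",)


def run_edit_checks(record: dict) -> list:
    """Return human-readable edit failures for one abstract (hypothetical field names)."""
    failures = []
    # Sex/site consistency: prostate primary coded for a female patient (NAACCR sex code 2).
    if record["primary_site"].startswith(MALE_ONLY_SITES) and record["sex"] == "2":
        failures.append("Primary site C61 (prostate) is inconsistent with sex = female")
    # Date sequence: first treatment and last contact should not precede diagnosis.
    dx = record["date_of_diagnosis"]
    for label in ("date_first_treatment", "date_last_contact"):
        other = record.get(label)
        if other is not None and other < dx:
            failures.append(f"{label} ({other}) precedes date_of_diagnosis ({dx})")
    return failures


bad = {"primary_site": "C619", "sex": "2", "date_of_diagnosis": date(2025, 3, 1),
       "date_first_treatment": date(2025, 4, 2), "date_last_contact": date(2025, 2, 1)}
for failure in run_edit_checks(bad):
    print(failure)
```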

The AI-Registrar Partnership

The Cancer Registry workflow embodies the principle that AI and certified expertise work best together.

Automated extraction handles the time-consuming chart review, reading thousands of pages of clinical documentation to find relevant NAACCR data items.

Certified tumor registrars provide the clinical judgment by validating complex staging scenarios, resolving ambiguous documentation, ensuring coding accuracy, and applying registry standards consistently.

This partnership reduces abstraction time from 2 hours per case to 20-30 minutes of focused validation, letting your registrar team maintain more cases without sacrificing quality or falling behind on reporting deadlines.


Measurable Impact on Registry Operations

Automating cancer registry abstraction with AI-powered extraction and evidence-based validation delivers quantifiable improvements across registry efficiency, data quality, and compliance performance.

80% Reduction in Abstraction Time

Decrease per-case abstraction from 2 hours to 20-30 minutes of focused validation, enabling your registrar team to maintain significantly more cases without increasing FTEs or falling behind on timeliness standards.
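The 80% figure sits inside the 20-30 minute range quoted above, as a quick calculation against the 2-hour (120-minute) baseline shows:

```python
BASELINE_MINUTES = 120  # 2 hours of manual abstraction per case


def percent_reduction(validation_minutes: float, baseline: float = BASELINE_MINUTES) -> float:
    """Percentage of per-case time saved relative to the manual baseline."""
    return 100 * (baseline - validation_minutes) / baseline


for minutes in (20, 24, 30):
    print(f"{minutes} min -> {percent_reduction(minutes):.1f}% reduction")
# 20 min -> 83.3% reduction
# 24 min -> 80.0% reduction
# 30 min -> 75.0% reduction
```

An exact 80% reduction corresponds to 24 minutes of validation per case; the full range spans roughly 75% to 83%.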

Improved Data Quality and Consistency

Standardized AI extraction reduces coding variability and inter-rater discrepancies, while comprehensive audit trails document every data element's provenance and validation history for quality assurance and regulatory review.

100% Automated Case Finding

Multi-source case finding across diagnosis codes, pathology reports, imaging findings, and treatment orders ensures no reportable cancers are missed, while identifying cases earlier than code-based methods alone.

Meet CoC Timeliness Standards

Automated workflows and early case identification help you consistently achieve Commission on Cancer (CoC) requirements for 90% of cases abstracted within six months of diagnosis without manual escalation or overtime burdens.

More Complete Research Data

Automated extraction captures research-valuable fields like biomarkers, comorbidities, treatment response, and clinical trial participation that manual abstractors often skip under time pressure, enhancing registry value for quality improvement and outcomes research.

Operational Cost Savings

Reduce registrar FTE requirements for routine abstraction, enabling staff reallocation to quality improvement initiatives, special studies, and research support, or simply maintain growing case volumes without proportional staffing increases.


Setting Up Your Cancer Registry Project

Getting started with automated cancer registry abstraction follows a straightforward configuration process. The John Snow Labs team works with your registry leadership and IT team to establish data connections, configure case finding rules, and train your registrar team on the validation workflow.

Simply configure your registry and start the automatic abstraction

1. Connect Data Sources

Integrate your EHR systems, pathology databases, imaging repositories, and oncology documentation sources to enable comprehensive automated case finding and data extraction.

2. Configure Registry Project

Define cancer sites, reporting state, NAACCR version, and case finding criteria. Customize which data items to extract based on your registry's reporting requirements and research objectives.

3. Assign Registrar Team

Designate certified tumor registrars who will validate automated abstractions, ensuring clinical expertise guides data quality and coding accuracy throughout the workflow.

4. Launch Automated Abstraction

Start the AI extraction process. Cases appear in the registrar dashboard with pre-populated NAACCR data items, ready for validation and quality review.

5. Registrar Review & Validation

Certified tumor registrars review AI-extracted abstractions, validate data accuracy against source documents, correct any errors, and approve cases for submission with a complete audit trail.

6. Submit Registry Reports

Export validated cases as NAACCR-compliant files and submit directly to state cancer registries, SEER programs, or CoC accreditation bodies.
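The settings chosen when configuring a registry project and assigning the registrar team can be pictured as a single configuration object validated before launch. This is a hypothetical shape for illustration only, not the product's actual configuration schema; the default pathway names simply mirror the case finding pathways described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryProjectConfig:
    """Hypothetical registry project settings; not the product's actual schema."""
    name: str
    cancer_sites: list
    reporting_state: str          # two-letter state code
    naaccr_version: str
    case_finding_pathways: list = field(default_factory=lambda: [
        "structured_codes", "pathology", "radiology", "treatment_orders"])
    extra_data_items: list = field(default_factory=list)  # research fields beyond required items
    registrars: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of configuration problems (empty when ready to launch)."""
        problems = []
        if not self.cancer_sites:
            problems.append("at least one cancer site is required")
        if not self.registrars:
            problems.append("assign at least one certified tumor registrar")
        if len(self.reporting_state) != 2:
            problems.append("reporting_state should be a two-letter code")
        return problems


config = RegistryProjectConfig(name="Breast 2025", cancer_sites=["breast"],
                               reporting_state="MA", naaccr_version="25",
                               registrars=["alice"])
print(config.validate())
```

Validating up front catches an unassigned project or missing site list before any cases are extracted.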