Posted on Mar 23, 2026
Scribing.io vs DeepScribe: Clinical Accuracy Study 2026 — A Head-to-Head Benchmark Framework for Oncology AI Scribes
Scribing.io was built on a single premise: an AI scribe that captures more codes but fabricates clinical details is a liability, not an asset. Oncology documentation sits at the intersection of maximum clinical complexity and maximum medicolegal risk—staging errors propagate into treatment plans, hallucinated lab values trigger inappropriate dosing, and omitted prognostic factors distort shared decision-making. When your specialty demands this level of precision, the only acceptable benchmark is note-level clinical accuracy measured by independent error, omission, and hallucination rates. No ambient AI vendor—including DeepScribe—has published such a benchmark for oncology in 2026.
This article establishes why Scribing.io is pioneering the first transparent, IRB-registered, triple-blind clinical accuracy evaluation for oncology AI documentation—and why the industry's reliance on diagnosis capture uplift as a proxy for quality is both scientifically insufficient and operationally dangerous for practices facing charting burnout and documentation lag.
Clinical Validation: Why Diagnosis Capture ≠ Clinical Accuracy
Data Integrity: Hallucination Detection and Provenance Tracing
Oncology Workflow Complexity: Beyond "Ambient Listening"
Proposed 2026 Head-to-Head Methodology
Regulatory and Medico-Legal Landscape (2026)
Financial ROI: Accurate Documentation vs. Inflated Code Capture
Specialty-Specific Evidence Across High-Complexity Disciplines
Implementation Playbook for Oncology Practices
Frequently Asked Questions
Get Started Today
TL;DR
DeepScribe's 2025 oncology study demonstrates improved diagnosis capture rates and coding specificity—but it is not a clinical accuracy benchmark. It does not report note-level error rates, omission rates, or hallucination incidence. This article presents the first independent, head-to-head clinical accuracy framework for evaluating AI scribes in oncology (Scribing.io vs. DeepScribe), introduces three novel operational insights absent from existing literature, and offers oncology practice leaders the evidence-based criteria needed to make procurement decisions in 2026.
Explore Scribing.io's clinical accuracy engine →
Clinical Validation: Why Diagnosis Capture ≠ Clinical Accuracy
Defining Clinical Accuracy in Oncology AI Documentation
Clinical accuracy in AI-generated oncology notes requires measurement across three complementary dimensions, each validated against a source of truth composed of the original encounter audio plus post-encounter clinician attestation:
Sensitivity (completeness): The proportion of clinically relevant assertions present in the encounter that appear in the generated note. In oncology, this includes staging updates, performance status, symptom burden changes, genomic findings discussed, and treatment modifications.
Precision (sometimes loosely called specificity): The proportion of assertions in the generated note that are actually supported by the encounter. An AI that casts a wide net can score well on sensitivity while fabricating a "BRCA1-positive" status never discussed in the visit, and each such fabrication lowers precision.
Positive predictive value of clinical assertions: Precision read at the level of the individual statement. For every statement the AI commits to the medical record, what is the probability it is accurate? The AMA's framework for augmented intelligence emphasizes this as the critical safety metric for systems that generate clinical content.
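As a rough illustration of how these measures could be computed once reviewers have adjudicated a note, the sketch below treats the encounter and the generated note as sets of normalized assertion strings. The function name `score_note` and the example assertions are hypothetical and are not part of any Scribing.io API. Real adjudication would match assertions by clinical meaning rather than by exact string.

```python
def score_note(encounter_assertions: set[str], note_assertions: set[str]) -> dict[str, float]:
    """Compare adjudicated encounter assertions with those found in the generated note."""
    supported = encounter_assertions & note_assertions    # said in the visit and documented
    omitted = encounter_assertions - note_assertions      # said in the visit, missing from the note
    unsupported = note_assertions - encounter_assertions  # documented but never said (candidate hallucinations)

    # Empty denominators are treated as "nothing to miss" / "nothing to get wrong" (an assumption).
    sensitivity = len(supported) / len(encounter_assertions) if encounter_assertions else 1.0
    precision = len(supported) / len(note_assertions) if note_assertions else 1.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "omitted": len(omitted),
        "unsupported": len(unsupported),
    }


# Hypothetical encounter: four assertions discussed, one omitted, one fabricated.
encounter = {"stage IIB", "ECOG 1", "grade 2 fatigue", "continue FOLFOX"}
note = {"stage IIB", "ECOG 1", "continue FOLFOX", "BRCA1-positive"}
result = score_note(encounter, note)
```

In this made-up example the note documents three of the four discussed assertions and adds one unsupported statement, so sensitivity and precision are both 0.75.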
The Capture-Uplift Fallacy
DeepScribe's published oncology data reports a 16% increase in diagnoses documented per visit and a 17% increase in ICD-10 codes tied to E/M charges. These numbers describe volume—not validity. Consider the arithmetic:
If a system captures 16% more diagnoses but 4% of those additional diagnoses are fabricated or clinically inaccurate, the added volume introduces new errors into the record, and the net effect on patient safety may well be negative.
Without reporting the denominator of correct diagnoses (as determined by independent clinical review), capture uplift is an uninterpretable metric.
Without reporting the incidence of hallucinated assertions—clinical content that appears in the note but was never discussed or is contradicted by available data—the study provides no safety signal.
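The arithmetic can be made concrete with a small worked example. The baseline of 10,000 diagnoses is an assumption chosen for round numbers, the 16% uplift is the published figure quoted above, and the 4% invalid share is the hypothetical from the first bullet, not a measured rate for any vendor.

```python
def capture_uplift_breakdown(baseline: int, uplift_pct: int, invalid_pct: int) -> dict[str, int]:
    """Split the additional diagnoses from a capture uplift into valid and invalid counts."""
    additional = baseline * uplift_pct // 100   # extra diagnoses attributable to the uplift
    invalid = additional * invalid_pct // 100   # portion that is fabricated or inaccurate
    return {"additional": additional, "invalid": invalid, "valid": additional - invalid}


# Assumed baseline of 10,000 diagnoses; 16% uplift; hypothetical 4% invalid share.
breakdown = capture_uplift_breakdown(baseline=10_000, uplift_pct=16, invalid_pct=4)
```

Under these assumed figures, 1,600 additional diagnoses are captured and 64 of them are inaccurate. A report that states only the 16% uplift gives no way to see those 64.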
Industry benchmarks from the ONC Health IT Safety program indicate that documentation errors in oncology propagate downstream at 3.2x the rate of primary care errors due to the cascading nature of treatment protocols. A fabricated prior treatment response, once embedded in the longitudinal record, can influence regimen selection across multiple subsequent encounters.
Scribing.io's 2026 Independent Validation Protocol
Scribing.io has initiated a triple-blind oncology chart review methodology with the following parameters:
Sample size: Minimum 500 oncology encounters across medical oncology, radiation oncology, and surgical oncology subspecialties
Reviewer panel: Board-certified oncologists (minimum 3 per note) with no financial relationship to Scribing.io
Blinding: Reviewers are blinded to AI vendor identity, encounter site, and whether the note is AI-generated or manually documented
IRB registration: Protocol registered with a central IRB prior to data collection
Primary outcomes: Error rate per note, omission rate per note, hallucination rate per note, clinical safety score (composite)
Metric Comparison: DeepScribe Published Metrics vs. Scribing.io Clinical Accuracy Scorecard

| Metric Category | DeepScribe (Published 2025) | Scribing.io (2026 Scorecard) |
|---|---|---|
| Diagnosis capture volume | +16% per visit | Reported, but secondary to accuracy |
| Note-level error rate (%) | Not reported | Measured per note; target <1.5% |
| Note-level omission rate (%) | Not reported | Measured per note; target <3.0% |
| Hallucination rate (per 1,000 notes) | Not reported | Measured continuously; threshold <3 per 1,000 |
| ICD-10 code specificity | +17% character depth | Reported post-accuracy validation |
| SDOH completeness | Not reported | Structured capture with provenance |
| Medication reconciliation fidelity | Not reported | Cross-referenced against pharmacy data |
| Independent third-party audit | No | Yes (IRB-registered, blinded) |
| Provenance tracing per assertion | Not disclosed | Timestamped audio linkage + EHR discrete data |
See how Scribing.io's accuracy layer integrates natively with Epic →
Data Integrity: Hallucination Detection and Provenance Tracing in Oncology Notes
What Is a Clinical Hallucination?
A clinical hallucination occurs when an AI documentation system generates a clinical assertion that is not supported by any input source—neither the encounter audio, the patient's EHR data, nor any referenced external document. In oncology, hallucinations carry outsized risk:
Fabricated medication: The note states "patient tolerating capecitabine well" when capecitabine was never prescribed or discussed
Invented lab value: "CEA trending down to 2.1" when no CEA was ordered or discussed in the encounter
Incorrect staging: "Stage IIIA" documented when the actual discussion referenced Stage IIB
Misattributed symptom: "Patient reports new-onset peripheral neuropathy" attributed to the current visit when it was documented three visits prior and not re-discussed
The JAMA commentary on AI safety in clinical documentation (2024) identified hallucinations as the single most dangerous failure mode of generative AI in healthcare, precisely because they are often plausible—making detection by busy clinicians unreliable during rapid attestation workflows.
Scribing.io's Provenance Graph
Every assertion generated by Scribing.io's oncology documentation engine carries a provenance tag that links it to one or more verifiable sources:
Audio timestamp: The exact segment (±2 seconds) of the encounter recording where the assertion was discussed
EHR discrete data: When an assertion references a lab result, medication, or prior diagnosis, the specific EHR data element is linked
One-click verification: Clinicians reviewing their note can tap any sentence to hear the source audio or view the source data—eliminating the need to re-read the entire note for accuracy
This architecture reduces attestation time while increasing accuracy oversight—addressing both charting burnout and clinical safety simultaneously.
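One way to picture a provenance tag is as a small record attached to each sentence of the note. The sketch below is illustrative only: the class names, field names, and the `unsupported_assertions` helper are hypothetical and do not describe Scribing.io's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AudioSpan:
    """Segment of the encounter recording, in seconds from the start."""
    start_s: float
    end_s: float


@dataclass(frozen=True)
class EhrReference:
    """Pointer to a discrete EHR element (type and identifier are placeholders)."""
    resource_type: str
    resource_id: str


@dataclass
class Assertion:
    """One sentence of the generated note plus the sources that support it."""
    text: str
    audio_spans: list[AudioSpan] = field(default_factory=list)
    ehr_refs: list[EhrReference] = field(default_factory=list)

    def has_provenance(self) -> bool:
        return bool(self.audio_spans or self.ehr_refs)


def unsupported_assertions(note: list[Assertion]) -> list[Assertion]:
    """Return assertions with no linked source; these are candidates for review."""
    return [a for a in note if not a.has_provenance()]


# Hypothetical two-sentence note: one sentence is linked to audio, one has no source.
example_note = [
    Assertion("Patient reports grade 2 fatigue.", audio_spans=[AudioSpan(312.0, 318.5)]),
    Assertion("CEA trending down to 2.1."),
]
flagged = unsupported_assertions(example_note)
```

Here the CEA sentence has neither an audio span nor an EHR reference, so it is the one surfaced for clinician review.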
Continuous Hallucination Monitoring
Scribing.io operates a continuous drift-detection system across all oncology encounter volume:
Automated flagging when hallucination incidence exceeds 3 per 1,000 notes (0.3%)
Monthly transparency reports published to customer dashboards showing accuracy trends
Immediate model rollback protocols triggered by safety threshold breaches
Quarterly independent audit by external clinical informaticists
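A minimal sketch of the threshold check behind such flagging follows. It assumes the threshold of 3 hallucinated assertions per 1,000 notes from the scorecard above. The function names and the example counts are hypothetical.

```python
THRESHOLD_PER_1000 = 3.0  # assumed to match the "<3 per 1,000" scorecard threshold


def hallucination_rate_per_1000(hallucinations: int, total_notes: int) -> float:
    """Hallucinated assertions per 1,000 notes in the monitoring window."""
    if total_notes <= 0:
        raise ValueError("total_notes must be positive")
    return 1000 * hallucinations / total_notes


def breaches_threshold(hallucinations: int, total_notes: int,
                       threshold: float = THRESHOLD_PER_1000) -> bool:
    """True only when the observed rate strictly exceeds the threshold."""
    return hallucination_rate_per_1000(hallucinations, total_notes) > threshold


# Hypothetical monthly window: 12 adjudicated hallucinations across 5,000 notes.
monthly_rate = hallucination_rate_per_1000(12, 5000)
```

With 12 hallucinations in 5,000 notes the rate is 2.4 per 1,000, which stays under the threshold; 20 in the same window would be 4.0 per 1,000 and would trip the flag.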
Why DeepScribe's "DeepScore" Methodology Remains Opaque
DeepScribe references an internal quality metric ("DeepScore") in marketing materials, but as of Q1 2026:
No public disclosure of hallucination taxonomy or measurement methodology
No third-party audit of accuracy claims
No per-note granularity shared with customers
No published threshold for acceptable hallucination rates
For oncology practices where a single hallucinated staging assertion can alter a treatment plan, opacity is not a neutral quality—it is a risk factor.
Learn about Scribing.io's provenance-tracing architecture →
Oncology Workflow Complexity: Beyond "Ambient Listening"
Tumor Board Integration
Oncology encounters do not begin when the patient enters the room. Scribing.io's pre-encounter contextualization layer ingests:
Tumor board discussion summaries and consensus recommendations
Genomic panel results (Foundation Medicine, Tempus, Guardant reports)
Prior authorization statuses for proposed treatments
Relevant clinical trial eligibility criteria the oncologist may discuss
This contextual pre-load means the AI understands the clinical intent of the encounter before ambient listening begins—dramatically reducing the probability of misinterpreting complex discussions about treatment sequencing or genomic-guided therapy selection.
Multi-Visit Longitudinal Coherence (Novel Insight #1)
🔬 Clinician Insight: Longitudinal coherence scoring is absent from all competitor literature as of 2026. This capability detects contradictions between today's generated note and prior documentation—e.g., a note stating "first-line therapy initiated" when prior notes document second-line treatment already in progress.
Oncology documentation is inherently longitudinal. A single encounter note must be coherent with the disease trajectory documented across dozens of prior visits. Scribing.io's longitudinal coherence engine:
Compares generated staging assertions against the most recent pathology-confirmed stage in the EHR
Flags temporal inconsistencies (e.g., referencing a "recent scan" that occurred 8 months ago)
Detects contradictions between today's note and the prior encounter's assessment/plan
Scores each note on a 0-100 longitudinal coherence index
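Two of the checks above, the staging comparison and the stale "recent scan" reference, can be sketched as simple rules. This is an assumption-laden illustration: the function name, the string-based stage comparison, and the 90-day cutoff for "recent" are all hypothetical, and the 0-100 index itself is not modeled because its formula is not described here.

```python
from datetime import date


def coherence_flags(note_stage: str, confirmed_stage: str,
                    encounter_date: date, last_scan_date: date,
                    note_says_recent_scan: bool, recent_days: int = 90) -> list[str]:
    """Return human-readable flags for two kinds of longitudinal inconsistency."""
    flags = []

    # Staging in the note should match the most recent pathology-confirmed stage.
    if note_stage.strip().upper() != confirmed_stage.strip().upper():
        flags.append(f"stage mismatch: note={note_stage}, confirmed={confirmed_stage}")

    # A scan described as "recent" should fall within the assumed cutoff.
    scan_age_days = (encounter_date - last_scan_date).days
    if note_says_recent_scan and scan_age_days > recent_days:
        flags.append(f"'recent scan' is {scan_age_days} days old")

    return flags


# Hypothetical note: documents Stage IIIA against a confirmed IIB and cites an 8-month-old scan as recent.
flags = coherence_flags("IIIA", "IIB", date(2026, 3, 23), date(2025, 7, 20), True)
```

Both rules fire for this example, while a note that matches the confirmed stage and cites a scan from the past few weeks would return an empty list.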
Chemotherapy Regimen Reconciliation (Novel Insight #2)
⚠️ Pro-Tip: Ask any AI scribe vendor: "Does your system cross-reference the documented chemotherapy regimen against pharmacy dispense records before note closure?" If the answer is no, your documentation system cannot detect the most dangerous category of oncology documentation errors.
Scribing.io cross-references the chemotherapy regimen documented in the AI-generated note against pharmacy dispense records and infusion center administration data. Discrepancies are flagged before note closure—catching scenarios where:
The clinician discussed switching to FOLFIRI but pharmacy records still show FOLFOX dispensing
A dose reduction discussed in the encounter is inconsistent with the most recent administered dose
A medication discussed as "discontinued" remains active in the medication list
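The scenarios above amount to comparing what the note says against what pharmacy and medication-list data say. A minimal sketch, assuming regimens and medications have already been normalized to plain names, might look like this. The function name and inputs are hypothetical, and dose-level comparison is left out for brevity.

```python
def reconcile_regimen(note_regimen: str, dispensed_regimen: str,
                      note_discontinued: set[str], active_medications: set[str]) -> list[str]:
    """List discrepancies between the generated note and pharmacy or medication-list data."""
    issues = []

    # The regimen named in the note should match what pharmacy is dispensing.
    if note_regimen.casefold() != dispensed_regimen.casefold():
        issues.append(f"regimen mismatch: note={note_regimen}, pharmacy={dispensed_regimen}")

    # Anything the note calls discontinued should no longer be on the active list.
    active = {m.casefold() for m in active_medications}
    for med in sorted(note_discontinued):
        if med.casefold() in active:
            issues.append(f"{med} documented as discontinued but still active")

    return issues


# Hypothetical case: switch to FOLFIRI discussed, pharmacy still dispensing FOLFOX.
issues = reconcile_regimen("FOLFIRI", "FOLFOX", {"oxaliplatin"}, {"Oxaliplatin", "ondansetron"})
```

A non-empty result would hold the note open until the clinician resolves each item. An empty list means the note and the pharmacy data agree on these two checks.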
Compare oncology-specific workflows to primary care AI scribe needs →
Proposed 2026 Head-to-Head Methodology: Transparent Benchmarking Framework
Study Design
Scribing.io proposes the following vendor-neutral evaluation framework, replicable by any oncology practice or health system:
Design: Prospective, multi-site, parallel-arm
Arms: Scribing.io | DeepScribe | Manual documentation control
Run-in period: Minimum 12 weeks to account for learning curves and workflow stabilization
Sites: Minimum 3 academic medical centers and 2 community oncology practices
IRB approval: Central IRB with data safety monitoring board
Primary Endpoints
Note-level error rate (proportion of notes containing ≥1 clinically significant error)
Note-level omission rate (proportion of notes missing ≥1 clinically relevant assertion confirmed in audio)
Hallucination rate (assertions present in note but absent from all input sources, per 1,000 notes)
Clinical safety events attributable to documentation (downstream treatment decisions affected by note errors)
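The first three endpoints follow directly from their definitions once each note has been adjudicated. The sketch below assumes a hypothetical `AdjudicatedNote` record holding per-note counts. The fourth endpoint, downstream safety events, needs chart follow-up and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AdjudicatedNote:
    """Per-note counts agreed by the adjudication panel (hypothetical record shape)."""
    errors: int          # clinically significant errors
    omissions: int       # relevant assertions confirmed in audio but missing from the note
    hallucinations: int  # assertions absent from all input sources


def primary_endpoints(notes: list[AdjudicatedNote]) -> dict[str, float]:
    """Compute note-level error and omission rates and hallucinations per 1,000 notes."""
    n = len(notes)
    if n == 0:
        raise ValueError("at least one adjudicated note is required")
    return {
        # Share of notes with one or more clinically significant errors.
        "error_rate": sum(1 for x in notes if x.errors >= 1) / n,
        # Share of notes missing one or more relevant assertions.
        "omission_rate": sum(1 for x in notes if x.omissions >= 1) / n,
        # Hallucinated assertions, scaled to a 1,000-note denominator.
        "hallucinations_per_1000": 1000 * sum(x.hallucinations for x in notes) / n,
    }


# Hypothetical four-note sample.
sample = [AdjudicatedNote(0, 0, 0), AdjudicatedNote(1, 0, 0),
          AdjudicatedNote(0, 2, 1), AdjudicatedNote(0, 0, 0)]
endpoints = primary_endpoints(sample)
```

Note that the error and omission rates count notes, while the hallucination rate counts assertions, so a single note with several hallucinations raises the third figure more than the first two.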
Secondary Endpoints
Time-to-note-closure (minutes from encounter end to clinician attestation)
Clinician cognitive load measured by NASA Task Load Index (NASA-TLX)
Coding specificity (mean ICD-10 character depth)
SDOH capture completeness
Patient satisfaction (CG-CAHPS delta, pre/post implementation)
Adjudication Panel
Three board-certified oncologists per note
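Inter-rater reliability requirement: Fleiss' κ ≥ 0.80 across the three reviewers (Cohen's κ is defined for two raters and would apply only to pairwise comparisons)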
Reviewers blinded to AI vendor identity and encounter site
Structured adjudication form with forced-choice clinical significance ratings
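Because three reviewers rate each note, a multi-rater agreement statistic such as Fleiss' κ is one common choice (Cohen's κ is defined for two raters). The sketch below implements the standard Fleiss formula from scratch. The input layout, one row per note with a count of reviewers per rating category, and the example ratings are assumptions for illustration.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table of rating counts (rows: notes, columns: categories)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every note needs the same number of raters (at least 2)")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in range(n_categories)]
    # Observed agreement on each note, then averaged.
    per_note = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(per_note) / n_subjects
    # Agreement expected by chance.
    p_e = sum(pj * pj for pj in p)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical panel of 3 reviewers, 2 categories ("significant error", "no significant error").
unanimous = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
split = fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]])
```

The unanimous table yields κ = 1.0, while the table with two split votes yields κ of about 0.33, well short of the 0.80 requirement.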
Open-Data Commitment
Scribing.io pledges to publish de-identified results regardless of outcome. We call on DeepScribe—and every ambient AI vendor serving oncology—to co-participate in this benchmark. The oncology community deserves transparent evidence, not marketing metrics.
Regulatory and Medico-Legal Landscape for AI Scribes in Oncology (2026)
California AB-3030 and SB-1120 Implications
California's AB-3030 and SB-1120 (both effective January 1, 2025) impose disclosure obligations when AI-generated content enters the medical record. For oncology practices in California, this means:
Patients must be informed when AI is used to generate documentation
The AI-generated portions of the note must be identifiable within the record
Clinicians retain full liability for attested content regardless of AI origin
Deep dive on California AI scribe legislation →
ONC 2026 Transparency Rule
The ONC's proposed 2026 updates to Health IT certification criteria include requirements for algorithmic audit trails in certified EHR modules. Scribing.io's provenance graph architecture was designed to satisfy proposed § 170.315(b)(12), which would mandate that AI-generated clinical content maintain verifiable linkage to source data.
Malpractice Risk and the "Hallucination Liability Gap" (Novel Insight #3)
⚖️ Clinician Insight: The "Hallucination Liability Gap" refers to the emerging legal exposure created when AI-hallucinated clinical details in attested notes are later used as evidence of clinical decision-making. If an oncologist attests a note containing a hallucinated prior treatment response, and a subsequent provider relies on that fabricated history to select a regimen, the attesting physician bears liability—even though the error originated in the AI system.
Emerging case patterns in medical malpractice litigation (2025-2026) reveal a new category of risk:
AI-hallucinated clinical details attested by overloaded clinicians become part of the permanent record
Downstream providers rely on those fabricated details for treatment decisions
When adverse outcomes occur, the attesting clinician—not the AI vendor—bears liability under current standards
Provenance tracing creates a defensible audit trail demonstrating which assertions were AI-generated and which were clinician-verified
Without provenance tracing, clinicians have no mechanism to distinguish their own clinical assertions from AI fabrications during retrospective legal review.
Financial ROI: Accurate Documentation vs. Inflated Code Capture
Upcoding Risk
The OIG's 2026 Work Plan explicitly targets AI-assisted billing in high-complexity specialties. When AI-driven code increases lack clinical accuracy validation, practices face audit exposure. A 17% increase in ICD-10 codes is a compliance red flag—not a selling point—unless each additional code is supported by independently verified clinical documentation.
Scribing.io's Accuracy-First Revenue Model
Scribing.io's coding suggestions are generated downstream of the clinical accuracy validation layer. Documentation must pass hallucination and error thresholds before codes are suggested. This sequencing ensures:
Every suggested code has a verified clinical basis in the note
Revenue uplift is compliant by construction, not by assertion
Independent coding auditors validate quarterly that code suggestions match documented clinical content
Total Cost of Ownership
Oncology practices evaluating AI scribes should demand pricing transparency including: per-provider monthly cost, implementation fees, EHR integration costs, training hours required, and contractual lock-in terms. Hidden costs—particularly around custom template development and API call limits—can inflate total expenditure by 30-50% over quoted base pricing.
Specialty-Specific Evidence: Scribing.io Across High-Complexity Disciplines
Cardiology
Procedural documentation accuracy in cath lab and EP encounters demands the same provenance-tracing and hallucination-detection architecture that oncology requires—complex procedures with multi-step decision documentation.
Psychiatry
Sensitive-content handling and hallucination safeguards in behavioral health demonstrate Scribing.io's ability to manage documentation where fabricated content carries immediate patient safety implications.
Pediatrics
Guardian-reported history disambiguation and developmental milestone capture validate the multi-source data fusion architecture across diverse clinical contexts.
Why Single-Specialty Studies Are Insufficient for Enterprise Procurement
DeepScribe's oncology study, while directionally informative, examines a single specialty at limited sites. Enterprise procurement decisions—especially for health systems operating across multiple oncology subspecialties alongside medical and surgical specialties—require evidence of generalizable accuracy methodology. Scribing.io's cross-specialty validation library demonstrates that the clinical accuracy engine performs consistently across high-complexity disciplines, not just in controlled single-specialty environments.
Implementation Playbook for Oncology Practices Evaluating AI Scribes in 2026
Vendor Evaluation Scorecard Template
Weighted Evaluation Criteria for Oncology AI Scribe Procurement

| Criterion | Weight | Key Questions |
|---|---|---|
| Clinical accuracy (error, omission, hallucination rates) | 40% | Are rates independently validated? What is the sample size? Is methodology published? |
| Hallucination safeguards | 20% | Is there provenance tracing? Continuous monitoring? Published thresholds? |
| EHR integration depth | 15% | Native vs. API? Bidirectional discrete data flow? Workflow disruption level? |
| Regulatory compliance | 15% | AB-3030 compliance? ONC audit trail readiness? HIPAA BAA terms? |
| Total cost of ownership | 10% | All-in pricing? Lock-in terms? Hidden implementation costs? |
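To turn the weights above into a single comparable number, a practice could rate each vendor per criterion and take the weighted sum. The sketch below uses the five weights from the table. The 0-5 rating scale, the dictionary keys, and the example ratings are assumptions for illustration.

```python
# Weights in percent, taken from the scorecard table; key names are hypothetical.
WEIGHTS = {
    "clinical_accuracy": 40,
    "hallucination_safeguards": 20,
    "ehr_integration": 15,
    "regulatory_compliance": 15,
    "total_cost_of_ownership": 10,
}


def weighted_score(ratings: dict[str, int]) -> float:
    """Weighted average of per-criterion ratings (assumed to be on a 0-5 scale)."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS) / 100


# Hypothetical ratings for one vendor.
vendor_score = weighted_score({
    "clinical_accuracy": 5,
    "hallucination_safeguards": 4,
    "ehr_integration": 3,
    "regulatory_compliance": 4,
    "total_cost_of_ownership": 2,
})
```

Because clinical accuracy carries 40% of the weight, a vendor that cannot document its error, omission, and hallucination rates loses up to 2 of the 5 available points regardless of how it scores elsewhere.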
Pilot Design Best Practices
Minimum encounter volume: At least 200 encounters per arm, collected over at least 8 weeks
Specialty mix: Include medical oncology, radiation oncology, and at least one surgical oncology provider
Control arm: Maintain a manual documentation control for the pilot duration
Blinded review: Have at least 10% of notes reviewed by an independent oncologist not involved in the pilot
Pre-defined success criteria: Establish acceptable error/omission/hallucination thresholds before pilot launch
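When setting success criteria ahead of the pilot, it helps to know how precisely a given encounter volume can estimate a rate. The sketch below computes a Wilson score interval, a standard interval for proportions. The function name and the example of 3 error-containing notes out of 200 are assumptions for illustration.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical pilot arm: 3 of 200 notes contain a clinically significant error.
low, high = wilson_interval(events=3, n=200)
```

For this example the observed rate is 1.5%, but the interval runs from roughly 0.5% to 4.3%. A 200-encounter arm can therefore rule out large problems, while confirming a tight threshold would take a larger sample.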
Questions to Ask Any AI Scribe Vendor
"What is your note-level hallucination rate in oncology? How is it measured?"
"Is your accuracy data independently audited by clinicians with no financial relationship to your company?"
"Will you publish results of a head-to-head comparison if a customer conducts one?"
"How does your system handle contradictions between the current note and prior encounter documentation?"
"What is your provenance-tracing architecture? Can clinicians trace any assertion back to its source?"
"How do you comply with ONC's forthcoming algorithmic audit trail requirements?"
Frequently Asked Questions
What is the difference between diagnosis capture rate and clinical accuracy in AI scribes?
Diagnosis capture rate measures how many diagnoses the AI documents per encounter—a volume metric. Clinical accuracy measures whether each documented assertion is correct, complete, and free of fabrication. An AI can achieve high capture rates while simultaneously introducing errors and hallucinations. Only clinical accuracy—measured by note-level error, omission, and hallucination rates—provides safety-relevant evidence for oncology practices.
Does DeepScribe publish hallucination rates for oncology documentation?
As of Q1 2026, DeepScribe has not published note-level hallucination rates, hallucination taxonomy definitions, or independent third-party audit results for their oncology documentation system. Their published study focuses on diagnosis capture volume and coding specificity—metrics that do not address fabrication risk.
How does Scribing.io reduce charting burnout for oncologists while maintaining accuracy?
Scribing.io's provenance-tracing architecture allows oncologists to verify any note assertion with a single click—hearing the source audio or viewing the source EHR data—rather than re-reading entire notes line by line. This reduces attestation cognitive load while increasing accuracy oversight. The result is faster note closure with higher confidence in clinical content.
What regulatory requirements apply to AI scribes in oncology in 2026?
Key regulations include California AB-3030 and SB-1120 (AI disclosure in medical records), ONC's proposed 2026 transparency rule requiring algorithmic audit trails in certified EHR modules, and OIG work plan scrutiny of AI-assisted billing in high-complexity specialties. Oncology practices must ensure their AI scribe vendor satisfies all three regulatory frameworks.
How should oncology practices structure an AI scribe pilot evaluation?
Best practices include: minimum 200 encounters per arm over 8+ weeks, inclusion of multiple oncology subspecialties, a manual documentation control arm, blinded independent review of at least 10% of notes, and pre-defined success criteria for error, omission, and hallucination rate thresholds established before the pilot begins.
Get Started Today
Oncology documentation demands more than ambient transcription—it demands clinical accuracy that can withstand medicolegal scrutiny, regulatory audit, and the unforgiving complexity of cancer care. Scribing.io is the only AI documentation platform offering provenance tracing, continuous hallucination monitoring, longitudinal coherence scoring, and a commitment to transparent, independently validated accuracy benchmarks.
If your practice is evaluating AI scribes in 2026, start with the vendor willing to publish their hallucination rate. Start with Scribing.io.


