Posted on
Apr 10, 2026
Automating Level 4 and Level 5 Billing Charts: How AI Scribes Eliminate Upcoding and Underbilling
Summary: Level 4 (99214) and Level 5 (99215) E/M codes represent the majority of revenue for most outpatient practices—and the highest audit risk. Medical billing managers are caught between two costly failures: upcoding that triggers payer audits and penalties, and underbilling that silently drains revenue every single day. AI medical scribes solve this by generating documentation that matches the clinical complexity of each encounter in real time, giving coders the specificity they need to bill accurately. This guide breaks down the documentation thresholds for Level 4 and Level 5 charts, where automation fits into the coding workflow, how to audit-proof your practice without defaulting to underbilling, and what to look for in an AI scribe platform built for billing accuracy.
Every billing manager knows the tension: code too high and you invite an audit, code too low and you hemorrhage revenue in silence. The gap between a 99214 and a 99215 is where that tension reaches its breaking point—and for most outpatient practices, it's where the majority of billing dollars are decided. Platforms like Scribing.io are changing this dynamic by using ambient AI to capture clinical encounters with the specificity that Level 4 and Level 5 charts demand, giving coders documentation they can trust rather than documentation they have to interpret.
This guide is written for the billing managers who live in this compression zone daily. It walks through the documentation thresholds that separate 99214 from 99215 under current CMS guidelines, explains why underbilling may be costing your practice far more than you realize, and shows exactly how Scribing.io's AI scribe fits into a coding workflow built for both accuracy and audit resilience. No fabricated statistics, no theoretical scenarios—just a practical framework for one of the highest-stakes decisions in outpatient billing.
Table of Contents
Why Level 4 and Level 5 Charts Are the Highest-Stakes Coding Decision in Outpatient Billing
The Real Cost of Underbilling—And Why It's Harder to See Than Upcoding Risk
What Documentation Must Contain to Support Level 4 vs. Level 5 Coding
How AI Medical Scribes Automate Accurate Documentation for Level 4 and Level 5 Encounters
Audit-Proofing Your Practice Without Defaulting to Underbilling
What to Look for in an AI Scribe Platform Built for Billing Accuracy
Get Started Today
Why Level 4 and Level 5 Charts Are the Highest-Stakes Coding Decision in Outpatient Billing
Under the 2021 E/M documentation guidelines, which CMS adopted from the AMA's CPT revisions, office and outpatient evaluation and management codes are determined by either medical decision-making (MDM) complexity or total time on the date of the encounter. Among the established-patient code levels, 99214 (moderate MDM) and 99215 (high MDM) are the two highest-paying. Together they account for a large share of E/M revenue for primary care, internal medicine, and most specialty practices, even though 99215 is typically billed far less often than 99214.
The reason they're so consequential—and so difficult—is what billing managers call the "compression zone." The MDM thresholds separating moderate from high complexity are close enough that a meaningful percentage of encounters could legitimately fall on either side, depending on how the clinical note is written. A patient with multiple chronic conditions being managed with prescription medications might look like a clear 99214 in one note and a well-supported 99215 in another, depending entirely on the detail and structure of the documentation.
This ambiguity creates a dual risk that billing managers navigate constantly:
Upcoding risk: Billing a 99215 when the documentation only supports 99214 creates audit exposure. Payers use algorithms to flag practices with 99215 utilization rates that exceed peer benchmarks, and the consequences—recoupment, penalties, compliance investigations—are visible and painful.
Downcoding risk: Billing a 99214 when the clinical encounter genuinely supported 99215 leaves revenue on the table. Unlike audit penalties, this loss is invisible. No report flags it. No payer sends a letter saying "you should have billed higher."
The critical insight for billing managers is that this is fundamentally a documentation problem, not a coding problem. Coders can only assign the code that the clinical note supports. When a physician conducts a high-complexity encounter but documents it in a way that only clearly supports moderate complexity, the coder has no choice but to assign 99214. The clinical work happened; the documentation didn't capture it. This is exactly where family medicine practices feel the squeeze most acutely, because their encounter mix skews heavily toward the 99214/99215 boundary.
The Real Cost of Underbilling—And Why It's Harder to See Than Upcoding Risk
There is a profound psychological asymmetry in how billing departments experience upcoding versus underbilling. An upcoding audit is an event: it arrives with a letter, a demand for records, a timeline for response, and potential financial penalties. It generates meetings, stress, and organizational attention. Underbilling, by contrast, produces no event at all. It's the revenue that was never claimed, the complexity that was never reflected, the work that was never compensated.
To understand the scale of the problem, consider the reimbursement differential. Under the CMS Physician Fee Schedule, 99215 is reimbursed at a meaningfully higher rate than 99214. The exact amounts change every year and vary by locality, so verify the current figures for your region. The principle holds regardless: even a modest number of encounters coded one level below their supported complexity adds up to a material annual revenue shortfall. Multiply that per-encounter differential across a multi-provider practice seeing thousands of patients per month, and you begin to understand why underbilling is not a minor accounting issue. It is a structural revenue leak.
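The multiplication described above can be sketched in a few lines of Python. This is only an illustration: the function name, the two rates, the monthly volume, and the undercoded share are all hypothetical placeholders, not CMS figures. Substitute the rates from your own fee schedule and the share found in your own chart review.

```python
def annual_underbilling_shortfall(
    rate_99214: float,
    rate_99215: float,
    encounters_per_month: int,
    undercoded_share: float,
) -> float:
    """Estimate yearly revenue lost when supported 99215 visits are billed as 99214."""
    if not 0.0 <= undercoded_share <= 1.0:
        raise ValueError("undercoded_share must be between 0 and 1")
    per_encounter_gap = rate_99215 - rate_99214
    undercoded_per_month = encounters_per_month * undercoded_share
    return per_encounter_gap * undercoded_per_month * 12


# Placeholder inputs only; none of these numbers come from a real fee schedule.
estimate = annual_underbilling_shortfall(
    rate_99214=100.0,        # hypothetical rate
    rate_99215=150.0,        # hypothetical rate
    encounters_per_month=2000,
    undercoded_share=0.05,   # hypothetical: 5% of visits coded one level low
)
print(f"Estimated annual shortfall: ${estimate:,.2f}")
```

The estimate is linear in every input, so halving the undercoded share halves the shortfall. That makes it easy to test how sensitive the result is to each assumption.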
What makes this particularly insidious is the concept of "defensive coding"—the unwritten but widely practiced policy of billing conservatively to minimize audit risk. Billing managers are not wrong to be cautious. The penalties for upcoding are real and asymmetric: the punishment for billing too high vastly outweighs the non-punishment for billing too low. But over time, defensive coding becomes embedded in a department's culture. Coders learn that it's safer to default to 99214 when a chart is ambiguous, even if the encounter had all the hallmarks of high-complexity MDM. This caution is rational at the individual chart level but irrational at the practice level.
Underbilling also distorts your practice's data in ways that extend beyond immediate revenue:
Payer contract negotiations: If your coding patterns underrepresent the acuity of your patient population, you negotiate from a weaker position. Payers see a practice that handles moderate-complexity patients—because that's what the billing data shows.
Risk adjustment: For practices participating in value-based payment models, underdocumented complexity means underreported risk scores, which means lower capitated payments.
Quality metrics: Underbilling can mask the true clinical burden of your practice, affecting everything from staffing models to referral patterns.
The solution to underbilling is not to code more aggressively—that path leads directly to the audit risk that billing managers rightly fear. The solution is better documentation: notes that capture the full clinical complexity of the encounter so that coders can assign the code the encounter genuinely supports, with confidence that the chart will withstand scrutiny.
What Documentation Must Contain to Support Level 4 vs. Level 5 Coding
Under the current CMS framework—which replaced the legacy 1995/1997 documentation guidelines—outpatient E/M code selection is based on either MDM complexity or total time. Most encounters are coded using MDM, which is evaluated across three elements. A code level requires meeting or exceeding the threshold in at least two of the three elements. The AMA's CPT E/M guidelines provide the authoritative MDM table that both CMS and commercial payers reference.
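The two-of-three rule is easy to misapply by hand, so a minimal sketch may help. It assumes each element has already been rated by a coder. It also uses a single set of level labels for all three elements, which is a simplification: the CPT table names the data and risk levels slightly differently. The function and constant names are illustrative.

```python
# Ordered from lowest to highest. One label set is used for all three
# elements here, which simplifies the wording of the CPT table.
LEVELS = ["straightforward", "low", "moderate", "high"]

# Established-patient office visit codes by overall MDM level.
CODE_BY_MDM = {
    "straightforward": "99212",
    "low": "99213",
    "moderate": "99214",
    "high": "99215",
}


def overall_mdm(problems: str, data: str, risk: str) -> str:
    """Return the highest level met or exceeded by at least two of the three elements."""
    ranks = sorted(LEVELS.index(level) for level in (problems, data, risk))
    # With three sorted ranks, the middle one is reached by at least two elements.
    return LEVELS[ranks[1]]


def established_patient_code(problems: str, data: str, risk: str) -> str:
    """Map the overall MDM level to an established-patient E/M code."""
    return CODE_BY_MDM[overall_mdm(problems, data, risk)]


# One high element is not enough: two moderate elements cap the visit at 99214.
print(established_patient_code("moderate", "moderate", "high"))
# Two high elements support 99215 even if the third is lower.
print(established_patient_code("high", "low", "high"))
```

Taking the middle of the three sorted ratings is equivalent to asking which level at least two elements reach, and it avoids a chain of conditionals.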
Element 1: Number and Complexity of Problems Addressed
| MDM Level | 99214 (Moderate) | 99215 (High) |
|---|---|---|
| Problem Threshold | 1 or more chronic illnesses with exacerbation, progression, or side effects of treatment; OR 2 or more stable chronic illnesses; OR 1 undiagnosed new problem with uncertain prognosis; OR 1 acute illness with systemic symptoms; OR 1 acute complicated injury | 1 or more chronic illnesses with severe exacerbation, progression, or side effects of treatment; OR 1 acute or chronic illness or injury that poses a threat to life or bodily function |
The documentation distinction here often comes down to specificity of language. A note that says "diabetes stable, continue current medications" documents a single stable chronic illness, which on its own does not even reach the moderate problem threshold. A note that says "diabetes with worsening A1c despite medication adjustment, new neuropathic symptoms requiring evaluation, considering insulin initiation" documents progression with specificity. It clearly supports moderate complexity and, where the note establishes that the progression is severe, high complexity. Both descriptions might apply to the same patient, but only the second gives the coder what they need.
Element 2: Amount and Complexity of Data Reviewed and Analyzed
| MDM Level | 99214 (Moderate) | 99215 (High) |
|---|---|---|
| Data Threshold | Must meet the requirements of at least 1 of 3 categories. Category 1: any combination of 3 items from review of prior external note(s) from each unique source, review of the result(s) of each unique test, ordering of each unique test, or assessment requiring an independent historian. Category 2: independent interpretation of a test performed by another physician/QHP. Category 3: discussion of management or test interpretation with an external physician/QHP or other appropriate source | Must meet the requirements of at least 2 of the same 3 categories |
Here, the difference often hinges on whether the physician's review of data is documented explicitly. A physician who reviews outside hospital records, independently interprets a dermatopathology report, and discusses a case with a consulting cardiologist has clearly met the high-complexity data threshold—but only if all three activities are documented in the note. When notes are written under time pressure, these details are frequently omitted.
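As a rough illustration of how the data element is scored in the AMA CPT MDM table, the sketch below counts how many of the three data categories a note satisfies. One category met is rated moderate; two or more are rated extensive, which is the level required for high MDM. The class and field names are hypothetical, and the sketch deliberately ignores the lower (minimal and limited) levels.

```python
from dataclasses import dataclass


@dataclass
class DataReviewed:
    # Category 1: count of documented items (external notes reviewed, unique
    # test results reviewed, unique tests ordered, independent historian).
    category1_items: int = 0
    # Category 2: independent interpretation of a test performed by another clinician.
    independent_interpretation: bool = False
    # Category 3: discussion of management or test interpretation with an external clinician.
    external_discussion: bool = False


def data_level(d: DataReviewed) -> str:
    """Rate the data element as moderate, extensive, or below moderate."""
    categories_met = sum(
        [
            d.category1_items >= 3,  # Category 1 needs any combination of 3 items
            d.independent_interpretation,
            d.external_discussion,
        ]
    )
    if categories_met >= 2:
        return "extensive"
    if categories_met == 1:
        return "moderate"
    return "below moderate"


# The example from the paragraph above: outside records reviewed (one Category 1
# item), an independent interpretation, and a discussion with a consultant.
visit = DataReviewed(
    category1_items=1,
    independent_interpretation=True,
    external_discussion=True,
)
print(data_level(visit))
```

If either the interpretation or the discussion goes undocumented in this example, only one category is met and the rating drops to moderate.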
Element 3: Risk of Complications and/or Morbidity or Mortality
| MDM Level | 99214 (Moderate) | 99215 (High) |
|---|---|---|
| Risk Threshold | Prescription drug management; Diagnosis or treatment significantly limited by social determinants of health; Decision about minor surgery with identified patient or procedure risk factors | Drug therapy requiring intensive monitoring for toxicity; Decision about elective major surgery with identified patient or procedure risk factors; Decision about hospitalization or escalation of care; Decision about emergency major surgery |
Risk is often the element that tips a chart from Level 4 to Level 5, and it's also the element most frequently underdocumented. When a physician prescribes a medication requiring intensive monitoring for toxicity (warfarin, methotrexate, certain biologics), the clinical risk is real and the high-risk element can be met. Risk alone does not establish 99215, though: one of the other two elements must also reach the high level. And the note must explicitly reflect the risk consideration and the monitoring plan, not just list the medication.
The Time-Based Alternative
For encounters where MDM is ambiguous but clinician time was substantial, the time-based coding option can be advantageous. When the 2021 guidelines took effect, 99214 covered 30-39 minutes of total time on the date of the encounter and 99215 covered 40-54 minutes. Later CPT revisions restate these as minimums that must be met or exceeded (30 minutes for 99214, 40 minutes for 99215), so confirm the wording in the current code set. Total time includes face-to-face and non-face-to-face work personally performed by the billing clinician on the same calendar day, such as care coordination, documentation, order entry, and referral management. This alternative is especially relevant for complex care management encounters in psychiatry and chronic disease management.
The documentation requirement is straightforward: the total time must be stated in the note. AI scribes can assist by tracking encounter duration and prompting for non-face-to-face time documentation.
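A minimal sketch of the time-based pathway, treating the 30- and 40-minute figures as lower bounds. It covers only the two codes discussed in this guide and returns nothing for shorter visits. The function names are illustrative, and prolonged-service add-on codes are out of scope.

```python
from typing import Optional


def total_time_minutes(face_to_face: int, non_face_to_face: int) -> int:
    """Sum the clinician's same-day time, both with and away from the patient."""
    if face_to_face < 0 or non_face_to_face < 0:
        raise ValueError("minutes cannot be negative")
    return face_to_face + non_face_to_face


def time_based_code(total_minutes: int) -> Optional[str]:
    """Return 99215 or 99214 when total time supports it, otherwise None."""
    if total_minutes >= 40:
        return "99215"
    if total_minutes >= 30:
        return "99214"
    return None  # below the Level 4 threshold; a lower level would apply


# 25 minutes in the room plus 17 minutes of same-day chart work and coordination.
minutes = total_time_minutes(face_to_face=25, non_face_to_face=17)
print(minutes, time_based_code(minutes))
```

The example shows why non-face-to-face time matters. The 25 minutes in the room falls short of Level 4 on time, while the documented same-day total of 42 minutes reaches the Level 5 threshold.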
What Auditors Look For—And What They Don't Find
Audit reviewers evaluate notes against the MDM table systematically. The most common reasons charts fail to support their billed code include:
Chronic conditions listed without specifying their status (stable, worsening, exacerbating)
Data reviewed without documentation of the source, type, or independent interpretation
Risk-level management decisions (hospitalization discussions, high-risk prescriptions) performed but not documented
Assessment and plan sections that are too brief to demonstrate the complexity of decision-making
Every one of these gaps is a documentation capture problem—not a clinical care problem. The complexity happened in the room. It just didn't make it into the note. This is precisely where Scribing.io's AI-powered documentation makes the greatest impact: by listening to the encounter and structuring the note with MDM-level specificity in real time.
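The first gap on that list, chronic conditions listed without a status, is the easiest to screen for before a chart reaches a coder. The sketch below is a deliberately crude keyword check, not a description of how any particular product works. The term list and function name are assumptions, and a real review would need clinical judgment rather than string matching.

```python
from typing import List

# Word stems that suggest a problem's status was documented.
# This list is illustrative and far from exhaustive.
STATUS_TERMS = ("stable", "worsen", "exacerbat", "progress", "improv", "controlled")


def problems_missing_status(problems: List[str]) -> List[str]:
    """Return the assessment lines that contain no recognizable status qualifier."""
    return [
        line
        for line in problems
        if not any(term in line.lower() for term in STATUS_TERMS)
    ]


assessment = [
    "Type 2 diabetes, worsening A1c despite medication adjustment",
    "Hypertension",
    "CKD stage 3, stable",
]
# Only the bare "Hypertension" line lacks a status and would be queried.
print(problems_missing_status(assessment))
```

A check like this only finds candidates for a physician query. It cannot tell whether the status that is eventually documented reflects what happened in the room.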
How AI Medical Scribes Automate Accurate Documentation for Level 4 and Level 5 Encounters
The traditional documentation workflow for outpatient encounters follows a predictable pattern: the physician sees the patient, then writes or dictates a note after the encounter (or between encounters, or at the end of the day). Under time pressure, the note is compressed. Details that were discussed in the room are omitted. The complexity of the clinical reasoning is flattened into a few sentences. The note then arrives at the billing department, and the coder—working from the text alone—assigns a code that may underrepresent what actually occurred.
AI medical scribes interrupt this pattern at its origin. Platforms like Scribing.io use ambient listening technology to capture the clinical encounter in real time, then apply natural language processing to structure the note according to standard clinical documentation frameworks—including the MDM elements that drive E/M code selection.
The result addresses the core problem billing managers face:
Reduced documentation decay: The gap between what happens in the encounter and what ends up in the note is the single largest source of coding inaccuracy. When the AI scribe captures the conversation as it occurs—including the physician's clinical reasoning, the patient's reported symptoms, the data reviewed, and the management decisions made—that gap shrinks dramatically.
Structured MDM-relevant content: Well-designed AI scribes don't just transcribe. They organize clinical information into the structures coders rely on: problem lists with acuity qualifiers, data reviewed sections with source attribution, and assessment and plan sections that reflect the complexity of management decisions. This means coders spend less time interpreting and more time verifying.
Consistency across providers: One of the challenges billing managers face is provider variability. Some physicians write detailed notes naturally; others write notes that chronically underdocument. AI scribes normalize documentation quality across your provider panel, reducing the coder-by-coder, physician-by-physician variability that creates both audit risk and revenue inconsistency.
Time documentation for time-based coding: For encounters where the time-based coding pathway is appropriate, AI scribes can track encounter duration and facilitate documentation of total time, including non-face-to-face activities.
A legitimate concern billing managers raise about AI scribes is the risk of over-documentation—generating notes that include more detail than the encounter warrants, which could itself raise audit flags for "cloned" or inflated documentation. This is an important distinction: a well-designed AI scribe captures what clinically occurred, not what would maximize billing. The note should reflect the encounter as it happened, with the specificity to support the appropriate code—whether that's a 99213, 99214, or 99215. If the encounter was genuinely moderate complexity, the AI-generated note should reflect moderate complexity. The goal is accuracy, not maximization.
Audit-Proofing Your Practice Without Defaulting to Underbilling
Audit resilience and accurate billing are not opposing goals—they're the same goal. A practice that documents encounters with full clinical specificity and codes to the supported level is simultaneously maximizing legitimate revenue and minimizing audit exposure. The charts most vulnerable to audit are not the ones with high 99215 utilization; they're the ones where the documentation doesn't support the billed code.
Here's a framework billing managers can implement to shift from defensive coding to confident coding:
1. Establish Internal Audit Benchmarks
Track your practice's 99214/99215 split by provider and compare it against specialty-specific benchmarks published by CMS and specialty societies. Outliers in either direction deserve attention. A provider who bills 99215 at twice the specialty average needs chart review—but so does a provider who almost never bills 99215 despite managing complex patients.
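One way to operationalize this check is sketched below. It computes each provider's 99215 share of combined 99214 and 99215 visits, then flags anyone at or above twice the benchmark or at or below half of it. The benchmark value, the tolerance ratio, and the provider labels are hypothetical. Use the figures published for your specialty.

```python
from collections import Counter
from typing import Dict, List


def share_99215(codes: List[str]) -> float:
    """Fraction of a provider's Level 4 and Level 5 visits billed as 99215."""
    counts = Counter(codes)
    denominator = counts["99214"] + counts["99215"]
    return counts["99215"] / denominator if denominator else 0.0


def flag_outliers(
    provider_codes: Dict[str, List[str]],
    benchmark_share: float,
    tolerance_ratio: float = 2.0,
) -> Dict[str, str]:
    """Flag providers far above ("high") or far below ("low") the benchmark."""
    flags = {}
    for provider, codes in provider_codes.items():
        share = share_99215(codes)
        if share >= benchmark_share * tolerance_ratio:
            flags[provider] = "high"
        elif share <= benchmark_share / tolerance_ratio:
            flags[provider] = "low"
    return flags


billing = {
    "Provider A": ["99215"] * 5 + ["99214"] * 5,   # 50% Level 5
    "Provider B": ["99215"] * 1 + ["99214"] * 19,  # 5% Level 5
    "Provider C": ["99215"] * 2 + ["99214"] * 8,   # 20% Level 5
}
# 0.2 is a made-up benchmark used only to exercise the function.
print(flag_outliers(billing, benchmark_share=0.2))
```

Both flags are prompts for chart review, not findings. A "low" flag on a provider with a complex panel is the underbilling signal described above.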
2. Implement Pre-Billing Documentation Review for Ambiguous Charts
Rather than defaulting to the lower code when a chart is ambiguous, create a workflow for coder-physician clarification. When a note almost supports 99215 but is missing a specific documentation element, a targeted query can resolve the gap before the claim is submitted. AI-generated notes reduce the frequency of these queries because they capture more detail upfront, but the query workflow should still exist as a safety net.
3. Use AI-Generated Notes as Audit Documentation
One underappreciated advantage of AI scribe documentation is its consistency and traceability. Because the note is generated from the actual encounter conversation, it provides a reliable record of what was discussed and decided. In an audit scenario, this consistency is a significant asset—the note reflects the encounter as it occurred, rather than a physician's reconstructed summary written hours later.
4. Train Coders on the 2021+ MDM Framework Specifically
Many coding teams were trained under the 1995/1997 guidelines and have adapted to the 2021+ framework incrementally. Investing in focused training on the current MDM table—particularly the data and risk elements where most 99214/99215 ambiguity lives—pays dividends in both accuracy and coder confidence. The CMS Medicare Learning Network provides free, authoritative educational resources on the current E/M guidelines.
5. Monitor for Underbilling Patterns, Not Just Overbilling Patterns
Most compliance programs are designed to detect upcoding. Few are designed to detect underbilling. Add underbilling analysis to your routine audits: randomly sample charts coded as 99214 and review whether the documentation would have supported 99215. If you find a consistent pattern of charts coded below what the documentation supports, you've identified a revenue recovery opportunity that doesn't require any change in clinical practice, just better documentation capture and coding that follows it.
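Drawing the random sample can be as simple as the sketch below. It assumes charts are available as dictionaries with hypothetical `chart_id` and `code` keys. A fixed seed makes the draw reproducible, so the same sample can be re-pulled if the review is questioned later.

```python
import random
from typing import Dict, List, Optional


def sample_for_underbilling_review(
    charts: List[Dict[str, str]],
    sample_size: int,
    seed: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Randomly pick charts billed as 99214 for a second-look review."""
    candidates = [chart for chart in charts if chart["code"] == "99214"]
    rng = random.Random(seed)  # a local generator leaves global random state untouched
    return rng.sample(candidates, min(sample_size, len(candidates)))


# Made-up chart list for illustration only.
charts = [
    {"chart_id": f"C{i:03d}", "code": "99214" if i % 3 else "99215"}
    for i in range(1, 31)
]
for chart in sample_for_underbilling_review(charts, sample_size=5, seed=2026):
    print(chart["chart_id"])
```

Sampling at random, rather than picking charts that look promising, keeps the review defensible as an accuracy audit instead of a search for higher codes.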
For practices working within specific EHR ecosystems, AI scribe integration matters. Scribing.io's Epic integration and athenahealth compatibility ensure that AI-generated documentation flows directly into the chart without manual re-entry, preserving both efficiency and documentation integrity.
What to Look for in an AI Scribe Platform Built for Billing Accuracy
Not every AI scribe is built with billing accuracy as a core design consideration. Many are optimized for clinician speed or note volume, treating documentation as a productivity tool rather than a billing instrument. For billing managers evaluating AI scribe platforms, here are the capabilities that matter most for Level 4 and Level 5 coding accuracy:
MDM-aware note structuring: The platform should organize clinical information in a way that maps to the three MDM elements—problems addressed, data reviewed, and risk of management. Generic SOAP notes that don't differentiate between these elements create more work for coders, not less.
Problem-specificity in the assessment: The AI should capture qualifiers for each problem addressed—stable, worsening, exacerbating, new, acute—because these qualifiers determine the problem complexity level in the MDM table.
Data attribution: When data is reviewed (labs, imaging, external records), the note should document the source and the physician's interpretation, not just the result. This is the data element that most frequently separates moderate from high complexity.
Risk documentation: The platform should capture management decisions that carry risk—high-risk prescriptions, hospitalization decisions, surgical planning—with enough specificity that the risk level is auditably clear.
ICD-10 alignment: AI scribe platforms that integrate with ICD-10 coding tools can help ensure that the diagnoses documented in the note align with the complexity of the E/M code, reducing the mismatch between diagnosis coding and E/M coding that triggers audit flags.
Clinician review workflow: The AI scribe should generate a draft note that the physician reviews and finalizes—not a final note that bypasses clinician oversight. Physician attestation is an essential element of both clinical accuracy and audit resilience.
EHR integration: The note must flow into the EHR seamlessly. Manual copy-paste workflows introduce errors, create version-control problems, and undermine the documentation quality that the AI scribe was designed to provide.
The right AI scribe platform doesn't replace your coding team's judgment—it gives them better raw material to work with. When coders receive notes that are structured, specific, and complete, they can code with confidence rather than caution. That confidence is what closes the gap between defensive underbilling and accurate billing.
Get Started Today
The gap between 99214 and 99215 is a documentation gap, and every day it goes unaddressed, your practice leaves legitimate revenue uncollected while your coders navigate unnecessary ambiguity. Scribing.io's AI scribe captures clinical encounters with the MDM-level specificity that Level 4 and Level 5 charts require—giving your billing team notes they can code confidently, your providers documentation that reflects their clinical work, and your practice the audit resilience that comes from getting it right the first time.


