Most health plans believe they’re doing risk adjustment the right way.

They take compliance seriously. They avoid unsupported codes. They’ve invested in NLP tools to improve speed and consistency. From the outside, the process looks sound.

And yet, many of these same organizations continue to face a frustrating reality: confirmable HCCs are still being missed. RAF scores flatten. Revenue feels constrained. Audit pressure increases. And leadership struggles to explain why the results don’t match the amount of effort being applied.

This usually isn’t a people problem. It’s rarely a documentation problem. More often, it’s a visibility problem.

When playing it safe quietly becomes expensive

CMS has been clear for years: don’t submit diagnoses that aren’t supported by the medical record. That guidance matters, and most risk adjustment programs are built around it for good reason.

But in many organizations, the pendulum has swung so far toward avoiding false positives that another issue has quietly taken hold. Teams become conservative to a fault. Specificity is sacrificed even when documentation supports it. Complex or rare diagnoses are skipped because they feel risky. Everyone assumes downstream reviews will catch what matters.

On paper, this approach looks compliant. In practice, it slowly drains revenue. Not because codes are wrong, but because many of the right ones are never captured in the first place, and, even more problematic, never coded to their highest supported specificity.

Why accuracy metrics don’t tell the whole story

One of the most confusing moments for risk adjustment leaders is when accuracy reports look strong, audits come back clean, and yet financial performance still feels off.

That disconnect exists because accuracy and completeness are not the same thing.

You can be fairly accurate with the codes you submit and still miss meaningful reimbursement. Rare or complex HCCs can slip through unnoticed. Combinations that affect RAF may never surface. The most specific codes supported by the record may be bypassed in favor of safer, less detailed options.
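
One way to see the disconnect: in information-retrieval terms, the accuracy being reported behaves like precision, while completeness is recall. A minimal sketch, using invented code sets rather than real HCC data, shows how a submission can score perfectly on one and poorly on the other:

```python
# Hypothetical illustration: precision vs. recall for submitted HCC codes.
# The code sets below are invented for the example, not real HCC data.

supported = {"HCC18", "HCC85", "HCC96", "HCC108", "HCC111"}  # what the record supports
submitted = {"HCC18", "HCC85"}                               # what was actually coded

true_positives = submitted & supported

precision = len(true_positives) / len(submitted)   # "accuracy" of what was sent
recall = len(true_positives) / len(supported)      # completeness of what was sent

print(f"precision: {precision:.0%}")  # 100% -- every submitted code is supported
print(f"recall:    {recall:.0%}")     # 40%  -- most supported codes were never captured
```

A program measuring only the first number can report near-perfect performance while the second quietly erodes.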

Revenue leakage doesn’t announce itself as an error. It shows up quietly as opportunity that never makes it into the submission.

Where NLP helps — and where it falls short

To manage volume and speed, many health plans have adopted NLP-based coding tools. These tools can meaningfully improve efficiency. They reduce manual effort and bring consistency to common patterns. But they also introduce blind spots that aren’t always obvious.

Most machine-learning NLP systems work by predicting what a diagnosis is likely to be. They don’t confirm what is explicitly documented in the record. That distinction is subtle, but in a CMS-regulated environment, it matters a great deal.

Probabilistic models tend to perform well on common conditions and typical documentation. They struggle more with nuance. Rare diagnoses, complex combinations, and subtle clinical language are easier to miss. When codes are inferred rather than directly tied to explicit documentation, audit defensibility becomes harder to explain and trust erodes.
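
The distinction is easier to see in code. The sketch below is a deliberately simplified contrast, with invented names, scores, and phrases, between a classifier that infers a code from a confidence score and a rule that requires explicit supporting text before a code is surfaced:

```python
# Simplified contrast between inference and confirmation.
# Scores, thresholds, and phrases here are hypothetical, for illustration only.

note = "Patient with diabetes mellitus type 2 with diabetic polyneuropathy."

def probabilistic_suggestion(model_scores: dict[str, float], threshold: float = 0.7):
    """Infer codes from model confidence alone -- no tie back to the note's text."""
    return [code for code, score in model_scores.items() if score >= threshold]

def evidence_backed_suggestion(note: str, evidence_phrases: dict[str, str]):
    """Surface a code only when its supporting phrase appears in the note."""
    return [code for code, phrase in evidence_phrases.items() if phrase in note.lower()]

# A rarer code scoring just under threshold is dropped silently...
print(probabilistic_suggestion({"E11.42": 0.91, "E11.9": 0.88, "I50.22": 0.69}))

# ...while an evidence check keeps the audit trail: each code maps to documented text.
print(evidence_backed_suggestion(note, {"E11.42": "diabetic polyneuropathy"}))
```

Real systems are far more sophisticated than this toy, but the structural difference holds: one output can be defended by pointing to the record, and the other can only be defended by pointing to a score.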

Over time, model drift, changing templates, and evolving provider behavior can further degrade performance, often without obvious warning signs. Equally problematic are the yearly CMS updates, which can leave these models needing months of retraining before they are current again, costing time, efficiency, and revenue.

The hidden risk of missing specificity

Missing HCCs is only part of the story. Failing to capture the most specific code supported by the documentation creates its own set of problems.

Less specificity means less reimbursement, even when the record supports more. It can also increase audit scrutiny. CMS expects specificity. Submitting vague or incomplete codes when documentation supports greater detail doesn’t necessarily reduce risk. In some cases, it creates it.
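
The financial effect is plain arithmetic. In the sketch below, the weights, base rate, and member count are invented placeholders rather than actual CMS-HCC coefficients, but the mechanics hold: the gap between a vague code and the specific one the record supports compounds across a population:

```python
# Hypothetical illustration of how specificity affects RAF-driven payment.
# Weights, rates, and counts are invented; real CMS-HCC coefficients differ.

VAGUE_WEIGHT = 0.30      # an unspecified-condition HCC (placeholder value)
SPECIFIC_WEIGHT = 0.55   # the more specific HCC the record supports (placeholder)

monthly_base_rate = 900.00   # hypothetical per-member per-month base payment
affected_members = 1_000     # members coded vaguely when detail was documented

per_member_gap = (SPECIFIC_WEIGHT - VAGUE_WEIGHT) * monthly_base_rate * 12
print(f"annual gap per member: ${per_member_gap:,.2f}")                        # $2,700.00
print(f"annual gap across cohort: ${per_member_gap * affected_members:,.2f}")  # $2,700,000.00
```

Substitute a plan's own weights and membership and the order of magnitude rarely changes the conclusion: under-specific coding is not a rounding error.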

This is where many organizations feel stuck. They believe they’re choosing between speed, safety, and revenue. But the real issue isn’t the tradeoff. It’s the way diagnoses are being surfaced and validated.

The pattern many teams eventually recognize

If RAF scores plateau despite increasing review effort, if accuracy looks good but revenue lags, if NLP works well for common cases but struggles at the edges, revenue leakage may already be happening.

These aren’t signs that a program is failing. They’re signs that the current approach has structural blind spots. The system is doing exactly what it was designed to do, just not everything the organization needs it to do.

The one idea worth remembering

If there’s one takeaway from this discussion, it’s this:

Most revenue leakage in risk adjustment doesn’t come from submitting the wrong codes. It comes from failing to identify all the right ones clearly, specifically, and defensibly.

Health plans aren’t losing money because they’re careless. They’re losing money because their tools and workflows don’t surface everything the documentation already supports.

Once that realization sets in, the conversation changes. It’s no longer just about compliance or speed. It becomes a question of visibility.

And for many organizations, that question is the first step toward understanding why effort keeps increasing while results stay stubbornly flat.