State Agencies · RHTP-05.04

Performance Measurement

Performance measurement should enable learning and accountability. RHTP requires states to track progress, report outcomes, and demonstrate that federal investment produces results. The logic is unassailable: taxpayers deserve evidence that their dollars accomplish stated purposes. CMS requires reporting to ensure states implement as promised. States need data to identify what works and adjust what does not.

In practice, measurement often becomes theater rather than learning. States with limited capacity spend resources producing reports that no one reads. States with sophisticated systems may game metrics rather than improve outcomes. The burden of measurement falls hardest on the least-resourced states, consuming staff time and money that could otherwise support services. Meaningful accountability, where measurement actually improves programs, remains rare.

This article examines the fundamental tension between accountability demands and capacity realities. CMS prescribes measurement requirements that assume capabilities many states lack. States produce compliant reports that satisfy federal oversight without generating useful information. The gap between required and meaningful measurement reveals how accountability systems can undermine the very outcomes they purport to measure.

Analytical uncertainty pervades this assessment. We can describe what states report but rarely assess whether reporting improves implementation. The counterfactual, what would happen without measurement requirements, cannot be tested. States may claim that measurement burden harms programs, but this claim is difficult to verify. This article surfaces these uncertainties while assessing available evidence.

The Fundamental Tension

The Case for Rigorous Measurement

Public funds require accountability. Federal grants represent taxpayer dollars transferred to state governments for specified purposes. Without measurement, there is no way to verify that funds achieved their purposes. States could claim success while delivering nothing. Measurement enables oversight that protects public investment.

What gets measured gets managed. The management adage reflects real organizational dynamics. Activities that are tracked receive attention. Activities that are not tracked get neglected. Requiring measurement focuses state attention on outcomes that matter, shifting energy from activities that feel productive to activities that produce results.

Learning requires data. Transformation involves trying approaches whose effectiveness is uncertain. Without measurement, states cannot distinguish approaches that work from approaches that fail. The iterative improvement that successful transformation requires depends on feedback loops that measurement creates.

Peer comparison enables improvement. Standardized measurement across states enables identification of high performers whose practices can inform others. States struggling with workforce recruitment can learn from states achieving better retention. Such learning requires comparable data that only standardized measurement produces.

Federal credibility depends on demonstrated results. RHTP represents a $50 billion investment in rural health transformation. Congressional support for continued funding depends on evidence that investment produces outcomes. Without measurement demonstrating impact, future appropriations become vulnerable to criticism that RHTP wasted federal funds.

The Case for Measurement Restraint

Capacity varies dramatically. Some states have sophisticated data infrastructure, experienced evaluation staff, and measurement systems developed through prior federal programs. Other states have one part-time employee managing RHTP data alongside other responsibilities. Requiring sophisticated measurement from under-resourced agencies produces compliance exercises, not useful information.

Measurement burden diverts implementation resources. Staff hours spent collecting data, writing reports, and responding to CMS inquiries are staff hours not spent implementing programs. States already stretched thin must choose between doing the work and documenting the work. Excessive measurement requirements guarantee that documentation wins.

Process metrics substitute for outcome metrics. Outcomes are hard to measure. Health outcomes take years to materialize. Attribution is contested. Data sources are incomplete. Facing these challenges, states measure what they can rather than what matters: activities conducted, trainings delivered, technologies deployed. These process metrics create the appearance of accountability while revealing little about actual impact.

Gaming corrupts metrics. When measurement has consequences, organizations optimize for metrics rather than outcomes. States learn which indicators CMS tracks most closely and focus effort there. Metrics that can be influenced through coding changes or definitional adjustments receive creative attention. The measured system diverges from the actual system.

Low-capacity states produce low-quality data regardless of requirements. Demanding sophisticated measurement from agencies that lack measurement capability does not create capability. It creates reports that satisfy formal requirements while containing unreliable information. CMS receives data, but the data are not meaningful.

What Would Resolve the Tension

Evidence on measurement effectiveness would inform this debate but largely does not exist. We lack rigorous research comparing program outcomes under different measurement regimes. Do programs with more extensive measurement requirements achieve better outcomes than programs with lighter requirements? Does measurement burden correlate with implementation success or failure? These questions remain unanswered.

Comparative assessment of state reporting quality would clarify the capacity distribution. How many states produce reports that enable meaningful learning? How many produce compliance documents with little analytical value? Systematic assessment of report quality would reveal where measurement serves its purpose and where it becomes theater.

Attribution methodology remains contested. Even if states collect accurate data on health outcomes, attributing changes to RHTP investment rather than other factors (economic shifts, demographic changes, unrelated policy changes) requires analytical sophistication that few states possess. Measurement systems cannot produce credible accountability without credible attribution.

Why the Tension Cannot Be Fully Resolved

Accountability and capacity exist in genuine tension. Requiring measurement from states that lack capacity produces low-quality data. Exempting low-capacity states from measurement requirements eliminates accountability for large portions of federal investment. Neither approach is satisfactory.

The political economy of measurement reinforces compliance orientation. Federal officials face criticism for programs that lack accountability mechanisms. State officials face consequences for failing to submit required reports. Both sets of officials are rewarded for the appearance of accountability, regardless of whether accountability mechanisms improve outcomes. The incentive structure produces measurement theater even when all participants recognize its limitations.

Learning and accountability serve different purposes that sometimes conflict. Learning requires honest acknowledgment of failure. Accountability systems penalize failure acknowledgment. States that truthfully report that an approach did not work risk funding reductions, enhanced monitoring, and political criticism. States that frame setbacks as “implementation challenges” while claiming ultimate success protect themselves from consequences. Measurement systems designed for accountability may impede the learning that transformation requires.

Federal Framework

CMS Performance Measurement Requirements

RHTP cooperative agreements specify extensive measurement and reporting requirements. States must submit quarterly progress reports documenting activity against plan milestones, financial expenditures, emerging challenges, and course corrections. Reports follow standardized templates enabling cross-state comparison while allowing narrative explanation of state-specific circumstances.

Annual performance reviews assess whether states meet stated objectives and maintain policy alignment. Reviews consider both quantitative metrics and qualitative implementation quality. Poor performance triggers enhanced monitoring, technical assistance requirements, or funding adjustments.

Standard metrics span multiple domains:

Process metrics track activities conducted: number of telehealth consultations provided, community health workers trained, providers receiving loan repayment, patients receiving transportation assistance. These metrics are relatively straightforward to collect and verify.

Output metrics measure immediate products: technology platforms deployed, training programs established, regional networks formed, care coordination agreements executed. Outputs represent intermediate results between activities and outcomes.

Outcome metrics assess ultimate impact: emergency department utilization, preventable hospitalizations, maternal mortality, access to primary care, workforce retention. These metrics require longer timeframes, sophisticated data systems, and attribution methodology that distinguishes RHTP effects from other influences.

The Office of Rural Health Transformation reviews performance annually against stated metrics, policy adherence, and resource deployment efficiency. States that underperform face consequences including funding reductions, enhanced reporting requirements, mandatory corrective action plans, or in extreme cases, cooperative agreement termination.

How Federal Requirements Intensify Tensions

Federal metrics assume data infrastructure that many states lack. Calculating emergency department utilization rates for rural populations requires linked claims data, geographic identifiers, and analytical capability. States without all-payer claims databases or geographic information systems cannot produce these calculations with the precision federal templates assume.

Quarterly reporting timelines compress data collection windows. States must submit reports within 30 days of quarter end. Data from subawardees may not arrive until weeks after quarter close. Quality review takes additional time. The compression produces reports based on incomplete information, submitted primarily to meet deadlines rather than to inform decisions.
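
The compression can be made concrete with simple date arithmetic. The sketch below is illustrative only: the 30-day submission window comes from the requirement described above, while the quarter-end date and the subawardee lag values are hypothetical assumptions, not figures from any state's reporting.

```python
from datetime import date, timedelta


def review_window(quarter_end: date, subawardee_lag_days: int,
                  submission_window_days: int = 30) -> int:
    """Days left for quality review once subawardee data has arrived.

    submission_window_days reflects the 30-day deadline described in the text;
    subawardee_lag_days is a hypothetical input.
    """
    deadline = quarter_end + timedelta(days=submission_window_days)
    data_arrival = quarter_end + timedelta(days=subawardee_lag_days)
    return (deadline - data_arrival).days


# Hypothetical quarter end and lags of two, three, and four weeks.
quarter_end = date(2026, 3, 31)
for lag in (14, 21, 28):
    print(f"lag {lag} days -> {review_window(quarter_end, lag)} days for review")
```

Under these assumed lags, a state whose subawardee data arrives four weeks after quarter close has two days left to review, reconcile, and write the report.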

Standardized templates obscure important variation. State circumstances differ dramatically. A template designed for general applicability cannot capture state-specific context that makes numbers meaningful. A 15% increase in telehealth utilization means something different in Alaska (where telehealth is essential) than in New Jersey (where alternative access exists). Standardization enables comparison at the cost of context.

Where Federal Specifications Assume Nonexistent Capacity

The gap between required and feasible measurement varies by state. Capacity assessment identifies four state clusters:

High-capacity states (approximately 10-12 states) have robust data infrastructure from prior federal programs, experienced evaluation staff, established relationships with academic partners for analysis, and leadership that values data-informed decision-making. These states can meet CMS requirements and potentially use measurement for genuine learning.

Moderate-capacity states (approximately 15-20 states) have basic data systems and some analytical staff but lack the sophistication CMS requirements assume. These states can produce compliant reports but struggle to use measurement for program improvement.

Low-capacity states (approximately 10-15 states) have minimal data infrastructure, no dedicated evaluation staff, and limited analytical capability. These states produce reports primarily through heroic individual effort, often by staff for whom RHTP reporting is one of many responsibilities.

Very-low-capacity states (approximately 5-8 states) lack the basic prerequisites for meaningful measurement. Reports from these states reflect compliance exercises with limited connection to actual program performance.

CMS requirements do not adjust for this variation. All states receive the same templates, the same deadlines, and the same expectations. The fiction that all states can produce equivalent measurement underlies requirements that make sense for high-capacity states and burden low-capacity states without generating useful information.

State Measurement Approaches

States have adopted varied approaches to performance measurement, reflecting different assessments of compliance burden, learning value, and capacity constraints.

Measurement Approach Assessment

Approach	Characteristics	Example States	Compliance Burden	Learning Value	Gaming Risk
Minimal Compliance	Basic required metrics only; no additional analysis	Several small states	Low	Low	Low
Administrative Data	Claims and administrative records as primary source	Ohio, Pennsylvania, Florida	Moderate	Moderate	Moderate
Real-Time Monitoring	Dashboards with rapid feedback loops	California, North Carolina	High (requires infrastructure)	High (when capacity exists)	Variable
Community-Defined	Community-identified indicators alongside federal metrics	New Mexico, tribal programs	Variable	Potentially high	Lower

Minimal Compliance Approach

States following minimal compliance collect only what CMS requires and invest no additional resources in measurement beyond federal mandates. Reports satisfy formal requirements without generating information useful for program improvement.

Several smaller states with limited administrative capacity have adopted this approach, recognizing that their resources cannot support sophisticated measurement while simultaneously implementing programs. They prioritize implementation over documentation, accepting that their reports will lack the analytical depth of larger states.

The approach has clear advantages. Resources flow to services rather than measurement. Staff focus on program delivery rather than data collection. The compliance burden, while still significant, does not overwhelm limited capacity.

The approach has significant limitations. States learn nothing from their own experience. Problems persist because they are not detected. Successful approaches are not identified for replication. The state operates blind, hoping that implementation choices prove wise but lacking evidence to assess.

Administrative Data Approach

States with all-payer claims databases and robust Medicaid data systems use administrative records as the primary measurement source. Claims data reveal utilization patterns, emergency department visits, hospitalizations, and procedure volumes without requiring new data collection.

Ohio, Pennsylvania, and Florida exemplify this approach. Their existing data infrastructure, developed for Medicaid oversight and health planning, provides measurement capacity that RHTP can leverage. Staff analyze existing data rather than collecting new data, reducing burden while enabling meaningful assessment.

The approach works well for utilization metrics but struggles with patient experience, access to care, and outcome measures that claims data cannot capture. A patient who cannot get an appointment generates no claim. A patient whose condition worsened due to care delays may generate claims that look similar to a patient whose condition worsened despite excellent care. Administrative data reveal what happened but often not why.

Gaming risk is moderate. Providers can influence claims through coding practices. Changes in reported diagnoses or procedure codes can shift metrics without changing actual care. States relying heavily on administrative data must monitor for coding drift that distorts measurement.

Real-Time Monitoring Approach

States with sophisticated data infrastructure have implemented dashboards providing rapid feedback on key indicators. Program managers can see weekly or monthly trends rather than waiting for quarterly reports. Problems are detected quickly. Successful approaches become visible promptly.

California’s RHTP monitoring system integrates data from multiple sources: Medicaid claims, hospital discharge data, vital statistics, provider surveys, and subawardee reports. Dashboards present synthesized information enabling comparison across regions, initiatives, and time periods.

North Carolina similarly invested in monitoring infrastructure, building on systems developed for Medicaid transformation. The state tracks hub network formation, NCCARE360 referral completion, and workforce pipeline progression through integrated platforms.

The approach requires substantial investment. Dashboard development, data integration, and ongoing maintenance consume resources. States lacking prior infrastructure investment cannot implement real-time monitoring quickly. The approach is available only to states that made prior investments that RHTP now leverages.

Learning value is high when capacity exists because rapid feedback enables course correction. States identify implementation challenges while adjustments remain possible rather than discovering problems after years of ineffective operation.

Gaming risk is variable. Real-time visibility can detect gaming attempts quickly. Alternatively, sophisticated actors can identify which metrics receive dashboard attention and game those specifically.

Community-Defined Measurement

Some states incorporate community-identified indicators alongside federal metrics. Community advisory bodies identify what outcomes matter most to rural residents. State measurement systems track these community priorities even when they differ from federal specifications.

New Mexico’s RHTP measurement includes indicators developed through tribal consultation: traditional healing access, cultural competency of services, community health worker engagement in tribal communities. These metrics do not appear on CMS templates but matter deeply to communities served.

The approach recognizes that federal metrics may miss what communities value. Rural residents may care less about emergency department utilization rates than about whether they can see a provider without driving two hours. Community-defined measurement captures these priorities.

Compliance burden is variable. If community indicators align with required metrics, no additional burden results. If community priorities diverge substantially from federal specifications, states must maintain parallel measurement systems.

Gaming risk is potentially lower because community members observe their own communities. A state cannot claim improved access if community members experience continued difficulty obtaining care. Community oversight provides accountability that federal monitoring cannot.

Why States Chose Different Approaches

Prior infrastructure investment largely determines current options. States that invested in data systems during ACA implementation, Medicaid expansion, or previous federal grants have capacity that states lacking such investment cannot quickly develop. Measurement approach reflects accumulated capability more than current strategic choice.

Political context shapes measurement investment. Governors and legislators who value evidence-based policy support measurement infrastructure. Those skeptical of government programs may resist data collection as bureaucratic overhead.

Federal program history matters. States with experience in federal programs emphasizing evaluation (CDC grants, HRSA cooperative agreements, CMS innovation models) developed capacity that transfers to RHTP. States whose federal experience involved lighter measurement requirements lack this foundation.

The Measurement Paradox in Practice

The following vignette illustrates how identical measurement requirements serve radically different functions depending on state capacity.

Vignette: Two States, Same Requirements, Completely Different Realities

State A has a robust Office of Health Analytics with 12 full-time evaluation staff. The office developed through a decade of federal grants requiring sophisticated measurement, including a State Innovation Model award and a Medicaid transformation grant. Staff have graduate training in epidemiology, health services research, and biostatistics. The state maintains an all-payer claims database, integrated vital statistics, and agreements with academic partners for complex analysis.

When RHTP reporting requirements arrived, State A’s measurement infrastructure absorbed them comfortably. CMS templates became one output among many from existing systems. Staff integrated RHTP indicators into dashboards already tracking Medicaid performance. Quarterly reports drew from analyses conducted for other purposes. The incremental burden was modest.

More importantly, State A uses measurement for program improvement. Monthly indicator reviews identify emerging problems. When emergency department utilization increased unexpectedly in one region, staff investigated, identified a gap in after-hours primary care access, and worked with regional partners to address it. The state adjusted workforce deployment based on retention data. Measurement drove decisions.

State B designated a program coordinator to manage all aspects of RHTP implementation. This person, previously responsible for a small chronic disease prevention program, now oversees a $200 million federal investment. Her other responsibilities did not decrease. She has no evaluation training, no analytical support, and no dedicated data systems.

When RHTP reporting requirements arrived, State B’s coordinator faced an impossible task. Collecting data from subawardees required developing new forms, explaining requirements to organizations with their own capacity constraints, and chasing submissions past deadlines. Quality review was impossible; there was no time and no expertise. The coordinator entered whatever data arrived into CMS templates, often with gaps and inconsistencies.

State B’s reports satisfy compliance requirements. They arrive on time (usually). They contain numbers in the required fields (mostly). CMS accepts them as fulfilling reporting obligations. But the reports have no connection to program decisions. No one in State B reads the reports after submission. No decisions change based on what the reports reveal. The data are not accurate enough to support meaningful analysis even if anyone had time to conduct it.

Both states submit quarterly reports. Both states receive the same CMS response acknowledging receipt. An outside observer reviewing the reports might not immediately recognize the difference. But State A’s measurement enables learning while State B’s measurement is pure theater: an elaborate performance that satisfies formal requirements while accomplishing nothing.

Alternative Perspectives

The Capacity Realism View

The capacity realism perspective argues that demanding sophisticated measurement from under-resourced agencies is pointless. Requirements designed for states with robust infrastructure become compliance burdens for states lacking such infrastructure. The pretense that all states can produce equivalent measurement generates bureaucratic waste without improving accountability.

Evidence supporting this view:

Low-capacity states consistently produce lower-quality data regardless of requirement stringency. Increasing requirements does not improve data quality; it increases burden without corresponding benefit.

Staff in low-capacity states describe measurement as the aspect of federal programs most disconnected from actual implementation. They view reporting as a compliance exercise rather than a useful activity.

No evidence demonstrates that measurement requirements improve outcomes in low-capacity states. The presumed mechanism, that measurement enables learning, does not function when measurement infrastructure does not support learning.

Evidence against this view:

Even minimal measurement may prevent the worst failures. States that must report some indicators cannot completely abandon program implementation. The requirement to document activity creates baseline accountability.

Measurement requirements have prompted some states to invest in capacity they would otherwise have neglected. The prospect of reporting failures has motivated infrastructure development.

Assessment: The capacity realism view has substantial merit. Complex measurement requirements burden states that can least afford them. Simpler, more focused measurement would serve transformation better than comprehensive requirements that low-capacity states cannot meaningfully implement. However, eliminating measurement entirely would remove even minimal accountability from federal investment.

The Gaming Inevitability View

The gaming inevitability perspective argues that any measurement system with consequences will be gamed. Organizations optimize for measured indicators rather than underlying outcomes. Resources flow to activities that improve metrics rather than activities that improve results. The more consequential the measurement, the more sophisticated the gaming.

Evidence supporting this view:

Historical examples abound. Hospital readmission penalties led to observation stays that avoided readmission classification without improving patient outcomes. Surgical mortality reporting led to risk aversion that denied care to the sickest patients. Teacher evaluation tied to test scores led to teaching to the test rather than genuine education.

RHTP metrics are similarly vulnerable. States can improve telehealth utilization counts through definition changes (what counts as a telehealth visit). Workforce recruitment metrics can improve through retention definition adjustments (how long must someone stay to count as retained). Gaming requires less effort than genuine improvement.
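
A small sketch shows how much a definitional adjustment alone can move a retention metric. The tenure values and thresholds below are invented for illustration; they are not drawn from any state's data or from a CMS definition.

```python
def retention_rate(months_served: list[int], threshold_months: int) -> float:
    """Share of recruited providers counted as 'retained' under a given
    definition: anyone who stayed at least threshold_months."""
    retained = sum(1 for months in months_served if months >= threshold_months)
    return retained / len(months_served)


# Hypothetical tenures (in months) for ten recruited providers.
tenures = [3, 5, 7, 8, 11, 13, 14, 20, 26, 30]

# Same workforce, three candidate definitions of "retained".
for threshold in (6, 12, 24):
    rate = retention_rate(tenures, threshold)
    print(f"retained = stayed >= {threshold} months: {rate:.0%}")
```

With these assumed tenures, the reported rate ranges from 80% to 20% without a single provider staying longer or leaving sooner, which is why a metric's definition needs to be fixed and documented before results are compared.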

Evidence against this view:

Not all measurement is equally gameable. Some metrics resist manipulation. Physical infrastructure either exists or does not. Provider credentials can be verified. Hospital closures cannot be hidden.

Gaming requires sophistication. Low-capacity states may lack the analytical capability to identify gaming opportunities. Their reports may be inaccurate due to incapacity rather than strategic manipulation.

Assessment: Gaming risk is real and should inform measurement design. Metrics that resist manipulation should receive greater weight than metrics easily influenced through definitional adjustments. However, the existence of gaming does not eliminate measurement value entirely. Triangulating across multiple indicators, comparing self-reported data with independent sources, and focusing on harder-to-game metrics can maintain meaningful accountability despite gaming attempts.
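
Comparing self-reported data with independent sources can be sketched as a simple discrepancy check. This is a minimal illustration under stated assumptions: the metric names, the values, and the 10% tolerance are all hypothetical, and a real review would need matched definitions and time periods across sources.

```python
def flag_discrepancies(self_reported: dict[str, float],
                       independent: dict[str, float],
                       tolerance: float = 0.10) -> list[str]:
    """Return metrics whose self-reported value differs from an independent
    source by more than `tolerance` (relative difference).

    Metrics with no independent counterpart are skipped, not flagged.
    """
    flagged = []
    for metric, reported in self_reported.items():
        reference = independent.get(metric)
        if reference is None or reference == 0:
            continue  # nothing to compare against
        if abs(reported - reference) / abs(reference) > tolerance:
            flagged.append(metric)
    return flagged


# Hypothetical state submission versus hypothetical claims-derived figures.
state_report = {"telehealth_visits": 1200, "chws_trained": 45, "ed_visits_per_1000": 310}
claims_based = {"telehealth_visits": 950, "chws_trained": 44}

print(flag_discrepancies(state_report, claims_based))
```

A flag here is a prompt for follow-up rather than proof of gaming: the gap may reflect differing definitions or incomplete data as easily as manipulation.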

The Learning Organization View

The learning organization perspective argues that measurement should serve learning rather than accountability. Traditional accountability measurement punishes failure acknowledgment, discouraging honest reporting. Learning-oriented measurement rewards failure identification because identifying what does not work enables improvement.

Evidence supporting this view:

Organizations that treat measurement as a learning tool rather than a judgment mechanism often produce better outcomes. Toyota’s production system, frequently cited as a management model, treats problems as opportunities for improvement rather than occasions for blame.

Some federal programs have experimented with learning-oriented evaluation. CMS Innovation Center models included “rapid cycle evaluation” designed to identify what works and what does not without penalizing states whose approaches proved ineffective.

Evidence against this view:

Public accountability cannot be eliminated. Taxpayers have legitimate interest in knowing whether their investment produced results. Learning-oriented measurement that never produces accountability creates its own problems.

The distinction between learning and accountability may be impossible to maintain in political environments. Even if program administrators adopt a learning orientation, political opponents will use measurement data to attack programs they oppose.

Assessment: The learning organization view offers valuable insight. Measurement systems that enable honest failure acknowledgment are more likely to produce improvement than systems that punish failure. However, purely learning-oriented measurement without any accountability function is unlikely to survive political scrutiny. The challenge is designing systems that enable learning while maintaining sufficient accountability to sustain public support.

Implications for RHTP

Where Measurement Enables Implementation

Measurement supports implementation when:

States have infrastructure to produce accurate data. Measurement based on reliable information can reveal patterns that inform decisions.

Staff have time and capability to analyze measurement results. Data that are collected but never examined serve no purpose.

Organizational culture treats measurement as useful rather than burdensome. Leadership must value evidence and act on findings.

Feedback loops connect measurement to decisions. Identifying a problem matters only if the identification leads to response.

These conditions exist in perhaps 10-15 states. For these states, CMS measurement requirements may genuinely support transformation by requiring attention to outcomes that might otherwise be neglected.

Where Measurement Burdens Implementation

Measurement burdens implementation when:

Data collection consumes resources needed for service delivery. Staff choosing between providing care and documenting care face impossible tradeoffs.

Reports satisfy compliance without generating insight. The labor of measurement produces nothing useful.

Accuracy is impossible given available systems. Unreliable data mislead rather than inform.

These conditions characterize perhaps 20-25 states. For these states, CMS measurement requirements divert energy from implementation without corresponding benefit. The burden is not offset by learning value.

Warning Signs of Measurement Theater

Observable indicators suggest when measurement has become theater rather than learning:

Reports submitted at deadline without internal review. When reports go directly from data entry to CMS without anyone reading them, measurement serves compliance only.

Same narrative explanations quarter after quarter. When states copy prior explanations without updating them, measurement has become a rote exercise.

Metrics that never change despite varied implementation circumstances. Perfect consistency suggests data are constructed rather than collected.

No documented program changes based on measurement findings. When measurement never influences decisions, its purpose is performative.

Staff describing measurement as their most frustrating responsibility. When those closest to measurement view it as worthless, they are probably right.
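
Two of these warning signs, repeated narratives and metrics that never change, lend themselves to automated screening. The sketch below is one possible approach, not a description of any CMS or state process: the function names, the 0.95 similarity threshold, and the sample inputs are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def repeated_narrative(previous: str, current: str, threshold: float = 0.95) -> bool:
    """True when this quarter's narrative is nearly identical to last quarter's.

    The 0.95 similarity threshold is an arbitrary illustrative choice.
    """
    return SequenceMatcher(None, previous, current).ratio() >= threshold


def flat_metric(values: list[float]) -> bool:
    """True when a metric reports the same value for three or more quarters."""
    return len(values) >= 3 and len(set(values)) == 1


# Hypothetical narrative text and quarterly values.
q1_text = "Recruitment was delayed by a licensing backlog."
q2_text = "Recruitment was delayed by a licensing backlog."
print(repeated_narrative(q1_text, q2_text))   # identical text is flagged
print(flat_metric([120, 120, 120, 120]))      # unchanged for four quarters
print(flat_metric([120, 131, 118, 140]))      # varies quarter to quarter
```

Screening of this kind only surfaces candidates for a closer look. A stable metric or a recurring explanation can be legitimate, so a flag should trigger a conversation with the state rather than a conclusion.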

What Meaningful Accountability Requires

Meaningful accountability, where measurement actually improves programs, requires:

Appropriate scope. Fewer metrics collected well produce more accountability than many metrics collected poorly. CMS should reduce measurement burden while increasing focus on indicators that matter most.

Capacity-matched expectations. Requirements should reflect state capability. Demanding sophisticated measurement from states lacking sophisticated capacity produces theater.

Learning orientation. States should be rewarded for identifying what does not work, not punished. Honest reporting serves transformation better than optimistic spin.

Verification mechanisms. Self-reported data should be validated through independent sources where possible. Trust but verify.

Consequences proportionate to capacity. States with measurement infrastructure that produce poor results should face different consequences than states lacking measurement infrastructure entirely.

Recommendations

For State Agencies

Invest in learning systems, not just compliance systems. The marginal dollar spent on measurement should improve implementation, not merely satisfy federal requirements. If measurement does not inform decisions, it serves no purpose beyond compliance.

Focus on fewer, more meaningful indicators. Attempting to track everything produces data on nothing. Identify the three to five indicators most likely to reveal whether transformation is occurring. Track those well.

Build measurement capacity as implementation infrastructure. Measurement capability is not overhead; it is essential infrastructure for transformation. States that can assess their own progress can adjust their approaches. States that cannot assess progress operate blind.

Connect measurement to decisions through formal processes. Require that measurement results be reviewed before major decisions. Create accountability for acting on measurement findings, not just for collecting data.

For CMS

Reduce measurement burden; focus on fewer, more meaningful indicators. Current requirements assume capacity that most states lack. Streamlining requirements would improve measurement quality by enabling states to focus on fewer metrics.

Differentiate requirements by state capacity. High-capacity states can produce sophisticated measurement. Low-capacity states cannot. Requiring the same outputs from dramatically different starting points produces compliance theater in low-capacity states.

Emphasize verification over self-reporting. Where possible, use administrative data that states cannot manipulate rather than self-reported metrics they can construct. Triangulate across data sources to identify discrepancies.

Create learning orientation by rewarding honest failure acknowledgment. States that identify approaches that did not work and adjust should receive positive recognition, not penalties. The goal is improvement, not the appearance of success.

Invest in state measurement capacity as a legitimate RHTP use. If measurement is essential for accountability, measurement infrastructure is a legitimate investment. States should be encouraged to use RHTP funds to build capacity that enables meaningful measurement.
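The verification recommendation above, triangulating self-reported metrics against administrative data, reduces to a simple comparison once both sources are expressed on the same metrics. The sketch below is illustrative only: the function name, the metric names in the usage note, and the 10 percent tolerance are assumptions, not CMS specifications.

```python
def find_discrepancies(
    self_reported: dict[str, float],
    administrative: dict[str, float],
    rel_tolerance: float = 0.10,  # assumed threshold, chosen for illustration
) -> dict[str, tuple[float, float, float]]:
    """Flag metrics where self-reported and administrative values diverge.

    Only metrics present in both sources are compared. Returns a mapping of
    metric name -> (self_reported, administrative, relative_gap) for every
    metric whose relative gap exceeds `rel_tolerance`.
    """
    flagged = {}
    for name in sorted(self_reported.keys() & administrative.keys()):
        reported = self_reported[name]
        admin = administrative[name]
        if admin == 0:
            # Avoid division by zero: any nonzero report against zero is a gap.
            gap = 0.0 if reported == 0 else float("inf")
        else:
            gap = abs(reported - admin) / abs(admin)
        if gap > rel_tolerance:
            flagged[name] = (reported, admin, gap)
    return flagged
```

For example, calling `find_discrepancies({"telehealth_visits": 1200}, {"telehealth_visits": 900})` flags the hypothetical `telehealth_visits` metric, because the self-reported count exceeds the administrative count by about a third. A flagged gap identifies where to look; it does not by itself show which source is wrong.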

For Evaluators and Observers

Assess measurement system quality, not just metric availability. The existence of reported data does not indicate meaningful measurement. Evaluation should examine whether states can actually produce reliable information, not merely whether they submit reports.

Distinguish measurement that informs from measurement that performs. The same reports can serve learning or compliance depending on organizational context. Evaluation should assess how measurement is used, not merely what measurement exists.

Track measurement burden as an implementation factor. States overwhelmed by measurement requirements may fail for reasons unrelated to their transformation approach. Evaluation should distinguish implementation failures from measurement-induced failures.

Examine gaming and its consequences. When states optimize for metrics rather than outcomes, evaluation should detect this pattern and assess its implications. Metrics that correlate with gaming susceptibility should receive less interpretive weight.

Transition to Article 5E

Performance measurement shapes and is shaped by federal-state relationships. Article 5E examines the tension between federal mandate and state autonomy that underlies RHTP’s cooperative agreement structure. States that produce sophisticated measurement may receive greater federal flexibility. States that produce compliance theater may receive enhanced oversight. The relationship between measurement capacity and federal trust creates dynamics that affect all aspects of RHTP implementation. The accountability demands examined in this article cannot be understood apart from the federal-state power dynamics that Article 5E addresses.

How this article connects to others in Blue Gray Matters.

Federal annual re-scoring processes create the accountability context for state measurement, with performance metrics directly affecting subsequent year allocations.

Evidence on transformation approach effectiveness depends on state measurement capacity, with weak measurement systems limiting what can be learned from RHTP implementation about what works in rural health.

Geographic equity collapse remains invisible in aggregate performance metrics that do not disaggregate by rurality tier, enabling states to report excellent outcomes while the highest-burden communities receive nothing.

Building sustainability beyond 2030, as Series 16 analyzes, requires performance measurement systems that generate learning rather than compliance documentation, which is the gap this article identifies.

Series 11's synthesis question, whether transformation planning matches clinical reality, cannot be answered by states whose measurement systems do not collect data on specific clinical conditions; aggregate metrics on access and utilization cannot reveal whether transformation investments address the disease burden that actually drives rural excess mortality.

Series 13's synthesis question, whether transformation understands what rural people experience, requires measurement approaches that this article identifies as rare in state RHTP plans; patient experience measurement capturing navigation burden, dignity, and agency is not standard in the performance frameworks CMS accepts.

Population identification methodology in Series 9 is the measurement complement to the performance measurement framework this article analyzes. States that do not disaggregate performance data by population type cannot identify whether universal transformation approaches are reaching vulnerable populations or producing equity gaps that population-blind measurement conceals.

Sources cited in this article.

Centers for Medicare and Medicaid Services. "CMS Announces $50 Billion in Awards to Strengthen Rural Health in All 50 States." CMS Newsroom, 29 Dec. 2025.
Forvis Mazars. "Federal Grant Program Evaluation Part 1: Navigating New Regs." Forvis Mazars, 1 July 2025.
Government Accountability Office. "Grants Management: Enhancing Performance Accountability Provisions Could Lead to Better Results." GAO-06-1046, 29 Sept. 2006.
Government Accountability Office. "Managing for Results in Government." GAO, 2025.
Government Accountability Office. "Performance and Accountability Report, Fiscal Year 2024." GAO-25-900570, 2024.
Heinrich, Carolyn J. "Outcomes-Based Performance Management in the Public Sector: Implications for Government Accountability and Effectiveness." Public Administration Review, vol. 62, no. 6, 2002, pp. 712-725.
Moynihan, Donald P. The Dynamics of Performance Management: Constructing Information and Reform. Georgetown University Press, 2008.
National Academy for State Health Policy. "State Health Agency Capacity Assessment." NASHP, 2025.
Office of Management and Budget. "2 CFR Part 200: Uniform Administrative Requirements, Cost Principles, and Audit Requirements for Federal Awards." OMB, 2024.
Radin, Beryl A. Challenging the Performance Movement: Accountability, Complexity, and Democratic Values. Georgetown University Press, 2006.
U.S. Department of Health and Human Services. "45 CFR Part 75: Uniform Administrative Requirements, Cost Principles, and Audit Requirements for HHS Awards." HHS, 2024.
