Transformation Approaches · RHTP-04.TD1

Evidence Rating Framework

Document Overview

Series 4 requires consistent methodology for evaluating evidence across twelve transformation domains. This technical document establishes the standard framework for assessing evidence quality, rural applicability, effect sizes, and implementation factors. Every Series 4 article applies these criteria to ensure comparable assessments across workforce development, telehealth, community health workers, payment innovation, and other RHTP implementation strategies.

The framework addresses a fundamental challenge: most healthcare evidence comes from urban settings, yet RHTP requires states to implement transformation strategies in communities that differ systematically from study populations. Rural America has older populations, higher chronic disease burden, fewer providers, greater distances, weaker infrastructure, and different social dynamics than the urban populations served by the academic medical centers where most research occurs. Evidence that demonstrates effectiveness in Philadelphia or Houston may not transfer to rural Mississippi or Montana.

This framework provides tools for honest assessment. It enables Series 4 articles to distinguish between interventions with strong rural evidence, approaches extrapolated from urban research, and strategies that amount to faith-based implementation with little empirical foundation.

Evidence Hierarchy

Healthcare evidence varies dramatically in quality and reliability. The hierarchy below ranks study designs by their ability to establish causal relationships between interventions and outcomes.

Tier 1: Systematic Reviews and Meta-Analyses

Definition: Comprehensive synthesis of all available evidence on a specific question, using explicit methods to identify, select, and critically appraise relevant research, and to extract and analyze data from included studies.

Strengths: Aggregates findings across multiple studies, increases statistical power, identifies consistency or heterogeneity in results, reduces impact of single-study biases.

Limitations: Quality depends on underlying studies, publication bias affects included research, heterogeneity may limit pooling, reviews can become outdated as new evidence emerges.
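The pooling and heterogeneity checks described above can be sketched in a few lines of Python. This is a minimal fixed-effect, inverse-variance example under stated assumptions: each study supplies an estimate on an additive scale (such as a log risk ratio) and its standard error, and the function name `pool_fixed_effect` is hypothetical. Published reviews often prefer random-effects models when studies are heterogeneous, so treat this as an illustration of the arithmetic rather than a review method.

```python
import math
from typing import Sequence


def pool_fixed_effect(estimates: Sequence[float], std_errors: Sequence[float]) -> dict:
    """Inverse-variance fixed-effect pooling with Cochran's Q and I-squared.

    Hypothetical helper. `estimates` should be on an additive scale
    (e.g. log risk ratios); `std_errors` are the matching standard errors.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    # Each study is weighted by the inverse of its variance, so precise studies count more
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of total variation attributed to between-study heterogeneity (percent)
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": pooled_se, "q": q, "i_squared": i_squared}


# Two hypothetical studies whose log risk ratios disagree
result = pool_fixed_effect([0.1, 0.3], [0.1, 0.1])
print(round(result["pooled"], 3), round(result["i_squared"], 1))
```

In this hypothetical two-study example, I² comes out at about 50 percent: half of the observed variation is attributed to between-study differences rather than chance, which is the kind of heterogeneity that may limit pooling.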

Key Sources for Rural Health:

AHRQ Evidence Reports provide the gold standard for healthcare evidence synthesis
Cochrane Reviews offer rigorous methodology with transparent quality assessment
Campbell Collaboration covers social interventions relevant to SDOH

Application: When systematic reviews exist for a transformation approach, they anchor the evidence assessment. Series 4 articles begin evidence review by identifying relevant systematic reviews before examining individual studies.

Tier 2: Randomized Controlled Trials

Definition: Experimental study design where participants are randomly assigned to intervention or control groups, with outcomes compared between groups.

Strengths: Random assignment controls for confounding variables, establishes temporal sequence, enables causal inference when properly executed.

Limitations: Expensive and time-consuming, may have limited external validity, ethical constraints limit some research questions, rural recruitment often difficult.

Rural Considerations: RCTs conducted in rural settings carry more weight than urban trials. However, few RCTs specifically target rural populations. When rural subgroup analyses exist within larger trials, these provide valuable evidence despite smaller sample sizes.

Tier 3: Quasi-Experimental Designs

Definition: Studies that attempt to establish causal relationships without random assignment, using techniques such as difference-in-differences, regression discontinuity, propensity score matching, or interrupted time series.

Strengths: Feasible when randomization is impossible or impractical, can use administrative data, evaluates real-world program implementation.

Limitations: Cannot fully control confounding, requires strong and often untestable assumptions (such as parallel pre-intervention trends in difference-in-differences), selection bias remains a concern.

Application: CMS demonstration evaluations typically employ quasi-experimental designs. The Frontier Community Health Integration Project evaluation, for example, used comparison group methodology to assess outcomes. These designs provide valuable evidence when RCTs are unavailable.
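Difference-in-differences, one of the designs behind comparison-group evaluations like those described above, reduces to simple arithmetic in its basic 2x2 form. The sketch below assumes four group-level means or rates and uses a hypothetical function name; real evaluations usually estimate the same quantity with regression so they can add covariates and standard errors. The estimate is only causal if the two groups would have followed parallel trends without the program.

```python
def difference_in_differences(treat_pre: float, treat_post: float,
                              comp_pre: float, comp_post: float) -> float:
    """Basic 2x2 difference-in-differences estimate from group means or rates (hypothetical helper)."""
    treated_change = treat_post - treat_pre      # change observed in the intervention group
    comparison_change = comp_post - comp_pre     # shared trend, estimated from the comparison group
    return treated_change - comparison_change


# Hypothetical 30-day readmission rates before and after a program
naive = 0.15 - 0.20                              # pre-post change with no comparison group
did = difference_in_differences(0.20, 0.15, 0.21, 0.19)
print(f"pre-post: {naive:+.3f}  difference-in-differences: {did:+.3f}")
```

With these hypothetical rates the naive pre-post change is -0.05, but subtracting the comparison group's own decline leaves -0.03: part of the apparent improvement was a trend both groups shared.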

Tier 4: Prospective Cohort Studies

Definition: Observational studies that follow groups over time, comparing outcomes between those exposed and not exposed to an intervention.

Strengths: Can establish temporal sequence, measures outcomes as they occur, useful for rare outcomes or long-term effects.

Limitations: Cannot establish causation, confounding by indication common, expensive and time-consuming.

Tier 5: Retrospective Analyses

Definition: Studies that look backward using existing data to examine relationships between past exposures and outcomes.

Strengths: Inexpensive, can be conducted quickly, uses readily available data, enables large sample sizes.

Limitations: Data quality issues, cannot establish causation, selection and information bias common, missing variables limit adjustment.

Application: Much rural health research uses retrospective analysis of Medicare claims, hospital discharge data, or state databases. These studies describe patterns but cannot establish that interventions caused observed outcomes.

Tier 6: Case Studies and Expert Opinion

Definition: Descriptive accounts of individual programs or settings, or guidance based on clinical experience and professional judgment.

Strengths: Can identify promising approaches, provides implementation details, captures contextual factors.

Limitations: Cannot establish effectiveness, selection bias in what gets published, anecdotes are not evidence.

Application: Many rural health “best practices” rest on case studies or expert consensus. Series 4 articles clearly distinguish these from evidence-based approaches.

Rural Evidence Assessment

The central challenge for Series 4 is assessing whether evidence generated elsewhere applies to rural settings. This section establishes criteria for evaluating rural evidence applicability.

Category A: Primary Rural Evidence

Definition: Studies conducted in or specifically addressing rural populations and settings.

Criteria:

Study population drawn from rural areas (using any standard rural definition)
Study sites located in rural communities
Analysis specifically examines rural context
Sample size adequate for rural-specific conclusions

Weight: Highest. Primary rural evidence directly addresses the implementation context. When available, this evidence takes precedence over extrapolation from urban research.

Examples:

AHRQ evidence report on rural telehealth
HRSA evaluation of National Health Service Corps in rural shortage areas
State-specific rural hospital outcomes research

Category B: Rural Subgroup Analysis

Definition: Studies conducted in mixed settings that include separate analysis of rural participants or sites.

Criteria:

Larger study includes both rural and urban populations
Subgroup analysis examines rural participants separately
Rural sample size adequate for meaningful conclusions (generally n > 100)
Interaction effects tested between rurality and intervention

Weight: Moderate to high. Subgroup analyses provide direct rural evidence but often lack power for definitive conclusions. When rural subgroups show different effects than overall results, this signals potential transferability concerns.
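One common way to test whether a rural subgroup effect differs from the urban effect is a z-test on the difference between two independent subgroup estimates. This sketch assumes both estimates are on the same additive scale, come from non-overlapping subgroups, and have reported standard errors; the function name is hypothetical. A trial with individual-level data would normally fit an interaction term in a regression model instead.

```python
from statistics import NormalDist


def subgroup_interaction_test(rural_effect: float, rural_se: float,
                              urban_effect: float, urban_se: float) -> tuple:
    """z-test for the difference between two independent subgroup estimates (hypothetical helper).

    Returns (difference, two-sided p-value). Effects must share an additive
    scale, e.g. log odds ratios or risk differences.
    """
    difference = rural_effect - urban_effect
    # Standard error of a difference between two independent estimates
    se_diff = (rural_se ** 2 + urban_se ** 2) ** 0.5
    z = difference / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return difference, p_value


diff, p = subgroup_interaction_test(0.10, 0.10, 0.30, 0.05)
print(f"rural minus urban: {diff:+.2f}, interaction p = {p:.3f}")
```

With these hypothetical inputs the rural effect is one third of the urban effect, yet p is about 0.07. A non-significant interaction test from an underpowered rural subgroup does not show that the effects are the same; it shows the subgroup cannot distinguish them.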

Limitations: Subgroup analyses are often exploratory, multiple comparisons inflate false positive risk, rural samples may not represent rural diversity.

Category C: Generalizable Urban Evidence

Definition: Studies conducted in urban settings with characteristics suggesting potential rural applicability.

Criteria:

Intervention mechanism does not depend on urban infrastructure
Target population characteristics overlap with rural populations
Implementation requirements feasible in rural settings
No obvious rural barriers to replication

Weight: Limited. Generalization from urban to rural requires explicit justification. Series 4 articles identify specific reasons why urban evidence might or might not transfer.

Red Flags for Non-Transferability:

Intervention requires specialist density unavailable rurally
Program model assumes public transportation
Effectiveness depends on patient volume rural sites cannot achieve
Cultural assumptions reflect urban rather than rural values

Category D: Non-Applicable Evidence

Definition: Studies from settings so different that rural application requires faith rather than evidence.

Criteria:

Implementation requires resources unavailable in rural areas
Target population differs substantially from rural demographics
Study setting involves infrastructure rural areas lack
No plausible mechanism for rural adaptation

Weight: None. Series 4 articles exclude this evidence from effectiveness assessments while noting its existence and limitations.
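The category criteria above can be encoded as a reviewer checklist so that assignments are recorded consistently across articles. This is a hypothetical sketch: the `RuralChecklist` fields and `rural_category` function are illustrative names, each boolean stands for a reviewer judgment, the n > 100 cutoff is the Category B guideline, and treating every study that fails the Category C criteria as Category D simplifies what the framework describes as a judgment call.

```python
from dataclasses import dataclass


@dataclass
class RuralChecklist:
    """Hypothetical record of reviewer judgments, one field per listed criterion."""
    # Category A: primary rural evidence
    rural_population: bool = False
    rural_sites: bool = False
    rural_context_analyzed: bool = False
    rural_sample_adequate: bool = False
    # Category B: rural subgroup analysis
    mixed_rural_urban_study: bool = False
    rural_subgroup_analyzed: bool = False
    rural_subgroup_n: int = 0
    interaction_tested: bool = False
    # Category C: generalizable urban evidence
    mechanism_independent_of_urban_infrastructure: bool = False
    population_overlaps_rural: bool = False
    implementation_feasible_rurally: bool = False
    no_obvious_rural_barriers: bool = False


def rural_category(c: RuralChecklist) -> str:
    """Return the first category (A, then B, then C) whose criteria are all met, else D."""
    if all((c.rural_population, c.rural_sites,
            c.rural_context_analyzed, c.rural_sample_adequate)):
        return "A"
    if all((c.mixed_rural_urban_study, c.rural_subgroup_analyzed,
            c.rural_subgroup_n > 100, c.interaction_tested)):
        return "B"
    if all((c.mechanism_independent_of_urban_infrastructure, c.population_overlaps_rural,
            c.implementation_feasible_rurally, c.no_obvious_rural_barriers)):
        return "C"
    # Simplification: anything failing the Category C checklist is treated as non-applicable
    return "D"


urban_study = RuralChecklist(
    mechanism_independent_of_urban_infrastructure=True,
    population_overlaps_rural=True,
    implementation_feasible_rurally=True,
    no_obvious_rural_barriers=True,
)
print(rural_category(urban_study))
```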

Effect Size Interpretation

Statistical significance does not equal practical importance. Series 4 articles assess whether observed effects are large enough to matter for rural health transformation.

Clinical Significance Thresholds

Mortality and Major Morbidity:

Large effect: Relative risk reduction > 25% or absolute risk reduction > 5 percentage points
Moderate effect: RRR 10-25% or ARR 2-5 percentage points
Small effect: RRR < 10% or ARR < 2 percentage points

Process Measures (screening rates, appointment adherence):

Large effect: Absolute improvement > 15 percentage points
Moderate effect: Absolute improvement 5-15 percentage points
Small effect: Absolute improvement < 5 percentage points
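The banding rules for mortality and process measures can be written as small functions, which makes the boundary handling explicit. In this sketch (function names hypothetical), risks and rates are proportions between 0 and 1, an effect lands in a band if either RRR or ARR qualifies, the moderate ranges are read as inclusive at both ends, and a zero or negative change returns a label the framework itself does not define.

```python
def classify_mortality_effect(control_risk: float, treated_risk: float) -> str:
    """Band a mortality/major-morbidity effect by relative or absolute risk reduction.

    Meeting either threshold places the effect in that band, mirroring the
    "or" in the framework's definitions.
    """
    if not (0 < control_risk <= 1 and 0 <= treated_risk <= 1):
        raise ValueError("risks must be proportions and control_risk must be positive")
    # Rounding to 6 decimals keeps float noise from pushing values across a boundary
    arr_pp = round((control_risk - treated_risk) * 100, 6)             # percentage points
    rrr_pct = round((control_risk - treated_risk) / control_risk * 100, 6)
    if arr_pp <= 0:
        return "no benefit"    # not defined by the framework; assumption of this sketch
    if rrr_pct > 25 or arr_pp > 5:
        return "large"
    if rrr_pct >= 10 or arr_pp >= 2:
        return "moderate"
    return "small"


def classify_process_effect(comparison_rate: float, intervention_rate: float) -> str:
    """Band an absolute improvement in a process measure such as a screening rate."""
    improvement_pp = round((intervention_rate - comparison_rate) * 100, 6)
    if improvement_pp <= 0:
        return "no improvement"    # not defined by the framework; assumption of this sketch
    if improvement_pp > 15:
        return "large"
    if improvement_pp >= 5:
        return "moderate"
    return "small"


print(classify_mortality_effect(0.20, 0.15))   # ARR of 5 points and RRR of 25%: both at the moderate ceiling
print(classify_mortality_effect(0.10, 0.07))   # ARR of only 3 points, but RRR of 30%
print(classify_process_effect(0.40, 0.60))     # 20-point gain in a screening rate
```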

Patient-Reported Outcomes:

Effect sizes should exceed minimally important difference thresholds established for each measure
Generic thresholds: Cohen’s d > 0.5 (moderate), > 0.8 (large)
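For patient-reported outcomes, the generic Cohen's d thresholds depend on a pooled standard deviation. The sketch below computes d from two groups' summary statistics and applies the 0.5 and 0.8 cut-offs. Function names are hypothetical, the classification reports magnitude only (direction must be checked separately), and it does not replace the measure-specific minimally important difference the framework asks for first.

```python
import math


def cohens_d(mean_1: float, sd_1: float, n_1: int,
             mean_2: float, sd_2: float, n_2: int) -> float:
    """Standardized mean difference using the pooled standard deviation (hypothetical helper)."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


def classify_standardized_effect(d: float) -> str:
    """Apply the framework's generic thresholds to the magnitude of d."""
    magnitude = abs(d)
    if magnitude > 0.8:
        return "large"
    if magnitude > 0.5:
        return "moderate"
    return "below moderate"


# Hypothetical symptom scores: intervention mean 10, comparison mean 6, both SD 5, n = 50 each
d = cohens_d(10, 5, 50, 6, 5, 50)
print(round(d, 2), classify_standardized_effect(d))
```

Because the thresholds are strict inequalities, a d of exactly 0.8 counts as moderate here.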

Cost-Effectiveness Benchmarks

Standard Thresholds:

Highly cost-effective: < $50,000 per quality-adjusted life year (QALY)
Cost-effective: $50,000-$150,000 per QALY
Marginally cost-effective: $150,000-$200,000 per QALY
Not cost-effective: > $200,000 per QALY
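Applying these bands requires an incremental cost-effectiveness ratio (ICER): the extra cost of an intervention divided by the extra QALYs it produces relative to a comparator. In this hypothetical sketch, the listed bands share endpoints, so a ratio of exactly $150,000 or $200,000 is assigned to the more favorable band (an assumption), and interventions that do not gain QALYs are rejected rather than banded.

```python
def icer(cost_new: float, qaly_new: float, cost_old: float, qaly_old: float) -> float:
    """Incremental cost per QALY gained versus a comparator (hypothetical helper)."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly <= 0:
        raise ValueError("these bands assume the new intervention gains QALYs")
    return (cost_new - cost_old) / delta_qaly


def cost_effectiveness_band(ratio: float) -> str:
    """Map an ICER in dollars per QALY onto the standard thresholds."""
    if ratio < 0:
        return "cost-saving"    # gains QALYs while lowering cost
    if ratio < 50_000:
        return "highly cost-effective"
    if ratio <= 150_000:
        return "cost-effective"
    if ratio <= 200_000:
        return "marginally cost-effective"
    return "not cost-effective"


# Hypothetical per-patient lifetime costs and QALYs for a new program versus usual care
ratio = icer(120_000, 6.5, 100_000, 6.0)
print(ratio, cost_effectiveness_band(ratio))
```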

Return on Investment:

Claims of ROI > 3:1 within one year require rigorous documentation
Pre-post ROI calculations without comparison groups are unreliable
ROI depends heavily on payer perspective (Medicaid versus hospital versus society)
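The warning about pre-post ROI is easiest to see with numbers. This sketch, using hypothetical function names and made-up per-member annual costs, compares savings measured against a group's own baseline with savings measured against a comparison group's change over the same period. ROI is expressed as gross savings divided by program cost, matching the ratio form (3:1) used above; the costs counted should be those borne by the payer whose perspective is being assessed.

```python
def roi_ratio(savings: float, program_cost: float) -> float:
    """Gross savings per dollar of program cost (hypothetical helper)."""
    return savings / program_cost


def pre_post_savings(pre_cost: float, post_cost: float, members: int) -> float:
    """Savings measured only against the intervention group's own baseline."""
    return (pre_cost - post_cost) * members


def comparison_adjusted_savings(treat_pre: float, treat_post: float,
                                comp_pre: float, comp_post: float, members: int) -> float:
    """Savings net of the cost change the comparison group experienced anyway."""
    return ((treat_pre - treat_post) - (comp_pre - comp_post)) * members


# Hypothetical: 500 enrollees, $100,000 program cost, per-member annual costs in dollars
naive = pre_post_savings(10_000, 9_000, 500)
adjusted = comparison_adjusted_savings(10_000, 9_000, 10_000, 9_400, 500)
print(roi_ratio(naive, 100_000), roi_ratio(adjusted, 100_000))
```

Here the program appears to return 5:1 on a pre-post basis, but once the comparison group's own cost decline is subtracted the return falls to 2:1.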

Confidence Intervals and Uncertainty

Series 4 articles report confidence intervals, not just point estimates. Wide confidence intervals that include both clinically meaningful benefit and harm indicate insufficient evidence regardless of statistical significance.

Interpretation Guide:

If 95% CI includes null effect: Cannot conclude intervention works
If 95% CI excludes null but includes trivial effects: Statistical but perhaps not practical significance
If entire 95% CI exceeds minimal important difference: Strong evidence of meaningful effect
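The interpretation guide can be written as a decision rule on the interval bounds. The sketch assumes a scale on which larger values mean benefit, a minimal important difference (`mid`) expressed on that same scale, and a null value of 0 for differences or 1 for ratios. The function name is hypothetical, and the `harm` branch covers a case the guide does not address.

```python
def interpret_ci(lower: float, upper: float, mid: float, null: float = 0.0) -> str:
    """Classify a 95% CI using the interpretation guide (hypothetical helper)."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    if lower <= null <= upper:
        return "inconclusive"        # CI includes the null effect
    if upper < null:
        return "harm"                # entire CI on the harmful side; not covered by the guide
    if lower > mid:
        return "meaningful"          # entire CI exceeds the minimal important difference
    return "statistical_only"        # excludes the null but still includes trivial effects


for bounds in [(-1.0, 5.0), (1.0, 6.0), (4.0, 7.0)]:
    print(bounds, interpret_ci(*bounds, mid=3.0))
```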

Context Dependency

Effect sizes often vary by setting, population, and implementation quality. Series 4 articles note:

Effect modification: Does intervention work better or worse in certain subgroups?
Implementation fidelity: Were effects achieved with high-fidelity implementation?
Dose-response: Do larger doses or longer exposures produce larger effects?
Sustainability: Do effects persist after intervention ends?

Quality Indicators

Beyond study design, specific quality indicators determine how much weight to assign individual studies.

Sample Size and Statistical Power

Underpowered studies cannot detect real effects
Rule of thumb: Minimum 30 per group for continuous outcomes, larger for binary outcomes
Power calculations should be reported for primary outcomes
Rural subgroups often underpowered even in large studies
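The rule of thumb and the warning about binary outcomes can be checked with the standard normal-approximation sample size formulas for comparing two means (in standardized units) and two proportions. The sketch below uses only Python's standard library; function names are hypothetical, and the formulas omit continuity corrections and the small-sample t adjustment, so treat the results as approximate minimums.

```python
import math
from statistics import NormalDist


def n_per_group_continuous(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group to detect a standardized mean difference d (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


def n_per_group_binary(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group to detect a difference between two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


print(n_per_group_continuous(0.5))        # moderate standardized difference
print(n_per_group_binary(0.40, 0.50))     # 10-point gain in a 40 percent screening rate
```

Under these assumptions, a moderate standardized difference (d = 0.5) needs about 63 participants per group, while a 10-point gain in a 40 percent screening rate needs several hundred per group. Put the other way, 30 per group gives 80 percent power only for standardized differences of roughly 0.7 or more, which is why rural subgroups drawn from larger studies are so often underpowered.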

Follow-Up Duration

Short follow-up may capture honeymoon effects
Chronic disease management requires minimum 12-month follow-up
Behavior change interventions need assessment of maintenance
Many pilot programs report only 3-6 month outcomes

Outcome Measurement Validity

Patient-reported outcomes should use validated instruments
Claims-based outcomes subject to coding changes and gaming
Process measures are not health outcomes
Surrogate endpoints (HbA1c, blood pressure) should connect to clinical outcomes

Confounding Control

RCTs control confounding through randomization
Observational studies require multivariable adjustment
Unmeasured confounding remains concern in all observational research
Propensity scores and instrumental variables address some but not all confounding

Generalizability Assessment

Who was included and excluded from the study?
What settings participated?
How does study population compare to rural target population?
What implementation supports existed that may not be replicable?

Rating Matrix Template

Every Series 4 article includes a standardized evidence rating table. This section provides the template and coding instructions.

Standard Table Format

Intervention	Evidence Quality	Effect Size	Rural Evidence	Implementation Difficulty
[Specific intervention]	Strong/Moderate/Limited/Insufficient	Large/Moderate/Small/Unknown	Yes/Limited/No	High/Moderate/Low

Coding Definitions

Evidence Quality:

Strong: Multiple RCTs or rigorous quasi-experimental studies with consistent findings
Moderate: At least one RCT or strong quasi-experimental evidence with some consistency
Limited: Observational studies or demonstration projects with mixed findings
Insufficient: Case studies, expert opinion, or no published research

Effect Size:

Large: Effects clearly exceed minimal important difference; clinically meaningful
Moderate: Effects exceed minimal important difference but modest in magnitude
Small: Statistically significant but close to minimal important difference
Unknown: Effect size not estimable from available evidence

Rural Evidence:

Yes: Studies conducted in or specifically addressing rural settings
Limited: Some rural evidence but primarily urban-derived
No: Evidence base entirely from urban or suburban settings

Implementation Difficulty:

High: Requires substantial infrastructure, workforce, or organizational capacity
Moderate: Requires some adaptation but achievable with available resources
Low: Can be implemented with minimal additional resources or expertise
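The standard table can be generated from structured data so that every article uses the same vocabulary. This hypothetical sketch (the `RatingRow` class, `ALLOWED` mapping, and `render_table` function are illustrative names) rejects any code outside the coding definitions and prints rows in the tab-separated layout of the standard table format.

```python
from dataclasses import dataclass

# Allowed codes, taken from the coding definitions
ALLOWED = {
    "evidence_quality": ("Strong", "Moderate", "Limited", "Insufficient"),
    "effect_size": ("Large", "Moderate", "Small", "Unknown"),
    "rural_evidence": ("Yes", "Limited", "No"),
    "implementation_difficulty": ("High", "Moderate", "Low"),
}
HEADER = ("Intervention", "Evidence Quality", "Effect Size",
          "Rural Evidence", "Implementation Difficulty")


@dataclass(frozen=True)
class RatingRow:
    intervention: str
    evidence_quality: str
    effect_size: str
    rural_evidence: str
    implementation_difficulty: str

    def __post_init__(self) -> None:
        # Reject any code that is not in the standardized vocabulary
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r}; expected one of {allowed}")

    def cells(self) -> tuple:
        return (self.intervention, self.evidence_quality, self.effect_size,
                self.rural_evidence, self.implementation_difficulty)


def render_table(rows: list) -> str:
    """Render a header plus rows as tab-separated lines."""
    lines = ["\t".join(HEADER)]
    lines.extend("\t".join(row.cells()) for row in rows)
    return "\n".join(lines)


example = RatingRow("Example intervention (hypothetical)", "Moderate", "Small", "Limited", "High")
print(render_table([example]))
```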

Extended Table for Complex Assessments

When evidence varies substantially by subpopulation or modality, use expanded format:

Intervention	Population/Setting	Evidence Quality	Effect Size	Rural Evidence	Implementation Difficulty	Notes
[Intervention]	[Specific context]	[Rating]	[Rating]	[Rating]	[Rating]	[Key caveats]

Application Guidance

Using the Framework in Series 4 Articles

Step 1: Identify Relevant Evidence

Search for systematic reviews first
Identify RCTs and quasi-experimental studies
Note rural-specific research separately
Document search strategy in article development

Step 2: Assess Each Study

Apply evidence hierarchy classification
Evaluate rural applicability category
Extract effect sizes with confidence intervals
Note quality indicators

Step 3: Synthesize Across Studies

Weight evidence by quality and applicability
Identify consistency or heterogeneity
Note evidence gaps explicitly
Distinguish strong from weak conclusions

Step 4: Complete Rating Matrix

Apply standardized coding definitions
Include all major intervention variants
Acknowledge uncertainty in ratings
Provide narrative explanation for non-obvious ratings

Step 5: Connect to RHTP Implementation

Assess state application alignment with evidence
Identify red flags where applications ignore evidence
Note promising elements with evidence support
Address sustainability implications

Common Pitfalls to Avoid

Treating all evidence as equal: A single pilot study does not counter a systematic review.

Ignoring rural applicability: Urban evidence requires explicit justification for rural application.

Conflating process and outcome evidence: Increased screening rates matter only if they improve health outcomes.

Accepting advocate claims: Organizations promoting interventions often overstate evidence.

Overlooking implementation difficulty: Interventions that work under ideal conditions may fail in practice.

Ignoring sustainability: Short-term grant-funded successes do not demonstrate long-term viability.

How this article connects to others in Blue Gray Matters.

The evidence ratings assigned here are the input data that Series 3's approach-fit analysis uses to assess whether state applications have selected approaches matched to their evidence profile and conditions.

Series 11's analysis of alignment between disease burden and transformation investment uses the evidence ratings this framework assigns. The mismatch between what states fund and what rural populations need becomes measurable only when the evidence for each approach is set against the disease burden that approach should address.

Population-specific evidence assessment in Series 9 extends the rating framework documented here. Generic evidence ratings require population-specific adjustment when evidence of effectiveness in diverse populations differs from evidence of effectiveness in rural populations generally.

Transformation scenario analysis in Series 16 uses evidence ratings assigned here to assess whether state RHTP investments are positioned to produce durable transformation outcomes.

Policy earthquake analysis in Series 12 raises an evidence question this framework helps answer: approaches with strong standalone evidence may have no evidence for delivery under simultaneous coverage erosion, workforce contraction, and safety net reduction.

Sources cited in this article.

Agency for Healthcare Research and Quality. "Grading the Strength of a Body of Evidence When Assessing Health Care Interventions." *Methods Guide for Effectiveness and Comparative Effectiveness Reviews*, AHRQ Publication No. 13-EHC117-EF, Jan. 2014, effectivehealthcare.ahrq.gov/products/methods-guidance-grading-evidence/methods.
Agency for Healthcare Research and Quality. "Methods Guide for Effectiveness and Comparative Effectiveness Reviews." AHRQ Publication No. 10(14)-EHC063-EF, Jan. 2014, effectivehealthcare.ahrq.gov/products/cer-methods-guide/overview.
Bekelman, David B., et al. "Primary Results of the Patient-Centered Disease Management (PCDM) for Heart Failure Study: A Randomized Clinical Trial." *JAMA Internal Medicine*, vol. 175, no. 5, 2015, pp. 725-732.
Guyatt, Gordon H., et al. "GRADE: An Emerging Consensus on Rating Quality of Evidence and Strength of Recommendations." *BMJ*, vol. 336, no. 7650, 2008, pp. 924-926.
Higgins, Julian P.T., et al. "Cochrane Handbook for Systematic Reviews of Interventions." Version 6.4, Cochrane, 2023, training.cochrane.org/handbook.
Institute of Medicine. "Finding What Works in Health Care: Standards for Systematic Reviews." National Academies Press, 2011.
Neumann, Peter J., et al. "Updating Cost-Effectiveness: The Curious Resilience of the $50,000-per-QALY Threshold." *New England Journal of Medicine*, vol. 371, no. 9, 2014, pp. 796-797.
Rural Health Information Hub. "Evidence-Based Toolkits for Rural Community Health." RHIhub, 2024, ruralhealthinfo.org/toolkits.
Rural Policy Research Institute. "RUPRI Health Panel: Evidence-Based Policy Analysis for Rural Health." RUPRI, 2024, rupri.org/health.
Schunemann, Holger J., et al. "GRADE Guidelines: A New Series of Articles in the Journal of Clinical Epidemiology." *Journal of Clinical Epidemiology*, vol. 64, no. 4, 2011, pp. 380-382.
Shea, Beverley J., et al. "AMSTAR 2: A Critical Appraisal Tool for Systematic Reviews That Include Randomised or Non-Randomised Studies of Healthcare Interventions, or Both." *BMJ*, vol. 358, 2017, j4008.
West, Suzanne, et al. "Systems to Rate the Strength of Scientific Evidence." *Evidence Report/Technology Assessment*, no. 47, AHRQ Publication No. 02-E016, Apr. 2002.
