šŸ„
-
Counties Analyzed
šŸ“ˆ
-
Overperforming
šŸ“‰
-
Underperforming
šŸŽÆ
-
Model Accuracy (R²)

Life Expectancy: Actual vs Predicted (map legend). The color scale runs from āˆ’8 years (worse than predicted) through "As Expected" to +8 years (better than predicted).

Green counties are healthier than their socioeconomic factors predict. Red counties perform worse than expected. Click any county for details.

Prediction Model

Type: Random Forest Regressor

Target: Life Expectancy

Accuracy (R²): shown in the interactive dashboard

The model predicts each county's life expectancy based on 9 socioeconomic and health factors. The map shows where reality differs from predictions.
We chose a Random Forest because it captures nonlinear relationships without requiring feature scaling; its predictions are in the target's own units, so deviations read directly in real-world terms (years of life expectancy).
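
To make the pipeline concrete, here is a minimal sketch of how predictions and deviations could be computed and pre-exported for the map. It assumes a pandas/scikit-learn workflow; the column names, file paths, and hyperparameters are illustrative placeholders, not the project's actual code.

```python
# Minimal sketch (illustrative column names and paths, not the project's code):
# train a Random Forest on county-level factors, compute each county's
# deviation (actual - predicted), and export the result for the map.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

counties = pd.read_csv("county_health_clean.csv")  # one row per county

features = [
    "median_income", "unemployment", "uninsured_rate",
    "smoking_rate", "obesity_rate", "college_degree_rate",
    "primary_care_rate", "air_pollution", "housing_problems",
]  # stand-ins for the nine socioeconomic and health factors

X, y = counties[features], counties["life_expectancy"]

model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X, y)

counties["predicted"] = model.predict(X)
counties["deviation"] = counties["life_expectancy"] - counties["predicted"]
print("R² (training fit):", r2_score(y, counties["predicted"]))
# In practice, report R² on a held-out split to avoid overstating accuracy.

# Pre-compute once in Python and export to JSON for client-side rendering.
counties[["fips", "life_expectancy", "predicted", "deviation"]].to_json(
    "predictions.json", orient="records"
)
```

Pre-computing these values keeps any model code out of the browser, which matches the deployment approach described in the Development Process section.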

Design Rationale & Development Process

1. Design Rationale

Context and Purpose

Economic inequality shapes life expectancy in measurable ways, but national averages hide local nuance. This project invites the public to explore which U.S. counties exceed or fall short of expectations and to investigate why. Designed for public health students, policy analysts, and curious citizens, this visualization transforms raw health data into a story about resilience and disparity.

Why a Deviation Map?

We chose to implement a residual/deviation map rather than a standard choropleth because it directly addresses our research question: "Which counties defy economic predictions?" A traditional map showing just life expectancy or income would simply reveal the well-known correlation between wealth and health. Our approach reveals something more interesting: the exceptions to this rule.

Visual Encoding Decisions

  • Diverging color scale (red-white-green): We use a perceptually uniform diverging scale to encode positive (green) and negative (red) deviations from predictions. White represents counties performing as expected. This encoding immediately draws attention to outliers while maintaining clarity for the majority of counties near the center (a minimal code sketch of this mapping follows this list).
  • Geographic map layout: Preserves spatial relationships essential for identifying regional patterns and enables users to locate their own communities.
  • Quantitative scale: Deviations are measured in years of life expectancy, making the stakes concrete and relatable.
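
As an illustration only (the project itself uses D3.js for rendering), a deviation in years can be mapped onto the RdYlGn diverging palette as sketched below, with the scale centered at zero and spanning ±8 years to match the legend; the function name is hypothetical.

```python
# Illustrative sketch, not the project's D3 code: map a deviation in years
# onto the RdYlGn diverging palette, centered at 0 and spanning -8 to +8 years.
import matplotlib
import matplotlib.colors as mcolors

cmap = matplotlib.colormaps["RdYlGn"]
norm = mcolors.TwoSlopeNorm(vmin=-8, vcenter=0, vmax=8)

def deviation_color(years: float) -> str:
    """Hex color for a county's deviation from its predicted life expectancy."""
    return mcolors.to_hex(cmap(norm(years)))

print(deviation_color(-5.0))  # red end: worse than predicted
print(deviation_color(0.0))   # neutral midpoint: performing as expected
print(deviation_color(5.0))   # green end: better than predicted
```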

Interaction Techniques

  • Details-on-demand (tooltips): Hovering reveals county name, actual vs predicted values, and deviation amount without cluttering the map.
  • Modal drill-down: Clicking opens a detailed view with all 12 metrics, allowing users to investigate why a county might be outperforming or underperforming.
  • Dynamic filtering: Users can isolate overperformers, underperformers, or expected performers to focus their exploration.
  • View switching: Toggle to a standard single-metric view for comparison and validation of our model-based approach.

Alternatives Considered

We evaluated four approaches (see checkpoint documentation):

  • Bivariate choropleth: Would show two metrics simultaneously but requires complex 2D color scales and doesn't directly leverage our ML model.
  • Linked multi-view: Would enable brushing across map + scatter plot but risks overwhelming users and adds development complexity.
  • Clustering/archetype map: Would reveal county types but loses granular quantitative information about deviation magnitude.

The deviation map was selected because it best balances analytical power, interpretability, and direct relevance to our research question.

Design Inspirations

Our approach draws from regression diagnostic visualizations (residual plots) but applies them to geographic data. Similar techniques appear in election forecasting ("over/underperforming polls") and real estate analysis ("above/below market rate"), but are rare in public health visualization.

Interpretation and Ethical Considerations

The labels ā€œoverperformingā€ and ā€œunderperformingā€ describe statistical deviation, not moral or cultural judgment. County-level data can mask disparities within counties; a region that appears ā€œhealthyā€ overall may still contain underserved communities. The visualization should therefore prompt inquiry rather than serve as a ranking. The color scale communicates direction clearly but simplifies complex realities, so socioeconomic and historical context remains essential for interpretation.

2. Development Process

Team Workflow

Team Member | Primary Responsibilities | Estimated Hours
Harsh Arya | Data cleaning, Random Forest model implementation, feature engineering | 15 hours
Gabrielle Despaigne | Exploratory analysis, color scale optimization, documentation, testing | 16 hours
Camila Paik | D3.js map implementation, TopoJSON integration, interaction handlers | 20 hours
Raghav Vasappanavara | UI/UX design, CSS styling, modal components, responsive layout | 16 hours

Total effort: ~67 person-hours over 2 weeks

Technical Challenges

  • Data processing (8 hours): The County Health Rankings Excel file required extensive cleaning—column names varied across years, percentage encoding was inconsistent (some 0-1, some 0-100), and ~15% of counties had missing data for at least one metric. We implemented median imputation for model training (a minimal cleaning sketch follows this list).
  • Model integration (5 hours): Experimentation with Random Forest model implementation and optimization for 3,159 counties. Pre-computed predictions in Python and exported to JSON for efficient client-side rendering.
  • Map rendering performance (6 hours): Rendering 3,159 county paths caused lag on hover interactions. Optimized by simplifying TopoJSON geometry and using CSS transforms instead of re-rendering on hover.
  • Color scale design (4 hours): Finding a diverging scale that was colorblind-accessible, perceptually uniform, AND intuitively mapped to "good/bad" required testing multiple ColorBrewer palettes. Settled on RdYlGn with adjusted endpoints.
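
The sketch below illustrates the kind of normalization and imputation described in the data-processing bullet above; the sheet, file, and column names are placeholders rather than the project's actual code.

```python
# Illustrative cleaning sketch (file, sheet, and column names are placeholders):
# normalize inconsistently encoded percentage columns and median-impute gaps.
import pandas as pd

raw = pd.read_excel("county_health_rankings.xlsx", sheet_name=0)

percent_cols = ["smoking_rate", "obesity_rate", "uninsured_rate"]
for col in percent_cols:
    # Some releases encode percentages on a 0-100 scale, others 0-1; unify to 0-1.
    if raw[col].max() > 1.5:
        raw[col] = raw[col] / 100.0

# Median imputation for counties missing at least one metric (~15% of rows).
feature_cols = percent_cols + ["median_income", "unemployment"]
raw[feature_cols] = raw[feature_cols].fillna(raw[feature_cols].median())

raw.to_csv("county_health_clean.csv", index=False)
```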

Tools & Technologies

  • Data processing: Python (pandas, scikit-learn, openpyxl)
  • Visualization: D3.js v7, TopoJSON
  • Frontend: Vanilla JavaScript (no frameworks), CSS Grid/Flexbox
  • Deployment: GitHub Pages

What Took the Most Time?

Surprisingly, data wrangling consumed nearly 30% of our time despite using a "clean" public dataset. The County Health Rankings data is comprehensive but not designed for direct machine learning use—it required significant preprocessing. The second largest time sink was interaction polish (tooltips, modals, smooth transitions), which took longer than the core map rendering.

Lessons Learned

  • Pre-compute expensive calculations (ML predictions) during data prep, not in-browser
  • Start with simplified geometry (TopoJSON compression) for large geodata
  • User testing revealed that our initial deviation thresholds (+/- 2 years) were too sensitive—adjusting to +/- 1 year made patterns clearer
  • Accessibility features (keyboard navigation, ARIA labels) should be built in from the start, not retrofitted

3. Future Enhancements

Given more time, we would add:

  • State-level aggregation view for mobile users (county-level too detailed on small screens)
  • Time-series animation showing how deviations change from 2020-2025
  • Exportable county comparison tool (select multiple counties, download PDF report)
  • Integration with Census data for demographic breakdowns within counties

4. Data Sources & References