Design Rationale & Development Process
1. Design Rationale
Context and Purpose
Economic inequality shapes life expectancy in measurable ways, but national averages hide local nuance. This project invites the public to explore which U.S. counties exceed or fall short of the life expectancy their economic conditions predict, and to investigate why. Designed for public health students, policy analysts, and curious citizens, the visualization turns raw health data into a story about resilience and disparity.
Why a Deviation Map?
We chose to implement a residual/deviation map rather than a standard choropleth because it directly addresses our research question: "Which counties defy economic predictions?" A traditional map showing just life expectancy or income would simply reveal the well-known correlation between wealth and health. Our approach reveals something more interesting: the exceptions to this rule.
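The quantity behind a deviation map is a residual: actual life expectancy minus the model's prediction. The following is a minimal Python sketch of that calculation. The field names (`fips`, `actual`, `predicted`) and all values are hypothetical, and the sign convention (positive means a county outlives its prediction) is an assumption chosen to match the color encoding described in this document.

```python
def deviation_years(actual: float, predicted: float) -> float:
    """Residual in years; positive means the county outlives its prediction."""
    return actual - predicted


# Illustrative values only; these are not County Health Rankings figures.
counties = [
    {"fips": "00001", "actual": 79.5, "predicted": 77.0},
    {"fips": "00002", "actual": 74.0, "predicted": 76.5},
]

# Attach the deviation to each record so the map can color by it.
for county in counties:
    county["deviation"] = deviation_years(county["actual"], county["predicted"])
```

A map of `actual` alone would mostly reproduce the wealth-health gradient; a map of `deviation` removes that expected component and leaves only the exceptions.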
Visual Encoding Decisions
- Diverging color scale (red-white-green): We use a perceptually uniform diverging scale to encode positive (green) and negative (red) deviations from predictions. White represents counties performing as expected. This encoding immediately draws attention to outliers while maintaining clarity for the majority of counties near the center.
- Geographic map layout: Preserves spatial relationships essential for identifying regional patterns and enables users to locate their own communities.
- Quantitative scale: Deviations are measured in years of life expectancy, making the stakes concrete and relatable.
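To make the diverging encoding concrete, here is a small Python sketch that maps a deviation in years to an RGB color with a white midpoint. It is an illustration only, not the project's D3 code: the endpoint colors, the `max_abs` clamp of 5 years, and the function names are all assumptions. Plain linear interpolation in RGB is also not perceptually uniform, so a production scale would interpolate in a perceptual color space or use a tested palette.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

# Placeholder endpoint colors (assumed, not the project's final palette).
RED: RGB = (215, 48, 39)
WHITE: RGB = (255, 255, 255)
GREEN: RGB = (26, 152, 80)


def deviation_to_rgb(deviation: float, max_abs: float = 5.0) -> RGB:
    """Map a deviation in years to a color on a red-white-green scale."""
    # Clamp to [-1, 1] so extreme outliers do not wash out the rest of the map.
    t = max(-1.0, min(1.0, deviation / max_abs))
    end = GREEN if t >= 0 else RED
    weight = abs(t)
    # Blend from white (weight 0) toward the chosen endpoint (weight 1).
    r, g, b = (round(w + (e - w) * weight) for w, e in zip(WHITE, end))
    return (r, g, b)
```

Clamping at `max_abs` is a design choice: it keeps most of the color range available for the many counties near zero instead of spending it on a few extreme values.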
Interaction Techniques
- Details-on-demand (tooltips): Hovering reveals county name, actual vs predicted values, and deviation amount without cluttering the map.
- Modal drill-down: Clicking opens a detailed view with all 12 metrics, allowing users to investigate why a county might be outperforming or underperforming.
- Dynamic filtering: Users can isolate overperformers, underperformers, or expected performers to focus their exploration.
- View switching: Users can toggle to a standard single-metric view to compare against, and validate, our model-based approach.
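The dynamic filter reduces to classifying each county by its deviation and keeping one category. The sketch below shows that logic in Python for illustration; the project itself runs in the browser. The category names, the `deviation` field, and the helper names are hypothetical, and the threshold of 1 year is taken from the +/- 1 year value mentioned under Lessons Learned.

```python
THRESHOLD_YEARS = 1.0  # assumed cutoff, based on the +/- 1 year threshold


def classify(deviation: float, threshold: float = THRESHOLD_YEARS) -> str:
    """Label a county by how far its deviation is from zero."""
    if deviation > threshold:
        return "overperformer"
    if deviation < -threshold:
        return "underperformer"
    return "expected"


def filter_counties(counties: list, category: str) -> list:
    """Keep only the counties that fall in the requested category."""
    return [c for c in counties if classify(c["deviation"]) == category]


# Illustrative records only.
sample = [
    {"fips": "00001", "deviation": 2.5},
    {"fips": "00002", "deviation": -1.8},
    {"fips": "00003", "deviation": 0.3},
]
overperformers = filter_counties(sample, "overperformer")
```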
Alternatives Considered
We evaluated four approaches (see checkpoint documentation): the deviation map and the three alternatives below.
- Bivariate choropleth: Would show two metrics simultaneously but requires complex 2D color scales and doesn't directly leverage our ML model.
- Linked multi-view: Would enable brushing across map + scatter plot but risks overwhelming users and increases development complexity.
- Clustering/archetype map: Would reveal county types but loses granular quantitative information about deviation magnitude.
The deviation map was selected because it best balances analytical power, interpretability, and direct relevance to our research question.
Design Inspirations
Our approach draws from regression diagnostic visualizations (residual plots) but applies them to geographic data. Similar techniques appear in election forecasting ("over/underperforming polls") and real estate analysis ("above/below market rate"), but are rare in public health visualization.
Interpretation and Ethical Considerations
The labels "overperforming" and "underperforming" describe statistical deviation, not moral or cultural judgment. County-level data can mask disparities within counties; a region that appears "healthy" overall may still contain underserved communities. The visualization should therefore prompt inquiry, not outright ranking. The color scale communicates direction clearly but simplifies complex realities, so context from socioeconomic history remains essential for interpretation.
2. Development Process
Team Workflow
| Team Member | Primary Responsibilities | Estimated Hours |
|---|---|---|
| Harsh Arya | Data cleaning, Random Forest model implementation, feature engineering | 15 hours |
| Gabrielle Despaigne | Exploratory analysis, color scale optimization, documentation, testing | 16 hours |
| Camila Paik | D3.js map implementation, TopoJSON integration, interaction handlers | 20 hours |
| Raghav Vasappanavara | UI/UX design, CSS styling, modal components, responsive layout | 16 hours |
Total effort: ~67 person-hours over 2 weeks
Technical Challenges
- Data processing (8 hours): The County Health Rankings Excel file required extensive cleaning: column names varied across years, percentage encoding was inconsistent (some 0-1, some 0-100), and ~15% of counties had missing data for at least one metric. We implemented median imputation for model training.
- Model integration (5 hours): We experimented with the Random Forest model's implementation and tuning across 3,159 counties, then pre-computed predictions in Python and exported them to JSON for efficient client-side rendering.
- Map rendering performance (6 hours): Rendering 3,159 county paths caused lag on hover interactions. Optimized by simplifying TopoJSON geometry and using CSS transforms instead of re-rendering on hover.
- Color scale design (4 hours): Finding a diverging scale that was colorblind-accessible, perceptually uniform, AND intuitively mapped to "good/bad" required testing multiple ColorBrewer palettes. Settled on RdYlGn with adjusted endpoints.
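Two of the cleaning steps described above, reconciling percentage encodings and median imputation, can be sketched in a few lines of Python. This version uses only the standard library rather than pandas, and the function names are hypothetical. The rule "any value above 1 means the column is on a 0-100 scale" is an assumed heuristic; it would misread a 0-100 column whose values all happen to be at or below 1, so real columns deserve a manual check.

```python
from statistics import median
from typing import List, Optional

Column = List[Optional[float]]


def to_fraction(values: Column) -> Column:
    """Rescale a column to 0-1 if it appears to be encoded as 0-100."""
    present = [v for v in values if v is not None]
    if present and max(present) > 1.0:
        return [None if v is None else v / 100.0 for v in values]
    return list(values)


def impute_median(values: Column) -> Column:
    """Replace missing entries with the median of the observed entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to impute from
    fill = median(present)
    return [fill if v is None else v for v in values]


# Illustrative column only: a percentage metric with one missing county.
raw = [12.0, None, 50.0]
cleaned = impute_median(to_fraction(raw))
```

Normalizing before imputing matters: the median of a column that mixes scales would not be meaningful.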
Tools & Technologies
- Data processing: Python (pandas, scikit-learn, openpyxl)
- Visualization: D3.js v7, TopoJSON
- Frontend: Vanilla JavaScript (no frameworks), CSS Grid/Flexbox
- Deployment: GitHub Pages
What Took the Most Time?
Surprisingly, data wrangling consumed nearly 30% of our time despite using a "clean" public dataset. The County Health Rankings data is comprehensive but not designed for direct machine learning use; it required significant preprocessing. The second largest time sink was interaction polish (tooltips, modals, smooth transitions), which took longer than the core map rendering.
Lessons Learned
- Pre-compute expensive calculations (ML predictions) during data prep, not in-browser
- Start with simplified geometry (TopoJSON compression) for large geodata
- User testing revealed that our initial deviation thresholds (+/- 2 years) were too sensitive; adjusting to +/- 1 year made patterns clearer
- Accessibility features (keyboard navigation, ARIA labels) should be built in from the start, not retrofitted
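The first lesson, pre-computing predictions during data prep, amounts to writing a compact lookup file that the browser only has to read. A minimal Python sketch follows. The JSON layout (keyed by FIPS code), the field names, and the helper names are assumptions for illustration, and the predictions are passed in as plain lists so the sketch does not depend on any particular model library.

```python
import json
from typing import Dict, List


def build_payload(
    fips_codes: List[str], actual: List[float], predicted: List[float]
) -> Dict[str, Dict[str, float]]:
    """Bundle per-county values into a dict keyed by FIPS code."""
    payload = {}
    for fips, a, p in zip(fips_codes, actual, predicted):
        payload[fips] = {
            # Rounding keeps the exported file small for client-side loading.
            "actual": round(a, 2),
            "predicted": round(p, 2),
            "deviation": round(a - p, 2),
        }
    return payload


def export_json(payload: Dict[str, Dict[str, float]], path: str) -> None:
    """Write the payload as compact JSON (no extra whitespace)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, separators=(",", ":"))


# Illustrative values only.
payload = build_payload(["00001", "00002"], [79.5, 74.0], [77.0, 76.5])
```

Keying by FIPS code lets the map join each county path to its record with a single lookup instead of a search.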
3. Future Enhancements
Given more time, we would add:
- State-level aggregation view for mobile users (county-level detail is too fine for small screens)
- Time-series animation showing how deviations change from 2020 to 2025
- Exportable county comparison tool (select multiple counties, download PDF report)
- Integration with Census data for demographic breakdowns within counties
4. Data Sources & References
- County Health Rankings & Roadmaps. (2025). 2025 County Health Rankings National Data. Robert Wood Johnson Foundation & University of Wisconsin Population Health Institute. https://www.countyhealthrankings.org/
- U.S. Census Bureau. (2024). Cartographic Boundary Files. https://www.census.gov/geographies/mapping-files/
- Bostock, M. (2021). D3.js - Data-Driven Documents. https://d3js.org/