2.1. Study Design and Workflow Overview
We conducted a retrospective observational study linking satellite-derived and gridded environmental features to LE across 3,084 counties of the contiguous United States (CONUS), over 2000–2019 (61,680 county-year observations). Alaska, Hawaii, and the US territories were excluded because their satellite coverage, gridded product availability, or LE reporting are not directly comparable to CONUS.
The methodology is organised as a four-stage pipeline (
Figure 1). Stage 1 acquires and harmonises 11 geospatial and orbital modalities at their native sensor resolutions. Stage 2 applies spatial aggregation, quality control, and temporal interpolation to produce a county-year tabular matrix. Stage 3 engineers a small set of cross-modal physically motivated features. Stage 4 fits and interprets a Random Forest regression with county-grouped cross-validation.
Spatial analyses used the Google Earth Engine (GEE) Python application programming interface [
18]. Machine learning was implemented in Python 3.9 using the
scikit-learn library [
19] and the
shap library [
20,
21]. TreeSHAP computation was parallelised on the University of Texas at Dallas Juno High-Performance Computing cluster (64 cores) [
22]. To verify computational reproducibility, the pipeline was executed on two independent cluster nodes and the cross-validated metrics were confirmed to be identical to three decimal places between runs; the full run-to-run reproducibility statistics are reported in
Appendix A.1.
2.2. Satellite and Environmental Data Sources (Predictor Variables)
To represent the environmental, infrastructural, and agricultural setting of each county, we constructed a multimodal feature matrix. Eleven complementary data streams were integrated. Each stream is described in the subsections below, with explicit attention to (i) the data source and product identifier, (ii) the spatial and temporal resolutions, (iii) the variables extracted and the reasoning for their inclusion, and (iv) any product-specific scaling required to convert raw archive values to physical units. A summary table is provided in
Section 2.2.12 after the individual product descriptions.
2.2.1. MODIS Land Surface Temperature (MOD11A2)
The Moderate Resolution Imaging Spectroradiometer (MODIS) Terra MOD11A2 product was used to extract land surface temperature (LST). LST is the radiometric temperature of the radiating land surface, retrieved from the upwelling thermal-infrared emission measured by the MODIS sensor. We used Collection 6.1 of the product [
23], distributed by the National Aeronautics and Space Administration Land Processes Distributed Active Archive Center. The product has a nominal spatial resolution of 1 km × 1 km. Its temporal resolution is an 8-day composite, in which each pixel reports the average of clear-sky retrievals across the 8-day window. We retained both the daytime channel (
LST_Day_1km) and the nighttime channel (
LST_Night_1km). The two channels were retained separately because the diurnal cycle of LST captures different physical regimes: the daytime channel records peak solar forcing on the surface energy balance, whereas the nighttime channel records the minimum thermal environment after radiative cooling. As reviewed in
Section 1.1, existing heat-mortality literature is dominated by daytime maxima, and the nighttime channel is the principal data product that allows the chronic nocturnal thermal pattern to be evaluated as a separate scientific question.
Raw MOD11A2 values are stored as scaled, unsigned 16-bit integers. The product-specific scaling, which is documented in the MOD11A2 user guide and is distinct from any general kelvin-to-celsius conversion, converts raw archive values to degrees Celsius according to
where the factor 0.02 is the MODIS-specific multiplicative scale factor that recovers the kelvin value, and the constant 273.15 then converts kelvin to degrees Celsius. We emphasise that Equation (
1) is a MODIS product-handling formula, not a generic temperature conversion. Quality control was performed using the MOD11A2 quality assurance band; pixels flagged as cloud-contaminated, low-LST-quality, or with emissivity error above 0.04 were excluded. After per-county statistical reduction (
Section 2.4), the MOD11A2 stream contributes 14 features, comprising seven summary statistics for the daytime channel and seven for the nighttime channel.
2.2.2. MODIS Vegetation Indices (MOD13A1)
The MODIS Terra MOD13A1 product was used to extract two vegetation indices: NDVI and the Enhanced Vegetation Index (EVI). NDVI is defined as
, where
and
are the surface reflectances in the near-infrared and red bands respectively. EVI is a related vegetation index that includes an aerosol correction term and saturates less strongly than NDVI in densely vegetated canopies [
24]. The MOD13A1 product has a spatial resolution of 500 m and a temporal resolution of 16 day composites. Both indices were retained because NDVI is the canonical proxy for photosynthetic biomass and crop vigour, while EVI provides a complementary signal that is less prone to saturation over closed canopies. After per-county statistical reduction, this stream contributes 14 features, comprising seven summary statistics for each index.
2.2.3. Landsat 8 and Landsat 9 Optical Imagery
Landsat 8 and Landsat 9, operated jointly by the United States Geological Survey and the National Aeronautics and Space Administration, provide high-resolution multispectral imagery. We used the Landsat 8/9 Collection 2 Tier 1 Surface Reflectance (SR) product. Within this product, “Collection 2” denotes the second-generation processing baseline of the Landsat archive, “Tier 1” indicates the highest geometric and radiometric quality category, and “Surface Reflectance” indicates that the data have been atmospherically corrected from top-of-atmosphere radiance. The spatial resolution is 30 m for the optical bands. The combined Landsat 8 and Landsat 9 constellation yields an effective temporal resolution of 8 days at the equator and reduces to roughly 16 days with the gap-filled composites we extracted. We retained six SR bands (coastal aerosol, blue, green, red, near-infrared, and shortwave infrared 1) and computed six derived spectral indices (NDVI, EVI, the Normalised Difference Water Index, the Normalised Difference Moisture Index, the Soil-Adjusted Vegetation Index, and the Bare Soil Index) that summarise vegetation, surface water, and bare-soil information complementary to the raw bands. Cloud masking was performed using the QA_PIXEL quality assessment band, which flags clouds, cloud shadow, and snow at the pixel level. The Landsat stream contributes 12 bands or indices × 7 summary statistics features.
2.2.4. Sentinel-2 Optical Imagery
The Sentinel-2 mission, operated by ESA under the Copernicus programme, comprises two satellites that provide multispectral optical imagery. We used the Sentinel-2 Level-2A (S2 L2A) product. In Copernicus nomenclature, “Level-1C” refers to top-of-atmosphere reflectance and “Level-2A” refers to surface reflectance after atmospheric correction. The constellation provides a 5-day revisit time at the equator, with spatial resolutions of 10 m for the visible and near-infrared bands and 20 m for the red-edge and shortwave infrared bands. Six SR bands (blue, green, red, near-infrared, and two shortwave infrared bands) and six derived indices (NDVI, EVI, the Normalised Difference Water Index, the Normalised Difference Moisture Index, the Soil-Adjusted Vegetation Index, and the Bare Soil Index) were extracted. Cloud masking used the Sentinel-2 QA60 quality band, which encodes opaque-cloud and cirrus-cloud bits at 60 m resolution. The Sentinel-2 stream contributes features. Sentinel-2 was included alongside Landsat because the two missions overlap in spectral coverage but differ in revisit time, spatial resolution, and acquisition geometry; using both supports robustness against any single-mission artefacts.
2.2.5. Sentinel-1 Synthetic Aperture Radar
Sentinel-1, also operated by ESA under the Copernicus programme, carries a C-band synthetic aperture radar (SAR) instrument. SAR is an active microwave sensor and is therefore independent of solar illumination and largely insensitive to cloud cover, which makes it complementary to the optical missions described above. We used the Sentinel-1 Ground Range Detected (GRD) product acquired in Interferometric Wide swath mode, with a spatial resolution of 10 m and a temporal resolution of 6 days when both Sentinel-1 satellites were in operation. The GRD product provides backscatter intensity in two polarisations: VV, in which the transmitted and received electromagnetic waves are both vertically polarised, and VH, in which the transmitted wave is vertically polarised and the received wave is horizontally polarised. These two channels are sensitive to different physical scattering mechanisms (surface roughness, soil moisture, and vegetation volume scattering). To capture spatial texture, four Grey Level Co-occurrence Matrix (GLCM) texture metrics were computed on each polarisation. The GLCM is a statistical description of how often pairs of pixel values occur at a specified spatial offset, and the derived texture metrics (contrast, correlation, energy, and homogeneity) summarise local roughness in a manner not available from optical sensors [
16,?]. The Sentinel-1 stream contributes
features.
2.2.6. USDA NASS Cropland Data Layer
The Cropland Data Layer (CDL) is a categorical land-cover product issued annually by the National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA). The product is generated by supervised classification of Landsat and other satellite imagery and is verified against state-level agricultural surveys [
25]. The CDL is provided at 30 m spatial resolution and is updated once per year, giving an annual temporal resolution. The classification scheme contains more than 130 individual land-cover classes, including crop-specific categories (for example, corn, soybeans, winter wheat, cotton) and structural categories (developed open space, developed low intensity, developed medium intensity, deciduous forest, evergreen forest, woody wetlands, herbaceous wetlands). For each county and each year, we computed the fraction of county area assigned to each CDL class. After removing classes that are absent from CONUS or whose nationwide coverage is below 0.01%, the CDL stream contributes 135 fractional coverage features.
2.2.7. JRC Global Surface Water
The Joint Research Centre (JRC) of the European Commission produces the Global Surface Water (GSW) dataset, version 1.4, which maps surface water extent and persistence at 30 m resolution over the period 1984 to the present [
26]. The product is derived from Landsat imagery and is updated annually. We extracted six summary metrics per county: the fractional area classified as permanent water, the fractional area classified as seasonal water, the maximum extent of any historical inundation, the change in seasonal water extent over the study period, the change in permanent water extent over the study period, and an aggregate transition class. These features capture both the static hydrological signature of each county and any longitudinal hydrological change, and contribute 6 features.
2.2.8. Copernicus Digital Elevation Model
The Copernicus Digital Elevation Model (DEM), at the GLO-30 spatial resolution of 30 m, is a static gridded product derived from the TanDEM-X mission [
27]. From this product we computed seven topographic summary statistics per county: the mean elevation, the standard deviation of elevation (a proxy for topographic ruggedness), and five elevation percentiles (10
th, 25
th, 50
th, 75
th, and 90
th). Elevation is a static covariate that captures lapse-rate temperature effects, drainage patterns, and exposure to extreme weather, all of which are potentially relevant to LE. The DEM stream contributes 7 features.
2.2.9. ESA CCI Soil Moisture
The ESA Climate Change Initiative (ESA CCI) Soil Moisture product is a long-term climate data record that merges active microwave retrievals (from ASCAT and ERS scatterometers) with passive microwave retrievals (from SMMR, SSM/I, TMI, AMSR-E, AMSR2, WindSat, and SMOS) [
28]. The merged product is provided at a spatial resolution of 0.25° (approximately 25 km at mid-latitudes) and a temporal resolution of 1 day. Soil moisture is reported in volumetric units (m
3/m
3) and is here multiplied by 100 to give a dimensionless index in the range 0–100 for compactness. From this stream we computed seven county-level summary statistics: the mean and standard deviation of soil moisture, and the 10
th, 25
th, 50
th, 75
th, and 90
th percentiles. Soil moisture captures inter-annual drought stress and waterlogging, both of which affect agricultural productivity and, indirectly, rural health. The ESA CCI stream contributes 7 features.
2.2.10. FAO Gridded Livestock of the World
The Food and Agriculture Organization (FAO) of the United Nations publishes the Gridded Livestock of the World (GLW) dataset, in which the 2010 epoch is denoted GLW3 and the 2015 and 2020 epochs are denoted GLW4 [
29]. The product provides gridded density estimates for eight species (cattle, pigs, chickens, ducks, sheep, goats, horses, and buffaloes) at approximately 10 km spatial resolution. Because the underlying livestock census is conducted at three discrete epochs (2010, 2015, 2020), the temporal coverage is sparse. Annual gridded densities were derived from the three available FAO GLW epochs (2010, 2015, and 2020) using a constrained interpolation strategy. To account for the dramatic shifts in livestock numbers and to avoid the biologically impossible negative densities that can arise from unconstrained linear extrapolation, we applied a hybrid approach: for years between 2010 and 2020, linear interpolation was used; for years prior to 2010, densities were held constant at the 2010 baseline. A sensitivity analysis comparing this hybrid approach against simple nearest-neighbour interpolation confirmed that the choice of temporal filling had negligible impact on the final predictive model (
Appendix A.1). Per-county statistics include mean density, weighted-mean density (weighted by latitude-corrected pixel area), standard deviation, standard error, and a usable-area fraction. The full derivation is given in Appendix
Table A1. After excluding 10 primary extensive metrics (raw pixel counts and aggregate sums) to prevent county-area confounding, the FAO stream contributes 32 features.
2.2.11. Engineered Cross-Modal Features
In addition to the individual data streams above, we constructed 25 cross-modal engineered features that combine variables from two or more streams. These features are intended to capture physical interactions that are not represented by any single stream alone. Each engineered feature is documented in Appendix
Table A2, with its definition and physical units. A subset of four illustrative engineered features is discussed in
Section 2.5.
2.2.12. Summary Table of Predictor Streams
The eleven streams described above, together with the engineered feature block, are summarised in
Table 1. Per county-year, the extracted summary statistics from each continuous raster comprise the mean and the standard deviation (denoted SD), together with the 10
th, 25
th, 50
th, 75
th, and 90
th percentiles, denoted P10, P25, P50, P75, and P90 respectively. The Source column identifies the satellite mission or institutional provider, and the Product column identifies the specific algorithmic data product (a named, versioned dataset published by the source) from which features were extracted.
2.4. Spatial Aggregation, Imputation, and Quality Control
To accommodate the scale of the dataset and resolve spatial misalignments between administrative county boundaries and continuous raster grids, we deployed a custom programmatic extraction pipeline. The initial geographic extraction of CONUS county shapefiles was filtered to intersect only with complete, non-suppressed LE records from IHME. Counties lacking viable agricultural footprints or contiguous satellite coverage were subsequently excluded, giving a final cohort of 3,084 counties. The pipeline used GEE alongside local spatial libraries (rasterio and geopandas) and enforced four quality-control steps prior to machine learning integration.
Dynamic scale-override. Rather than enforcing a single resolution across all modalities, which would distort data drawn from sensors with very different native resolutions, we applied resolution-aware extraction. High-resolution optical and SAR imagery were reduced at a 250 m baseline. Legacy sensors such as MODIS LST were processed at their native 1000 m resolution. A sensitivity analysis confirmed that the variance in county-level statistical means across extraction scales (between 100 m and 1000 m) remained below 2%.
Spatiotemporal reduction and feature expansion. To transition from high-frequency, pixel-level grids to annual county-level observations that match the yearly life expectancy target, distinct spatiotemporal reduction strategies were applied. On the GEE platform, all sub-annual satellite passes within a given calendar year were first temporally composited into an annual raster. To prevent statistical distortion, the temporal reducer was strictly mapped to the underlying physical data type: continuous radiometric fields (such as MODIS LST) were temporally averaged using a mean reducer to capture the annual thermodynamic baseline; optical and SAR reflectance arrays (Landsat, Sentinel-1, Sentinel-2, MODIS NDVI) were composited using a median reducer to strictly suppress transient cloud shadows and atmospheric noise; and categorical land-cover classifications (USDA CDL, JRC Global Surface Water) were reduced using a temporal mode. From these robust annualised rasters, we then extracted seven spatial summary statistics for each county-year: the mean, the standard deviation (SD), and the five percentiles P10, P25, P50, P75, P90. Thus, a 12-band input such as Landsat (6 SR bands plus 6 spectral indices) yields exactly features, and a 6-channel Sentinel-1 input (VV, VH, and 4 GLCM textures) yields features. For categorical land-cover grids (USDA CDL, JRC Water), frequency histograms were computed and normalised to yield the percentage county coverage for each discrete class (yielding the 135 continuous coverage fractions of the CDL). For FAO gridded livestock, spatial extraction initially produced 42 metrics; 10 primary extensive metrics (raw pixel counts and aggregate sums) were programmatically excluded to prevent county-area confounding, leaving 32 retained features. The few secondary extensive metrics that were retained (such as calculated head counts) were organically deprioritised by the Random Forest during training, with none ranking in the top 150 SHAP-attributed features.
Latitude-adjusted agricultural masking. Integration of FAO livestock density required spatial targeting to prevent density dilution in heterogeneous counties containing extensive deserts or urban centres. Using National Land Cover Database (NLCD) classifications, a spatial mask excluded open water, high-intensity development, and barren land. Pixel areas were weighted by latitude to correct for projection distortions.
Imputation and null preservation. A strict distinction was maintained between true missing data (sensor downtime or out-of-epoch years, preserved as NaN) and confirmed physical absence (for example, zero crop coverage, encoded as 0). For counties with a near-zero viable agricultural area ( km2) that nonetheless reported livestock census data, an Inverse Distance Weighting (IDW) algorithm—which assigns proportionally greater mathematical influence to geographically closer data points—imputed baseline densities using neighbouring counties within an 11 km radius. Missing inter-census years were resolved via boundary-limited linear interpolation.
Temporal baseline stability. Mann–Kendall trend tests [
30], a nonparametric monotonic trend test, were applied to the 20-year baseline and confirmed no significant systemic drift in national NDVI (
,
) or nighttime LST (
,
). Because Landsat 8/9 and Sentinel-1/2 came online between 2013 and 2015, their spatial signatures were backward-filled at the county level to provide a static infrastructural baseline; dynamic temporal variation was carried by the continuous MODIS, USDA CDL, and ESA CCI SM records.
2.5. Feature Engineering and Preprocessing
To capture interactions across the data streams, we constructed cross-modal engineered features. Four illustrative examples, of the 25 documented in Appendix
Table A2, are given here:
Nighttime Cooling Efficiency (NCE): , where is a numerical stabiliser that prevents division by zero in counties with very low daytime LST.
Thermal Vegetation Index (TVI): , which combines nighttime thermal load with the vegetation fraction.
Diurnal Temperature Range: , the simple difference between day and night LST.
-
Impervious Surface Heat Index: , in which the
impervious-surface fraction is weighted by nighttime thermal load.
For all 61,680 county-year observations, global preprocessing was applied prior to cross-validation: infinite values were replaced with
NaN, followed by median imputation and winsorisation at the 0.1th/99.9th percentiles. Because GroupKFold places all 20 years of any given county exclusively in either the training or the test set, this global approach introduces negligible leakage. A sensitivity check using strict per-fold preprocessing (intra-county backward-fill and forward-fill plus training-fold-only imputation inside the GroupKFold loop) was performed and produced near-identical results; the comparison is reported in
Appendix A.1.
Supervised feature pruning. To reduce dimensionality and computational cost for SHAP attribution, we employed a two-stage supervised process. First, a preliminary Random Forest baseline (100 estimators, depth 20) was fitted to the global dataset using life expectancy as the target. Features registering a native impurity-based importance score below 0.001 were pruned from the feature matrix. This reduced the raw input from 450 to 193 features. Second, this pruned matrix serves as the exclusive input for the production model (
Section 2.6), ensuring that all reported SHAP attributions and cross-validation metrics reflect the same consistent, high-impact feature space.
2.6. Machine Learning Framework
Machine learning frameworks have become a standard tool for extracting nonlinear multivariate patterns from Earth observation arrays [
16]. Random Forest [
31] was selected as the production learner because it: (i) captures nonlinear threshold patterns without pre-specified functional forms; (ii) regularises across many features through bagging and random feature subsampling; (iii) detects cross-modal interactions through recursive partitioning; and (iv) admits exact and efficient SHAP computation for tree models [
21]. The production model is a Random Forest regressor fitted with hyperparameters that were optimised by a coarse grid search on the training folds only; the full optimisation procedure, the per-modality settings, and a comparison against an XGBoost alternative are reported in
Appendix A.4.
To prevent spatial leakage, we used a 5-fold county-grouped cross-validation scheme (GroupKFold,
) in which the county FIPS code is the grouping variable [
15]. Under this scheme, the 3,084 counties are partitioned into five disjoint county blocks, and each fold uses four blocks (approximately 80% of counties) for training and the remaining block (approximately 20% of counties) for testing. The county assignment to blocks is randomised at the county level under
random_state ; it is not a sequential “first 80% / last 20%” split, and it is not a random sample of individual county-year rows. Because all 20 years of any given county are kept together in the same block, no county-year observation appears in both training and test sets within a fold. All performance metrics reported in
Section 3 are means across the five held-out test blocks.
Performance metrics. Model performance is evaluated using four standard regression metrics: the coefficient of determination (
), the mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE):
where
is the observed LE for county-year
i,
is the predicted value,
is the mean of
across the test set, and
n is the number of county-year observations in the test set. Moran’s
I, a test for spatial autocorrelation, was computed on fold-level residuals using queen-contiguity weights.
2.7. SHAP Interpretation Framework
TreeSHAP [
20,
21] was applied to the final combined Random Forest trained on all data, under the decomposition
, where
is the model’s expected output and
is the SHAP value (the attribution) for feature
j. SHAP values were computed on
stratified county-year observations using the
TreeExplainer interface, with
feature_perturbation = ’interventional’ and a 500-observation marginal background. We emphasise that SHAP values represent predictive associations within the fitted model and do not establish causal mechanisms.
Inflection points were estimated by computing the numerical derivative () of a Locally Weighted Scatterplot Smoothing (LOWESS) curve fitted to the attribution scatter, using an adaptive bandwidth of 0.30 to 0.40. Saturation thresholds were then located along the same response curves using the Kneedle elbow-detection algorithm.