Preprint
Article

This version is not peer-reviewed.

Nighttime Thermal Patterns and County Life Expectancy: A 20-Year Multimodal Satellite Fusion for the Contiguous United States

Submitted:

23 May 2026

Posted:

25 May 2026

You are already at the latest version

Abstract
Satellite-derived environmental features can predict county-level life expectancy (LE) across the contiguous United States with a mean absolute error of 1.08 years over two decades, without using any census or sociodemographic inputs. We assembled 61,680 county-year observations across 3,084 counties from 2000–2019, integrating features from 11 satellite and gridded data streams. The data streams include the Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature and vegetation indices, Sentinel-1 synthetic aperture radar, Sentinel-2 and Landsat optical imagery, the United States Department of Agriculture (USDA) Cropland Data Layer, the European Commission Joint Research Centre (JRC) Global Surface Water layer, the Copernicus Digital Elevation Model, the European Space Agency Climate Change Initiative (ESA CCI) soil moisture record, and the Food and Agriculture Organization (FAO) gridded livestock densities. After a supervised pruning step that removed low-importance variables, a Random Forest regressor was trained and evaluated using 5-fold cross-validation grouped by county. The grouping places all 20 years of each county exclusively in either the training set or the test set, which prevents spatial information leakage between folds. Coefficient of determination, mean absolute error, and root mean squared error are reported as R² = 0.631 ± 0.013, MAE = 1.08 ± 0.02 years, and RMSE = 1.48 ± 0.04 years. Moran's I, a measure of residual spatial autocorrelation, is 0.0988 (p = 0.001), which supports geographic generalisation. Multimodal fusion reduces unexplained variance by approximately one-third relative to the strongest single-modality baseline (MODIS land surface temperature alone, R² = 0.442). TreeSHAP attribution analysis reveals a feature hierarchy in which nighttime land surface temperature features carry roughly 6.16× the cumulative attribution weight of all daytime channels combined. The model response shows a protective inflection near a minimum overnight temperature of about 7.5°C. Because all input streams are globally available, the framework is extensible to regions where civil registration and vital statistics systems are incomplete, and supports satellite-based monitoring of United Nations Sustainable Development Goal (UN SDG) Target 3.9.
Keywords: 
;  ;  ;  ;  ;  ;  ;  ;  ;  ;  

1. Introduction

Life expectancy at birth (LE) is one of the most integrative summary statistics of population health. It reflects the combined influence of environmental stress, disease burden, economic deprivation, and behavioural risk. Despite its importance, the physical conditions associated with shorter or longer LE remain poorly characterised at a resolution useful for public-health action. Conventional LE surveillance relies on civil registration and vital statistics (CRVS) systems. These systems lag conditions by two to three years, aggregate to coarse units that can mask local disparities, and remain incomplete or absent across much of the developing world [1,2]. Concurrently, climate change is reorganising thermal environments [3,4], agricultural intensification is reshaping rural landscapes [5], and urbanisation is concentrating heat exposure in vulnerable neighbourhoods [6]. Communities whose physical environments differ in health-relevant ways may therefore receive no targeted intervention until mortality patterns appear in already-dated statistics.
Satellite Earth observation provides a complementary surveillance approach. The Moderate Resolution Imaging Spectroradiometer (MODIS), the Sentinel-1 and Sentinel-2 missions operated by the European Space Agency (ESA), and the Landsat 8 and Landsat 9 missions operated by the United States Geological Survey produce continuous, globally consistent measurements at weekly to monthly intervals. Together, these missions monitor land surface temperature (LST), vegetation health, agricultural land use, water dynamics, soil moisture, and landscape structure. Several of these measurements have a direct interpretive bridge to physiological health. Nighttime LST is a direct radiometric measurement of the minimum thermal environment in which residents must thermoregulate during sleep. The Normalised Difference Vegetation Index (NDVI), a physically grounded index of photosynthetic activity, tracks crop productivity, ambient air quality, and the attenuation of urban heat by canopy cover. The ESA Climate Change Initiative (ESA CCI) soil moisture product quantifies the hydrological state of agricultural land. The present study does not attempt to replace traditional epidemiological models that rely on detailed behavioural and socioeconomic covariates. Rather, the question we address is whether fusing these orbital and gridded streams, without any census input, can establish a viable environmental baseline. Such a capability test is critical for determining whether Earth observation can provide an actionable LE proxy for data-scarce regions in which CRVS coverage is incomplete, and supports monitoring of Target 3.9 of the United Nations Sustainable Development Goals (UN SDG 3.9), which aims to reduce deaths from hazardous chemicals and environmental contamination [7].

1.1. Gaps in the Existing Literature

Prior environmental-health studies have predominantly examined one or two variables in isolation. In a notable example of vegetation-only work, James et al. [8] reviewed the evidence linking greenness exposure, measured chiefly by NDVI, to cardiovascular outcomes; a complementary review by Gascon et al. [9] catalogued mental-health benefits of long-term residential exposure to green and blue spaces. Such studies rarely integrate thermal stress or agricultural context. In a multicountry study of heat-mortality, Gasparrini et al. [10] established that extreme temperatures are associated with excess deaths, but the analysis focuses almost exclusively on daytime maxima and acute heat waves. Chronic nighttime thermal patterns remain largely uncharacterised at the national scale. Agricultural-health studies document respiratory risks near concentrated animal feeding operations (CAFOs) [11], and livestock-driven water contamination [12], but lack the national coverage needed to identify nonlinear dose-response patterns. The result is a fragmented literature that isolates atmospheric, agricultural, and built-environment exposures. The physical and physiological interactions between these exposures, such as how chronic nocturnal thermal burden is mitigated by local canopy cover or exacerbated by agricultural land use, are therefore obscured. Consequently, no prior study has quantified the full cross-modal environmental hierarchy at the national scale, nor produced the continuous, satellite-derivable threshold values needed to guide policy [13,14]. A second limitation concerns model validation. Most satellite-based health studies apply standard random cross-validation, which can conflate spatial memorisation with genuine geographic generalisation [15].

1.2. Contributions and Objectives

This work builds directly upon two prior contributions from our group. The first is our earlier study on machine learning applications in geosciences and remote sensing [16], which established the analytic framework for fusing heterogeneous Earth observation streams. The second is our recently published study in Air [17], in which an interpretable machine-learning framework, combining XGBoost with SHAP attribution, identified formaldehyde exposure as leading atmospheric and meteorological predictors of county-level life expectancy across the contiguous United States, alongside socioeconomic and behavioural covariates. The present study extends these foundations in three ways. We move from a primarily atmospheric-chemistry sensing domain to a surface-environment sensing domain, anchored in radiometric Earth observation; we expand the data fusion from an atmospheric and socioeconomic panel to 11 orbital and gridded streams; and we introduce a stringent county-grouped cross-validation protocol that prevents spatial information leakage between folds.
Within that scope, the objectives of this study are:
  • to assemble a fully reproducible, multimodal satellite and gridded data archive at the county-year scale for the contiguous United States over the period 2000–2019;
  • to evaluate whether the assembled features, without any sociodemographic input, support a predictive model of county-level LE that generalises across geographically held-out counties;
  • to identify which environmental modalities and individual features carry the strongest predictive associations with LE, using a model-agnostic attribution framework (SHapley Additive exPlanations, SHAP); and
  • to test whether the framework can deliver continuous, satellite-derivable threshold values suitable for translation into screening criteria for data-scarce regions.

2. Materials and Methods

2.1. Study Design and Workflow Overview

We conducted a retrospective observational study linking satellite-derived and gridded environmental features to LE across 3,084 counties of the contiguous United States (CONUS), over 2000–2019 (61,680 county-year observations). Alaska, Hawaii, and the US territories were excluded because their satellite coverage, gridded product availability, or LE reporting are not directly comparable to CONUS.
The methodology is organised as a four-stage pipeline (Figure 1). Stage 1 acquires and harmonises 11 geospatial and orbital modalities at their native sensor resolutions. Stage 2 applies spatial aggregation, quality control, and temporal interpolation to produce a county-year tabular matrix. Stage 3 engineers a small set of cross-modal physically motivated features. Stage 4 fits and interprets a Random Forest regression with county-grouped cross-validation.
Spatial analyses used the Google Earth Engine (GEE) Python application programming interface [18]. Machine learning was implemented in Python 3.9 using the scikit-learn library [19] and the shap library [20,21]. TreeSHAP computation was parallelised on the University of Texas at Dallas Juno High-Performance Computing cluster (64 cores) [22]. To verify computational reproducibility, the pipeline was executed on two independent cluster nodes and the cross-validated metrics were confirmed to be identical to three decimal places between runs; the full run-to-run reproducibility statistics are reported in Appendix A.1.

2.2. Satellite and Environmental Data Sources (Predictor Variables)

To represent the environmental, infrastructural, and agricultural setting of each county, we constructed a multimodal feature matrix. Eleven complementary data streams were integrated. Each stream is described in the subsections below, with explicit attention to (i) the data source and product identifier, (ii) the spatial and temporal resolutions, (iii) the variables extracted and the reasoning for their inclusion, and (iv) any product-specific scaling required to convert raw archive values to physical units. A summary table is provided in Section 2.2.12 after the individual product descriptions.

2.2.1. MODIS Land Surface Temperature (MOD11A2)

The Moderate Resolution Imaging Spectroradiometer (MODIS) Terra MOD11A2 product was used to extract land surface temperature (LST). LST is the radiometric temperature of the radiating land surface, retrieved from the upwelling thermal-infrared emission measured by the MODIS sensor. We used Collection 6.1 of the product [23], distributed by the National Aeronautics and Space Administration Land Processes Distributed Active Archive Center. The product has a nominal spatial resolution of 1 km × 1 km. Its temporal resolution is an 8-day composite, in which each pixel reports the average of clear-sky retrievals across the 8-day window. We retained both the daytime channel (LST_Day_1km) and the nighttime channel (LST_Night_1km). The two channels were retained separately because the diurnal cycle of LST captures different physical regimes: the daytime channel records peak solar forcing on the surface energy balance, whereas the nighttime channel records the minimum thermal environment after radiative cooling. As reviewed in Section 1.1, existing heat-mortality literature is dominated by daytime maxima, and the nighttime channel is the principal data product that allows the chronic nocturnal thermal pattern to be evaluated as a separate scientific question.
Raw MOD11A2 values are stored as scaled, unsigned 16-bit integers. The product-specific scaling, which is documented in the MOD11A2 user guide and is distinct from any general kelvin-to-celsius conversion, converts raw archive values to degrees Celsius according to
LST C = LST raw × 0.02 273.15 ,
where the factor 0.02 is the MODIS-specific multiplicative scale factor that recovers the kelvin value, and the constant 273.15 then converts kelvin to degrees Celsius. We emphasise that Equation (1) is a MODIS product-handling formula, not a generic temperature conversion. Quality control was performed using the MOD11A2 quality assurance band; pixels flagged as cloud-contaminated, low-LST-quality, or with emissivity error above 0.04 were excluded. After per-county statistical reduction (Section 2.4), the MOD11A2 stream contributes 14 features, comprising seven summary statistics for the daytime channel and seven for the nighttime channel.

2.2.2. MODIS Vegetation Indices (MOD13A1)

The MODIS Terra MOD13A1 product was used to extract two vegetation indices: NDVI and the Enhanced Vegetation Index (EVI). NDVI is defined as ( ρ NIR ρ Red ) / ( ρ NIR + ρ Red ) , where ρ NIR and ρ Red are the surface reflectances in the near-infrared and red bands respectively. EVI is a related vegetation index that includes an aerosol correction term and saturates less strongly than NDVI in densely vegetated canopies [24]. The MOD13A1 product has a spatial resolution of 500 m and a temporal resolution of 16 day composites. Both indices were retained because NDVI is the canonical proxy for photosynthetic biomass and crop vigour, while EVI provides a complementary signal that is less prone to saturation over closed canopies. After per-county statistical reduction, this stream contributes 14 features, comprising seven summary statistics for each index.

2.2.3. Landsat 8 and Landsat 9 Optical Imagery

Landsat 8 and Landsat 9, operated jointly by the United States Geological Survey and the National Aeronautics and Space Administration, provide high-resolution multispectral imagery. We used the Landsat 8/9 Collection 2 Tier 1 Surface Reflectance (SR) product. Within this product, “Collection 2” denotes the second-generation processing baseline of the Landsat archive, “Tier 1” indicates the highest geometric and radiometric quality category, and “Surface Reflectance” indicates that the data have been atmospherically corrected from top-of-atmosphere radiance. The spatial resolution is 30 m for the optical bands. The combined Landsat 8 and Landsat 9 constellation yields an effective temporal resolution of 8 days at the equator and reduces to roughly 16 days with the gap-filled composites we extracted. We retained six SR bands (coastal aerosol, blue, green, red, near-infrared, and shortwave infrared 1) and computed six derived spectral indices (NDVI, EVI, the Normalised Difference Water Index, the Normalised Difference Moisture Index, the Soil-Adjusted Vegetation Index, and the Bare Soil Index) that summarise vegetation, surface water, and bare-soil information complementary to the raw bands. Cloud masking was performed using the QA_PIXEL quality assessment band, which flags clouds, cloud shadow, and snow at the pixel level. The Landsat stream contributes 12 bands or indices × 7 summary statistics = 84 features.

2.2.4. Sentinel-2 Optical Imagery

The Sentinel-2 mission, operated by ESA under the Copernicus programme, comprises two satellites that provide multispectral optical imagery. We used the Sentinel-2 Level-2A (S2 L2A) product. In Copernicus nomenclature, “Level-1C” refers to top-of-atmosphere reflectance and “Level-2A” refers to surface reflectance after atmospheric correction. The constellation provides a 5-day revisit time at the equator, with spatial resolutions of 10 m for the visible and near-infrared bands and 20 m for the red-edge and shortwave infrared bands. Six SR bands (blue, green, red, near-infrared, and two shortwave infrared bands) and six derived indices (NDVI, EVI, the Normalised Difference Water Index, the Normalised Difference Moisture Index, the Soil-Adjusted Vegetation Index, and the Bare Soil Index) were extracted. Cloud masking used the Sentinel-2 QA60 quality band, which encodes opaque-cloud and cirrus-cloud bits at 60 m resolution. The Sentinel-2 stream contributes 12 × 7 = 84 features. Sentinel-2 was included alongside Landsat because the two missions overlap in spectral coverage but differ in revisit time, spatial resolution, and acquisition geometry; using both supports robustness against any single-mission artefacts.

2.2.5. Sentinel-1 Synthetic Aperture Radar

Sentinel-1, also operated by ESA under the Copernicus programme, carries a C-band synthetic aperture radar (SAR) instrument. SAR is an active microwave sensor and is therefore independent of solar illumination and largely insensitive to cloud cover, which makes it complementary to the optical missions described above. We used the Sentinel-1 Ground Range Detected (GRD) product acquired in Interferometric Wide swath mode, with a spatial resolution of 10 m and a temporal resolution of 6 days when both Sentinel-1 satellites were in operation. The GRD product provides backscatter intensity in two polarisations: VV, in which the transmitted and received electromagnetic waves are both vertically polarised, and VH, in which the transmitted wave is vertically polarised and the received wave is horizontally polarised. These two channels are sensitive to different physical scattering mechanisms (surface roughness, soil moisture, and vegetation volume scattering). To capture spatial texture, four Grey Level Co-occurrence Matrix (GLCM) texture metrics were computed on each polarisation. The GLCM is a statistical description of how often pairs of pixel values occur at a specified spatial offset, and the derived texture metrics (contrast, correlation, energy, and homogeneity) summarise local roughness in a manner not available from optical sensors [16,?]. The Sentinel-1 stream contributes ( 2 polarisations + 4 textures ) × 7 statistics = 42 features.

2.2.6. USDA NASS Cropland Data Layer

The Cropland Data Layer (CDL) is a categorical land-cover product issued annually by the National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA). The product is generated by supervised classification of Landsat and other satellite imagery and is verified against state-level agricultural surveys [25]. The CDL is provided at 30 m spatial resolution and is updated once per year, giving an annual temporal resolution. The classification scheme contains more than 130 individual land-cover classes, including crop-specific categories (for example, corn, soybeans, winter wheat, cotton) and structural categories (developed open space, developed low intensity, developed medium intensity, deciduous forest, evergreen forest, woody wetlands, herbaceous wetlands). For each county and each year, we computed the fraction of county area assigned to each CDL class. After removing classes that are absent from CONUS or whose nationwide coverage is below 0.01%, the CDL stream contributes 135 fractional coverage features.

2.2.7. JRC Global Surface Water

The Joint Research Centre (JRC) of the European Commission produces the Global Surface Water (GSW) dataset, version 1.4, which maps surface water extent and persistence at 30 m resolution over the period 1984 to the present [26]. The product is derived from Landsat imagery and is updated annually. We extracted six summary metrics per county: the fractional area classified as permanent water, the fractional area classified as seasonal water, the maximum extent of any historical inundation, the change in seasonal water extent over the study period, the change in permanent water extent over the study period, and an aggregate transition class. These features capture both the static hydrological signature of each county and any longitudinal hydrological change, and contribute 6 features.

2.2.8. Copernicus Digital Elevation Model

The Copernicus Digital Elevation Model (DEM), at the GLO-30 spatial resolution of 30 m, is a static gridded product derived from the TanDEM-X mission [27]. From this product we computed seven topographic summary statistics per county: the mean elevation, the standard deviation of elevation (a proxy for topographic ruggedness), and five elevation percentiles (10th, 25th, 50th, 75th, and 90th). Elevation is a static covariate that captures lapse-rate temperature effects, drainage patterns, and exposure to extreme weather, all of which are potentially relevant to LE. The DEM stream contributes 7 features.

2.2.9. ESA CCI Soil Moisture

The ESA Climate Change Initiative (ESA CCI) Soil Moisture product is a long-term climate data record that merges active microwave retrievals (from ASCAT and ERS scatterometers) with passive microwave retrievals (from SMMR, SSM/I, TMI, AMSR-E, AMSR2, WindSat, and SMOS) [28]. The merged product is provided at a spatial resolution of 0.25° (approximately 25 km at mid-latitudes) and a temporal resolution of 1 day. Soil moisture is reported in volumetric units (m3/m3) and is here multiplied by 100 to give a dimensionless index in the range 0–100 for compactness. From this stream we computed seven county-level summary statistics: the mean and standard deviation of soil moisture, and the 10th, 25th, 50th, 75th, and 90th percentiles. Soil moisture captures inter-annual drought stress and waterlogging, both of which affect agricultural productivity and, indirectly, rural health. The ESA CCI stream contributes 7 features.

2.2.10. FAO Gridded Livestock of the World

The Food and Agriculture Organization (FAO) of the United Nations publishes the Gridded Livestock of the World (GLW) dataset, in which the 2010 epoch is denoted GLW3 and the 2015 and 2020 epochs are denoted GLW4 [29]. The product provides gridded density estimates for eight species (cattle, pigs, chickens, ducks, sheep, goats, horses, and buffaloes) at approximately 10 km spatial resolution. Because the underlying livestock census is conducted at three discrete epochs (2010, 2015, 2020), the temporal coverage is sparse. Annual gridded densities were derived from the three available FAO GLW epochs (2010, 2015, and 2020) using a constrained interpolation strategy. To account for the dramatic shifts in livestock numbers and to avoid the biologically impossible negative densities that can arise from unconstrained linear extrapolation, we applied a hybrid approach: for years between 2010 and 2020, linear interpolation was used; for years prior to 2010, densities were held constant at the 2010 baseline. A sensitivity analysis comparing this hybrid approach against simple nearest-neighbour interpolation confirmed that the choice of temporal filling had negligible impact on the final predictive model (Appendix A.1). Per-county statistics include mean density, weighted-mean density (weighted by latitude-corrected pixel area), standard deviation, standard error, and a usable-area fraction. The full derivation is given in Appendix Table A1. After excluding 10 primary extensive metrics (raw pixel counts and aggregate sums) to prevent county-area confounding, the FAO stream contributes 32 features.

2.2.11. Engineered Cross-Modal Features

In addition to the individual data streams above, we constructed 25 cross-modal engineered features that combine variables from two or more streams. These features are intended to capture physical interactions that are not represented by any single stream alone. Each engineered feature is documented in Appendix Table A2, with its definition and physical units. A subset of four illustrative engineered features is discussed in Section 2.5.

2.2.12. Summary Table of Predictor Streams

The eleven streams described above, together with the engineered feature block, are summarised in Table 1. Per county-year, the extracted summary statistics from each continuous raster comprise the mean and the standard deviation (denoted SD), together with the 10th, 25th, 50th, 75th, and 90th percentiles, denoted P10, P25, P50, P75, and P90 respectively. The Source column identifies the satellite mission or institutional provider, and the Product column identifies the specific algorithmic data product (a named, versioned dataset published by the source) from which features were extracted.

2.3. Target Variable: Life Expectancy at the Under-One Age Group

The target variable for the model is county-level LE, obtained from the Institute for Health Metrics and Evaluation (IHME) US County-Level Mortality dataset [1]. This dataset provides annual life-table estimates for every US county over the period 2000–2019, based on harmonised vital-statistics records. Records were extracted with race_name = Total, sex_name = Both, and age_name = <1 year. The <1 year category corresponds to life expectancy estimated at the under-one age group, which is the IHME life-table category most closely aligned with the conventional epidemiological concept of “life expectancy at birth” while remaining grounded in a directly observed age stratum. Federal Information Processing Standards (FIPS) county codes were standardised to a five-digit format to support deterministic joins with the satellite feature matrix. Observed LE values across CONUS counties span 65.3 to 92.3 years.

2.4. Spatial Aggregation, Imputation, and Quality Control

To accommodate the scale of the dataset and resolve spatial misalignments between administrative county boundaries and continuous raster grids, we deployed a custom programmatic extraction pipeline. The initial geographic extraction of CONUS county shapefiles was filtered to intersect only with complete, non-suppressed LE records from IHME. Counties lacking viable agricultural footprints or contiguous satellite coverage were subsequently excluded, giving a final cohort of 3,084 counties. The pipeline used GEE alongside local spatial libraries (rasterio and geopandas) and enforced four quality-control steps prior to machine learning integration.
Dynamic scale-override. Rather than enforcing a single resolution across all modalities, which would distort data drawn from sensors with very different native resolutions, we applied resolution-aware extraction. High-resolution optical and SAR imagery were reduced at a 250 m baseline. Legacy sensors such as MODIS LST were processed at their native 1000 m resolution. A sensitivity analysis confirmed that the variance in county-level statistical means across extraction scales (between 100 m and 1000 m) remained below 2%.
Spatiotemporal reduction and feature expansion. To transition from high-frequency, pixel-level grids to annual county-level observations that match the yearly life expectancy target, distinct spatiotemporal reduction strategies were applied. On the GEE platform, all sub-annual satellite passes within a given calendar year were first temporally composited into an annual raster. To prevent statistical distortion, the temporal reducer was strictly mapped to the underlying physical data type: continuous radiometric fields (such as MODIS LST) were temporally averaged using a mean reducer to capture the annual thermodynamic baseline; optical and SAR reflectance arrays (Landsat, Sentinel-1, Sentinel-2, MODIS NDVI) were composited using a median reducer to strictly suppress transient cloud shadows and atmospheric noise; and categorical land-cover classifications (USDA CDL, JRC Global Surface Water) were reduced using a temporal mode. From these robust annualised rasters, we then extracted seven spatial summary statistics for each county-year: the mean, the standard deviation (SD), and the five percentiles P10, P25, P50, P75, P90. Thus, a 12-band input such as Landsat (6 SR bands plus 6 spectral indices) yields exactly 12 × 7 = 84 features, and a 6-channel Sentinel-1 input (VV, VH, and 4 GLCM textures) yields 6 × 7 = 42 features. For categorical land-cover grids (USDA CDL, JRC Water), frequency histograms were computed and normalised to yield the percentage county coverage for each discrete class (yielding the 135 continuous coverage fractions of the CDL). For FAO gridded livestock, spatial extraction initially produced 42 metrics; 10 primary extensive metrics (raw pixel counts and aggregate sums) were programmatically excluded to prevent county-area confounding, leaving 32 retained features. The few secondary extensive metrics that were retained (such as calculated head counts) were organically deprioritised by the Random Forest during training, with none ranking in the top 150 SHAP-attributed features.
Latitude-adjusted agricultural masking. Integration of FAO livestock density required spatial targeting to prevent density dilution in heterogeneous counties containing extensive deserts or urban centres. Using National Land Cover Database (NLCD) classifications, a spatial mask excluded open water, high-intensity development, and barren land. Pixel areas were weighted by latitude to correct for projection distortions.
Imputation and null preservation. A strict distinction was maintained between true missing data (sensor downtime or out-of-epoch years, preserved as NaN) and confirmed physical absence (for example, zero crop coverage, encoded as 0). For counties with a near-zero viable agricultural area ( < 0.1  km2) that nonetheless reported livestock census data, an Inverse Distance Weighting (IDW) algorithm—which assigns proportionally greater mathematical influence to geographically closer data points—imputed baseline densities using neighbouring counties within an 11 km radius. Missing inter-census years were resolved via boundary-limited linear interpolation.
Temporal baseline stability. Mann–Kendall trend tests [30], a nonparametric monotonic trend test, were applied to the 20-year baseline and confirmed no significant systemic drift in national NDVI ( τ = 0.06 , p = 0.72 ) or nighttime LST ( τ = 0.11 , p = 0.54 ). Because Landsat 8/9 and Sentinel-1/2 came online between 2013 and 2015, their spatial signatures were backward-filled at the county level to provide a static infrastructural baseline; dynamic temporal variation was carried by the continuous MODIS, USDA CDL, and ESA CCI SM records.

2.5. Feature Engineering and Preprocessing

To capture interactions across the data streams, we constructed cross-modal engineered features. Four illustrative examples, of the 25 documented in Appendix Table A2, are given here:
  • Nighttime Cooling Efficiency (NCE):  NCE = ( P 90 , day P 10 , night ) / ( P 90 , day + ε ) , where ε = 10 3 is a numerical stabiliser that prevents division by zero in counties with very low daytime LST.
  • Thermal Vegetation Index (TVI):  TVI = LST night × ( 1 NDVI ) , which combines nighttime thermal load with the vegetation fraction.
  • Diurnal Temperature Range:  LST day LST night , the simple difference between day and night LST.
  • Impervious Surface Heat Index:  ( f dev _ med + f dev _ high ) × LST night , in which the
    impervious-surface fraction is weighted by nighttime thermal load.
For all 61,680 county-year observations, global preprocessing was applied prior to cross-validation: infinite values were replaced with NaN, followed by median imputation and winsorisation at the 0.1th/99.9th percentiles. Because GroupKFold places all 20 years of any given county exclusively in either the training or the test set, this global approach introduces negligible leakage. A sensitivity check using strict per-fold preprocessing (intra-county backward-fill and forward-fill plus training-fold-only imputation inside the GroupKFold loop) was performed and produced near-identical results; the comparison is reported in Appendix A.1.
Supervised feature pruning. To reduce dimensionality and computational cost for SHAP attribution, we employed a two-stage supervised process. First, a preliminary Random Forest baseline (100 estimators, depth 20) was fitted to the global dataset using life expectancy as the target. Features registering a native impurity-based importance score below 0.001 were pruned from the feature matrix. This reduced the raw input from 450 to 193 features. Second, this pruned matrix serves as the exclusive input for the production model (Section 2.6), ensuring that all reported SHAP attributions and cross-validation metrics reflect the same consistent, high-impact feature space.

2.6. Machine Learning Framework

Machine learning frameworks have become a standard tool for extracting nonlinear multivariate patterns from Earth observation arrays [16]. Random Forest [31] was selected as the production learner because it: (i) captures nonlinear threshold patterns without pre-specified functional forms; (ii) regularises across many features through bagging and random feature subsampling; (iii) detects cross-modal interactions through recursive partitioning; and (iv) admits exact and efficient SHAP computation for tree models [21]. The production model is a Random Forest regressor fitted with hyperparameters that were optimised by a coarse grid search on the training folds only; the full optimisation procedure, the per-modality settings, and a comparison against an XGBoost alternative are reported in Appendix A.4.
To prevent spatial leakage, we used a 5-fold county-grouped cross-validation scheme (GroupKFold, K = 5 ) in which the county FIPS code is the grouping variable [15]. Under this scheme, the 3,084 counties are partitioned into five disjoint county blocks, and each fold uses four blocks (approximately 80% of counties) for training and the remaining block (approximately 20% of counties) for testing. The county assignment to blocks is randomised at the county level under random_state  = 42 ; it is not a sequential “first 80% / last 20%” split, and it is not a random sample of individual county-year rows. Because all 20 years of any given county are kept together in the same block, no county-year observation appears in both training and test sets within a fold. All performance metrics reported in Section 3 are means across the five held-out test blocks.
Performance metrics. Model performance is evaluated using four standard regression metrics: the coefficient of determination ( R 2 ), the mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE):
R 2 = 1 i ( y i y ^ i ) 2 i ( y i y ¯ ) 2 , MAE = 1 n i | y i y ^ i | ,
RMSE = 1 n i ( y i y ^ i ) 2 , MAPE = 100 n i | y i y ^ i | y i ,
where y i is the observed LE for county-year i, y ^ i is the predicted value, y ¯ is the mean of y i across the test set, and n is the number of county-year observations in the test set. Moran’s I, a test for spatial autocorrelation, was computed on fold-level residuals using queen-contiguity weights.

2.7. SHAP Interpretation Framework

TreeSHAP [20,21] was applied to the final combined Random Forest trained on all data, under the decomposition f ( x ) = ϕ 0 + j = 1 p ϕ j , where ϕ 0 is the model’s expected output and ϕ j is the SHAP value (the attribution) for feature j. SHAP values were computed on n = 5 , 000 stratified county-year observations using the TreeExplainer interface, with feature_perturbation = ’interventional’ and a 500-observation marginal background. We emphasise that SHAP values represent predictive associations within the fitted model and do not establish causal mechanisms.
Inflection points were estimated by computing the numerical derivative ( Δ SHAP / Δ feature ) of a Locally Weighted Scatterplot Smoothing (LOWESS) curve fitted to the attribution scatter, using an adaptive bandwidth of 0.30 to 0.40. Saturation thresholds were then located along the same response curves using the Kneedle elbow-detection algorithm.

3. Results

We present the results in three blocks. We first validate model performance: spatial accuracy and bias structure across CONUS (Figure 2 and Figure 3), temporal stability over two decades (Figure 4), and the gain from multimodal fusion (Figure 5). We then interpret the trained model: we rank the global importance of all environmental features (Figure 6) and examine the leading patterns, including the dominance of nighttime LST features (Figure 7), the threshold structure of the leading predictors (Figure 8), nonlinear agricultural inflection points (Figure 9), and a temperature-dose-dependent forest attenuation effect (Figure 10).

3.1. Spatial Prediction Fidelity

Figure 2 presents the spatial prediction fidelity of the production model for the year 2000, shown as an illustrative single-year snapshot in four panels. Year 2000 is the first year of the study period and is chosen here only as a representative slice: it allows a direct side-by-side choropleth comparison between the observed and the predicted LE surface on a single, geographically interpretable map. The aggregated multi-year performance, which uses all 20 years and all five GroupKFold folds, is reported in Section 3.2 (Figure 3, spatial MAE and residual bias) and in Section 3.3 (Figure 4, year-stratified R 2 , MAE, and RMSE). Panel (A) shows the observed IHME LE choropleth at the county scale, with darker shading marking higher LE and the Southern health deficit (Mississippi Delta and Appalachian corridor) visible in red. Panel (B) shows the corresponding model-predicted LE choropleth on the same colour scale; the model recovers the Southern health deficit, Rust Belt deterioration, and the metropolitan-to-rural LE gradient without any census input. National mean predicted LE (76.37 yr) closely matches the observed national mean (76.17 yr), indicating low aggregate bias. Panel (C) is a scatter of predicted versus observed LE across n = 3 , 084 counties; the Pearson correlation is r = 0.782 , R 2 = 0.601 , the mean absolute error is 0.97  yr, and the ordinary-least-squares regression slope is 0.587 . The compressed prediction range (71.0–84.3 yr) relative to the observed range (65.3–86.6 yr) reflects the standard mean-regression behaviour of Random Forests and is not specific to any geographic region. Panel (D) shows the residual histogram (actual minus predicted): the distribution is near-Gaussian (mean = 0.207  yr, skewness = 0.84 ), with 63% of counties falling within ± 1  yr and 90% within ± 2  yr of the model prediction.

3.2. Spatial Accuracy and Bias Structure

Figure 3 characterises prediction precision and bias direction aggregated over all 20 years and five folds. The mean residual across all county-years is 0.007 yr (skewness = 0.44 ), indicating low aggregate bias. Over-prediction concentrates in a small number of Indigenous reservation counties, where structural health determinants are not encoded in the satellite feature set (Section 4.6). Under-prediction appears in a smaller number of high-socioeconomic-status counties where healthcare access exceeds what the satellite features capture. Moran’s I = 0.0988 ( p = 0.001 ) indicates low residual spatial autocorrelation, which supports geographic generalisation: the model’s errors are not systematically clustered into adjacent counties.

3.3. Temporal Generalisation: 20-Year Stability

Figure 4 shows year-stratified performance across 2000–2019. The coefficient of determination shows no significant trend ( β = 0.0005 /yr, p = 0.35 ), suggesting no systematic drift in geographic generalisation across six sensor generations. The MAE rose modestly ( β = + 0.0137  yr/yr, p < 0.001 ), from 0.97  yr in 2000 to about 1.25  yr in 2019; the RMSE rose similarly ( β = + 0.0181  yr/yr, p < 0.001 ). The most plausible explanation for this modest widening is that localised, non-environmental mortality shocks during this period, including the opioid epidemic [32] and rapid economic polarisation [1], are not captured by the satellite feature set. The stable R 2 through 2014–2019 also indicates that the addition of Sentinel-1/2 and Landsat 8/9 within this window does not introduce a discontinuity; the MODIS-anchored feature set sustains generalisation across the full study period.

3.4. Multimodal Fusion Benefit: Ablation Study

Figure 5 and Appendix Table A4 present the ablation study, in which the production model is compared against each single-modality baseline. Single-modality R 2 values range from 0.066 (JRC Global Surface Water) to 0.442 (MODIS LST). The 25-feature engineered set alone delivers R 2 = 0.581 ± 0.015 , MAE = 1.16  yr, exceeding every single raw modality, including the 135-feature USDA CDL block ( R 2 = 0.433 ). The full fusion (production) model achieves R 2 = 0.631 ± 0.013 , MAE = 1.08 ± 0.02  yr, RMSE = 1.48  yr, approaching the IHME census-based benchmark of R 2 0.82  [1]. The normalised fusion gain over the strongest single-modality model is:
Fusion Gain = R combined 2 R LST 2 1 R LST 2 = 0.631 0.442 1 0.442 32 % .
Table 2 summarises per-fold performance of the production model.

3.5. Global Feature Importance Hierarchy

Figure 6 presents the top-20 SHAP attributions of the production model, computed on a stratified random sample of n = 5 , 000 county-year observations drawn from across all 20 years of the study period (so the SHAP attributions are not specific to any single year). Nighttime thermal features dominate the hierarchy. NCE (mean | SHAP | = 0.309  yr) ranks first, followed by Nighttime LST 10th percentile ( 0.180  yr), Nighttime LST 25th percentile ( 0.151  yr), Nighttime LST Mean ( 0.091  yr), and Nighttime LST 75th percentile ( 0.074  yr); the top five positions are entirely nocturnal. Developed (Medium Intensity) % ( 0.074  yr) and TVI ( 0.052  yr) rank sixth and eighth respectively. Aggregated across all 14 MODIS LST features, the nighttime channels carry Σ | SHAP | = 0.860  yr versus 0.140  yr for all daytime channels, a 6.16 × ratio. SHAP rankings track the native Random Forest impurity importance closely (Spearman ρ = 0.89 ). To verify that the ranking is not an artefact of a single data partition, feature importance was recomputed within each of the five cross-validation folds: nighttime thermal features consistently retained the top predictive positions across all held-out sets.

3.6. The Nighttime Thermal Pattern

Figure 7 isolates the principal empirical finding by focusing on the MODIS LST features. Panel (A) ranks all 14 MODIS LST features by mean | SHAP | : nighttime variables occupy the top six positions (NCE: 0.309  yr; Night T 10th: 0.180  yr; Night T 25th: 0.151  yr; Night Mean: 0.091  yr; Night T 75th: 0.074  yr; Night T 90th: 0.052  yr), while all daytime variables rank below. Panel (B) shows the cumulative SHAP attribution of each channel as features are added in rank order, and confirms the channel-level totals: nighttime Σ | SHAP | = 0.860  yr versus daytime Σ | SHAP | = 0.140  yr, a 6.16 × ratio. Daytime LST is not absent from the model; rather, the nocturnal thermal baseline carries the larger share of the predictive signal at the county-aggregated scale used here.
The SHAP dependence profile for Nighttime LST 10th percentile (Figure 8, panel B) is associated with a protective response below approximately 7.5 °C; the model response transitions to a penalty above roughly 10 to 12 °C. By contrast, the daytime LST 10th percentile (panel F) carries mean | SHAP | = 0.022  yr, roughly 14 times smaller than NCE. We note a possible physiological reading, namely that nighttime conditions determine the window during which radiative cooling can support core temperature decline, a prerequisite for sleep-associated cardiovascular and metabolic recovery [33]. We offer this as context for interpretation, not as a causal claim, since our data are spatially aggregated and observational.

3.7. Threshold Structure of the Leading Predictors

Figure 8 contrasts raw bivariate associations with SHAP-attributed partial responses for three of the leading predictors. For Nighttime LST 10th percentile (panels A and B), the raw Pearson correlation is r = 0.500 ( n = 61 , 540 county-years), and the SHAP profile shows a protective response below approximately 7.5 °C (mean | SHAP | = 0.180  yr). For TVI (panels C and D), r = + 0.537 and the SHAP profile identifies an inflection near TVI 4.22 (mean | SHAP | = 0.052  yr). For Daytime LST Mean (panels E and F), r = 0.309 and the SHAP profile suggests a two-threshold pattern near 15.5 ° C and 27.83°C, with mean | SHAP | = 0.022  yr. The raw correlation ranking (Night T > TVI > Day LST) tracks the SHAP ranking, which supports the interpretation that the SHAP attributions reflect genuine signal in the model rather than artefacts of the attribution algorithm.

3.8. Agricultural Threshold Patterns

Figure 9 shows three nonlinear inflection points identified in the SHAP derivatives (panels D–F), all dynamically estimated. Pig density shows an inflection near 2,387 head/km2 that marks a transition from rapid to diminishing positive association, with saturation near 15,574 head/km2. The positive association for developed open space saturates near 8% county fraction, where moderate park and suburban open-space coverage no longer adds further benefit. Cattle density shows a non-linear inflection characterized by diminishing returns as densities increase, with saturation near 1775 head/km2. Negative LE associations begin to appear at moderate cattle densities, a pattern consistent with prior reports of diffuse nitrogen loading and groundwater contamination at densities below the conventional regulatory CAFO threshold [12]. Because the cattle inflection is satellite-derivable and updatable annually, it could serve as a remote-sensing screening criterion for ground-level review in data-scarce regions.

3.9. Forest Cover and Heat-Associated Attenuation

Figure 10 quantifies the cross-modal interaction between deciduous forest cover and thermal patterns. In high-forest counties (deciduous forest fraction above 20%), the same daytime thermal forcing is associated with little heat-driven LE penalty in the SHAP partial response, whereas in low-forest counties (below 5%) the penalty is present (panel A). Deciduous cover was used because its seasonal canopy dynamics and high evapotranspiration rates are particularly relevant to summer thermal extremes. The buffering pattern is temperature-dose-dependent (panel B): it is negligible in cool quartiles Q1 and Q2 and rises to Δ = 0.012  yr in the hottest quartile Q4. The relative attenuation reaches roughly 125% in Q4. We note that this percentage is computed against a small SHAP baseline; it is best read as a near-complete offset of the daytime heat-associated penalty in the hottest quartile, rather than as a macroscopic shift in LE.

4. Discussion

4.1. Principal Findings in Context

This study indicates that multimodal satellite remote sensing can predict county-level LE across CONUS with MAE = 1.08  yr ( R 2 = 0.631 ± 0.013 ), without any census or sociodemographic input. The county-grouped cross-validation design is conservative: standard random cross-validation would inflate R 2 by approximately 0.04 to 0.07 due to spatial autocorrelation [15], so the reported value is a lower bound on geographic generalisation. The approximately 32% fusion gain over the strongest single-modality baseline indicates real cross-modal complementarity. Even individually weak contributors (Sentinel-1 SAR: R 2 = 0.111 ; soil moisture: R 2 = 0.111 ; JRC Global Surface Water: R 2 = 0.066 ) add information that is not redundant with the leading modalities. Stability across 20 years ( R 2 trend p = 0.35 ) supports the interpretation that the patterns reflect persistent environmental signal. Moran’s I = 0.0988 ( p = 0.001 ) supports geographic generalisation against severe spatial leakage. The remaining gap relative to census-based models [1] is consistent with socioeconomic, behavioural, and healthcare-access factors that correlate with, but are not captured by, satellite measurements.

4.2. The Nighttime Thermal Pattern

The most consequential finding of this study is the 6.16 × ratio of nighttime to daytime LST in cumulative SHAP attribution ( 0.860  yr versus 0.140  yr). The SHAP dependence profile is associated with a protective response below approximately 7 . 5 C, and transitions to a penalty beyond roughly 10 to 12 °C. This window is consistent with the temperature range over which radiative cooling can support core temperature decline without activating evaporative compensation, the physiological setting for sleep-associated cardiovascular restoration [33].
While public-health attention has tended to focus on acute daytime heat, the orbital record points to the nocturnal baseline as the larger satellite-observable signal at the county-year scale. Three points are worth emphasising. First, existing literature focuses on daytime maxima and acute heat waves [10]; the equivalent daytime feature in our model carries mean | SHAP | = 0.022  yr, roughly 14 times smaller than NCE. Second, nighttime LST is the canonical signature of the urban heat island (UHI), which is most pronounced at night because built-up surfaces retain daytime heat while vegetated surfaces cool more rapidly [34]. Third, the pattern is continuous and dose-dependent in our SHAP profile, rather than tied to discrete heat-wave events.
For policy translation, the inflection in the SHAP response near a minimum overnight temperature of about 7.5 to 10 C provides a satellite-observable, annually updatable criterion that could help prioritise residential cooling assistance, nocturnal heat-alert systems, and urban-greening choices that allow radiative cooling at night. Because the analysis is observational and ecological, we frame this as a screening criterion rather than as a causal recommendation.

4.3. Agricultural Thresholds and Forest Attenuation

The nonlinear thresholds in the SHAP profiles add to a literature that often assumes linear dose-response in agricultural epidemiology. The saturation of the developed-open-space association near 8% county fraction indicates that moderate park and suburban green coverage is associated with measurable benefit that levels off above this fraction. The cattle density profile reveals a non-linear inflection characterized by diminishing returns as densities increase, suggesting that regulatory frameworks built around large-CAFO thresholds may underrepresent diffuse moderate-density burdens, consistent with reports of nitrogen loading at densities below the conventional regulatory threshold [12]. Because this inflection is satellite-derivable, it offers a data-driven entry point for tiered regulatory screening.
When interpreting these patterns, built-environment and agricultural metrics in a satellite-only model often serve as broad proxies for regional economic structure and rurality, rather than direct individual exposures. For example, the saturation of the developed-open-space association likely captures the “suburban health advantage” associated with proximity to healthcare infrastructure and physical activity amenities, without the penalties of high-intensity industrial urbanisation. Similarly, the differing SHAP profiles across livestock species may reflect regional variations in agricultural industrialisation and rural infrastructure rather than species-specific physiological effects.
The forest attenuation pattern (approximately 125% relative attenuation in the hottest quartile, with Δ = 0.012  yr in absolute SHAP terms) provides satellite-verifiable arithmetic for the role of forest cover in thermally stressed counties. As above, we read this as a co-benefit signal that can enter cost-benefit analyses for nature-based adaptation, not as evidence of a direct causal effect.

4.4. Global Extensibility and Call to Action for UN SDG 3.9

All 11 input data streams used here are globally available and continuously updated: MODIS has been operational since 2000, Sentinel-1 and Sentinel-2 since 2014, Landsat since the 1970s, the ESA CCI Soil Moisture record since 1978, and FAO livestock products for the major producing nations. The only country-specific requirement is a subnational LE dataset for calibration, which is available through IHME for many nations and through Demographic and Health Surveys for low- and middle-income countries. Because satellite data quality is independent of national vital-statistics infrastructure, the framework is in principle extensible to data-scarce regions where census-based surveillance is most limited. It can therefore support monitoring of UN SDG Target 3.9 in nations where civil registration and vital statistics coverage is incomplete. The deployment of this framework in such regions, paired with locally collected vital statistics where available, would provide a near-real-time complement to traditional surveillance that is unaffected by the institutional gaps that limit CRVS coverage.

4.5. Algorithmic Equity and the Spectral Shadow

Prediction errors concentrate in a small number of Indigenous reservation counties (for example, Oglala Lakota, SD; Todd, SD; Sioux, ND; Buffalo, SD; Roosevelt, MT), with residual magnitudes that substantially exceed the national mean. These residuals are not random noise. They are the imprint of structural health determinants that have no satellite signature: chronic underfunding of the Indian Health Service [35], multi-generational nutritional insecurity, historical trauma, and the long-running policy legacy of allotment, relocation, and underinvestment. We refer to this systematic pattern as the spectral shadow: a class of mortality drivers that operate at the community scale but cast no radiometric signature, and are therefore invisible to any orbital sensor.
The implication for deployment is direct. The framework presented here is not appropriate for use in isolation in structurally marginalised communities, where it would systematically over-predict LE and could thereby misdirect resources away from communities with the greatest unmet need. In any deployment, predictions for such communities must be paired with locally collected vital statistics, community-informed adjustment, and an explicit equity audit. Framed positively, the spectral shadow is itself a useful diagnostic: counties in which the model substantially over-predicts LE are precisely those in which social determinants vastly outweigh environmental ones, and the magnitude of the residual is a coarse measure of that gap.

4.6. Limitations

Several limitations qualify the findings. First, the analysis is ecological: county-level associations do not establish individual-level causation [36], and residual confounding by unmeasured socioeconomic variables is unavoidable. Second, satellite-derived features such as nighttime LST and developed-open-space fraction are not purely environmental; they covary with historical socioeconomic policies (such as redlining and infrastructural investment patterns), so the model partly reflects the physical signature of long-standing wealth and policy disparities rather than isolated climatic forcing. The model should therefore be read as identifying spatial associations within the joint environmental and socioeconomic landscape, not as isolating causal pathways.
Third, while county-grouped cross-validation strictly prevents spatial leakage by holding out entire counties, it does not isolate shared national temporal trends. Because the training set spans the full 20-year period, the model may partially absorb longitudinal macroeconomic or national health trends rather than purely local spatial variation. Fourth, gridded products such as ESA CCI soil moisture (0.25°) and FAO livestock (∼10 km) have spatial supports that can encompass multiple small counties. This inherent grid-sharing smooths intra-regional variance, which means these specific features capture broader regional gradients rather than strict county-level boundaries. Fifth, while high-resolution sensors (Sentinel-2, Landsat 8/9) are absent before 2014, within-fold imputation and the stable R 2 trend indicate that the continuous MODIS and ESA CCI records effectively anchor generalisation across the full study period. Future work will incorporate a broader suite of atmospheric and meteorological arrays, building on concurrent framework extensions from our group that evaluate U.S. life expectancy against extended atmospheric chemistry and aerosol suites [37].

5. Conclusions

Satellite remote sensing offers a near-real-time complement to traditional vital statistics for population health surveillance. After importance-based pruning, 193 features drawn from 11 data streams across 3,084 CONUS counties and 20 years yield R 2 = 0.631 ± 0.013 , MAE = 1.08 ± 0.02  yr, and RMSE = 1.48 ± 0.04  yr on geographically held-out counties, without any demographic input. The residual spatial autocorrelation is low (Moran’s I = 0.0988 , p = 0.001 ).
The leading finding is that the chronic nocturnal thermal environment, rather than acute daytime heat, carries the largest satellite-observable share of the variance associated with county longevity. Nighttime LST features carry roughly 6.16 × the cumulative SHAP attribution of daytime channels, with a model-response inflection near a minimum overnight temperature of about 7 . 5 C. The threshold is satellite-observable and annually updatable. Nonlinear agricultural inflection points (including a clear threshold profile for cattle density and saturation of developed open space near 8%) and a temperature-dose-dependent forest attenuation pattern (deciduous cover above 20% offsetting the heat-associated SHAP penalty in the hottest quartile) add cross-modal patterns that linear models tend to miss but that gridded products make directly accessible.
Because all 11 input streams are globally available and continuously updated, the framework requires only a subnational LE dataset for calibration. It is therefore extensible to data-scarce regions, and supports satellite-based monitoring of UN SDG Target 3.9 where civil registration coverage is limited.

Author Contributions

Conceptualisation, D.J.L. and F.A.; Methodology, D.J.L., F.A., S.R., S.S., J.W., and P.M.; Software, F.A., S.R., and S.S.; Formal analysis, F.A. and D.J.L.; Data curation, F.A., S.R., S.S., and D.J.L.; Writing, original draft, F.A.; Writing, review and editing, F.A., S.R., D.J.L., S.S., A.A., J.W., and P.M.; Visualisation, F.A. and A.A.; Supervision, D.J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Analysis code is publicly available at https://github.com/albertfaiz/Multimodal_geo_fusion_FM (accessed 9 February 2026). IHME life expectancy data are available at https://ghdx.healthdata.org/. Processed county-level feature matrices will be deposited on Zenodo upon acceptance: https://doi.org/10.5281/zenodo.19229752.

Acknowledgments

The authors thank the Office of Information Technology Cyberinfrastructure Research Computing group at the University of Texas at Dallas for High-Performance Computing resources used in this study: https://utdallas.edu/oit/departments/circ/ (accessed 3 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Declaration of Generative AI in Scientific Writing

During the preparation of this work, generative AI tools were utilized solely to assist with technical code optimization and the generation of design assets for the visual abstract and data refinery figures. All scientific content, text, and data interpretations were authored entirely by the researchers.

Abbreviations

BSI Bare Soil Index
CAFO Concentrated Animal Feeding Operation
CDL Cropland Data Layer
CCI Climate Change Initiative (ESA programme)
CONUS Contiguous United States
CRVS Civil Registration and Vital Statistics
DEM Digital Elevation Model
DHS Demographic and Health Surveys
ESA European Space Agency
EVI Enhanced Vegetation Index
FAO Food and Agriculture Organization
FIPS Federal Information Processing Standards
GEE Google Earth Engine
GLCM Grey Level Co-occurrence Matrix
GLW Gridded Livestock of the World
GRD Ground Range Detected (Sentinel-1 product)
GSW Global Surface Water (JRC)
HPC High-Performance Computing
IHME Institute for Health Metrics and Evaluation
IW Interferometric Wide (Sentinel-1 mode)
JRC Joint Research Centre (European Commission)
LE Life Expectancy
LST Land Surface Temperature
LOWESS Locally Weighted Scatterplot Smoothing
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MODIS Moderate Resolution Imaging Spectroradiometer
NASS National Agricultural Statistics Service (USDA)
NCE Nighttime Cooling Efficiency
NDMI Normalised Difference Moisture Index
NDVI Normalised Difference Vegetation Index
NDWI Normalised Difference Water Index
NLCD National Land Cover Database
RF Random Forest
RMSE Root Mean Squared Error
SAR Synthetic Aperture Radar
SAVI Soil-Adjusted Vegetation Index
SDG Sustainable Development Goal (United Nations)
SHAP SHapley Additive exPlanations
SM Soil Moisture
SR Surface Reflectance
TVI Thermal Vegetation Index
UHI Urban Heat Island
USDA United States Department of Agriculture
USGS United States Geological Survey

Appendix A. Extended Methodology and Data Tables

Appendix A.1. Run-to-Run Reproducibility and Preprocessing Sensitivity

To verify that the cross-validated metrics reported in Section 3 were not artefacts of a particular execution, three sensitivity checks were performed during the model-development phase. None of these results are reported as headline findings; they are recorded here as integrity checks supporting the reproducibility of the production pipeline.
Cross-node reproducibility. The full pipeline (data extraction, preprocessing, GroupKFold splitting, Random Forest fitting, and TreeSHAP attribution) was executed on two independent nodes of the University of Texas at Dallas Juno High-Performance Computing cluster [22]. The cross-validated R 2 , MAE, and RMSE values from the two runs were identical to three decimal places ( Δ R 2 < 0.001 , Δ MAE  < 0.005  yr), confirming determinism under the fixed random_state  = 42 setting.
Livestock interpolation sensitivity. Snapshot years of the FAO Gridded Livestock product (2010, 2015, 2020) were linearly interpolated to annual values in the production pipeline (Section 2.2.10). Substituting nearest-neighbour interpolation for linear interpolation produced a difference in cross-validated R 2 of Δ R 2 < 0.003 , indicating that the predictive signal is robust to the choice of inter-epoch interpolation scheme.
Preprocessing-leakage sensitivity. Global median imputation and winsorisation were applied prior to cross-validation in the production pipeline (Section 2.5). Repeating the analysis with strict per-fold preprocessing (intra-county backward-fill and forward-fill, plus training-fold-only imputation inside the GroupKFold loop) produced a difference in cross-validated R 2 of Δ R 2 = + 0.013 and a difference in MAE of Δ MAE  = 0.01  yr. Because GroupKFold places all 20 years of any county exclusively in either the training or the test partition, the global approach introduces negligible leakage in practice; the per-fold variant was nonetheless verified.

Appendix A.2. Livestock Feature Derivation

Table A1. Derivation of FAO gridded livestock features. Primary extensive metrics (raw cell counts and aggregate sums) were programmatically excluded to prevent area-bias prior to model fitting. Calculated head counts were retained in the feature space but were organically deprioritised by the Random Forest algorithm during training.
Table A1. Derivation of FAO gridded livestock features. Primary extensive metrics (raw cell counts and aggregate sums) were programmatically excluded to prevent area-bias prior to model fitting. Calculated head counts were retained in the feature space but were organically deprioritised by the Random Forest algorithm during training.
Variable Plain Explanation Technical Definition Unit
Mean Density Average number of animals per km2 in a county. mean_density c , s , y = 1 N c i = 1 N c d i , s , y where d i , s , y is density in pixel i, species s, year y, and N c is the number of pixels in county c. heads/km2
Weighted Mean Density Density adjusted for true pixel size, accounting for latitude projection distortions. wm c , s , y = i d i , s , y · A i i A i where A i is pixel area (km2). heads/km2
Standard Deviation Spread of livestock density across pixels within the county. σ c , s , y = 1 N c i ( d i , s , y wm c , s , y ) 2 . heads/km2
Standard Error Uncertainty of the mean density. S E c , s , y = σ c , s , y N c . heads/km2
Growth Rate (Density) Percentage increase or decrease in density across years. g w m ( t ) = wm c , s , t wm c , s , t 1 | wm c , s , t 1 | × 100 . %/year
Usable Area Percent Share of county land deemed suitable for livestock, excluding water, barren, and high-development pixels. % usable c , s = usable_area c , s total_area c × 100 . %
Total / Average Weighted Mean Aggregate livestock density summaries across all tracked species. s S wm c , s , y ,     1 | S | s S wm c , s , y . heads/km2
Head Count Estimated total absolute number of animals in the county. head_count c , s , y = wm c , s , y × usable_area c . heads
Sources: IHME life expectancy [1]; FAO GLW (2010 GLW3; 2015/2020 GLW4) [29]; US Census TIGER/Line; USGS MRLC NLCD 2019. Species set S includes cattle, pig, chicken, duck, sheep, goat, horse, and buffalo (subject to GLW regional coverage).

Appendix A.3. Engineered Cross-Modal Features

Table A2. Engineered cross-modal features (25 variables) used in the production model.
Table A2. Engineered cross-modal features (25 variables) used in the production model.
S.N. Variable Unit
1 Thermal Vegetation Index (TVI), night LST × (1 − NDVI) unitless
2 Diurnal Temperature Range, daytime minus nighttime LST mean K
3 Nighttime Cooling Efficiency (NCE), ( P 90 , day P 10 , night ) / ( P 90 , day + ε ) unitless
4 NDVI–LST Divergence, NDVI SD / daytime LST SD unitless
5 Agricultural Greenness Ratio, NDVI mean / (corn + soybean pct) unitless
6 Livestock Heat Exposure, cattle sum × night LST mean animals·K km−2
7 Impervious Surface Heat Index, (dev med + dev high pct) × night LST unitless
8 Forest Heat Buffer, deciduous forest pct × max(0, day LST − 20) unitless
9 Soil Moisture Deficit, max(0, 6.0 − soil mean) m3m−3
10 Soil Moisture Excess, max(0, soil mean − 8.5) m3m−3
11 Wetland Flood Risk Index, (woody + herbaceous wetland pct) × soil mean unitless
12 SAR–NDVI Structural Index, S1 VV mean / NDVI mean unitless
13 Water Permanence Index, permanent water / (permanent + seasonal water) unitless
14 Elevation Thermal Modifier, night LST − (DEM mean × 0.0065) K
15 Topographic Roughness × LST, DEM SD × daytime LST SD unitless
16 Livestock Species Diversity, Shannon entropy across 8 species Shannon
17 Crop Diversity Index, Shannon entropy across USDA crop classes Shannon
18 Seasonal NDVI Amplitude, NDVI P90 − NDVI P10 unitless
19 Cross-Sensor NDVI Consistency, Landsat NDVI mean − S2 NDVI mean unitless
20 Wet Bulb Temperature Proxy, day LST × (1 + NDMI mean) K
21 Blue–Green Proximity Index, forest + permanent water + dev open pct unitless
22 High-Intensity Urban Stressor, dev high pct / (dev low pct + 1) unitless
23 Industrial Agriculture Proxy, (corn + soybean pct) / (other crops + sweet corn + 1) unitless
24 Livestock Pollution Load, (pig + chicken + cattle sum) / usable area head km−2
25 Landscape Complexity, 1 / (USDA stdDev + ε ) unitless

Appendix A.4. Hyperparameter Optimisation, Alternative Learners, and Ablation Results

Hyperparameters for both the production Random Forest and the per-modality baselines were optimised by coarse grid search on the training folds only, using the GroupKFold partitioning described in Section 2.6. The final selected settings for the combined production model are n_estimators  = 2000 , max_depth  = 40 , min_samples_split  = 3 , min_samples_leaf  = 2 , max_samples  = 0.9 , max_features  = 0.3 , and random_state  = 42 . The full per-modality grid-search results are tabulated in Table A3.
An XGBoost learner [38] was also evaluated as an alternative; it produced a marginally higher test-set R 2 ( + 0.008 , within the fold-level standard deviation of ± 0.013 ). The Random Forest model was retained because (i) the XGBoost gain was within fold-level standard deviation; and (ii) Random Forest provides robust uncertainty calibration through bootstrap aggregation. Neither model exceeded R 2 = 0.64 , which indicates that the performance ceiling reflects the satellite feature set rather than the specific learner. The ablation comparison of single-modality baselines against the multimodal fusion model is given in Table A4.
Table A3. Per-modality Random Forest hyperparameters selected by grid search. random_state  = 42 throughout.
Table A3. Per-modality Random Forest hyperparameters selected by grid search. random_state  = 42 throughout.
Modality Features n_est. max_depth max_feat.
USDA CDL 135 3,000 10 0.8
Landsat 84 3,000 8 0.8
Sentinel-2 84 2,500 8 0.8
Sentinel-1 SAR 42 2,500 10 0.8
MODIS LST 14 2,500 10 0.8
MODIS NDVI/EVI 14 2,500 10 0.8
FAO Livestock 32 1,000 30 0.5
DEM 7 1,500 8 0.8
Soil Moisture 7 1,500 8 0.8
JRC Water 6 1,500 8 0.8
Combined (raw) 450 2,000 40 0.3
Combined (pruned, production) 193 2,000 40 0.3
Table A4. Ablation comparison of single-modality models against the multimodal fusion model. Values are means ± standard deviations from 5-fold county-grouped cross-validation. Modalities are ordered by R 2 .
Table A4. Ablation comparison of single-modality models against the multimodal fusion model. Values are means ± standard deviations from 5-fold county-grouped cross-validation. Modalities are ordered by R 2 .
Modality Features R 2  (mean ± SD) MAE (yr) RMSE (yr)
Combined (production) 193 0 . 631 ± 0 . 013 1 . 08 ± 0 . 02 1.48
Engineered 25 0.581 ± 0.015 1.16 ± 0.02 1.58
MODIS LST 14 0.442 ± 0.021 1.36 ± 0.04 1.83
USDA CDL 135 0.433 ± 0.011 1.38 ± 0.01 1.84
Livestock 42 0.396 ± 0.012 1.44 ± 0.03 1.90
NDVI/EVI 14 0.350 ± 0.029 1.49 ± 0.02 1.97
DEM 7 0.207 ± 0.014 1.68 ± 0.04 2.18
Landsat 84 0.173 ± 0.011 1.74 ± 0.03 2.22
Sentinel-1 42 0.111 ± 0.006 1.82 ± 0.03 2.30
Soil 7 0.111 ± 0.043 1.80 ± 0.06 2.30
Sentinel-2 84 0.079 ± 0.004 1.85 ± 0.04 2.35
JRC Water 6 0.066 ± 0.009 1.88 ± 0.04 2.36

References

  1. Dwyer-Lindgren, L.; Bertozzi-Villa, A.; Stubbs, R.W.; Morozoff, C.; Mackenbach, J.P.; van Lenthe, F.J.; Mokdad, A.H.; Murray, C.J.L. Inequalities in life expectancy among US counties, 1980 to 2014: temporal trends and key drivers. JAMA Intern. Med. 2017, 177, 1003–1011. [Google Scholar] [CrossRef] [PubMed]
  2. Murray, C.J.L.; Lopez, A.D. Measuring the global burden of disease. N. Engl. J. Med. 2013, 369, 448–457. [Google Scholar] [CrossRef]
  3. Watts, N.; Amann, M.; Arnell, N.; et al. The 2021 report of the Lancet Countdown on health and climate change: code red for a healthy future. The Lancet 2021, 398, 1619–1662. [Google Scholar] [CrossRef]
  4. Ebi, K.L.; Capon, A.; Berry, P.; et al. Hot weather and heat extremes: health risks. The Lancet 2021, 398, 698–708. [Google Scholar] [CrossRef]
  5. Foley, J.A.; Ramankutty, N.; Brauman, K.A.; et al. Solutions for a cultivated planet. Nature 2011, 478, 337–342. [Google Scholar] [CrossRef]
  6. Seto, K.C.; Güneralp, B.; Hutyra, L.R. Global forecasts of urban expansion to 2030 and direct impacts on biodiversity and carbon pools. Proc. Natl. Acad. Sci. 2012, 109, 16083–16088. [Google Scholar] [CrossRef]
  7. United Nations. Sustainable Development Goal 3: Ensure healthy lives and promote well-being for all at all ages. Target 3.9: substantially reduce the number of deaths and illnesses from hazardous chemicals and air, water and soil pollution and contamination by 2030 Indicator 3.9.1: Mortality rate attributed to household and ambient air pollution. Resolution 70/1 of the United Nations General Assembly, 2015. [Google Scholar]
  8. James, P.; Banay, R.F.; Hart, J.E.; Laden, F. A review of the health benefits of greenness. Curr. Epidemiol. Rep. 2015, 2, 131–142. [Google Scholar] [CrossRef] [PubMed]
  9. Gascon, M.; Triguero-Mas, M.; Martínez, D.; et al. Mental health benefits of long-term exposure to residential green and blue spaces: a systematic review. Int. J. Environ. Res. Public Health 2015, 12, 4354–4379. [Google Scholar] [CrossRef]
  10. Gasparrini, A.; Guo, Y.; Hashizume, M.; et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. The Lancet 2015, 386, 369–375. [Google Scholar] [CrossRef] [PubMed]
  11. Wing, S.; Horton, R.A.; Marshall, S.W.; et al. Air pollution and odor in communities near industrial swine operations. Environ. Health Perspect. 2008, 116, 1362–1368. [Google Scholar] [CrossRef]
  12. Mallin, M.A. Industrial animal production—a rapidly growing threat to health and the environment. Environ. Health Perspect. 2000, 108, A528–A531. [Google Scholar] [CrossRef]
  13. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef]
  14. Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [Google Scholar] [CrossRef]
  15. Roberts, D.R.; Bahn, V.; Ciuti, S.; et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017, 40, 913–929. [Google Scholar] [CrossRef]
  16. Lary, D.J.; Alavi, A.H.; Gandomi, A.H.; Walker, A.L. Machine learning in geosciences and remote sensing. Geosci. Front. 2016, 7, 3–10. [Google Scholar] [CrossRef]
  17. Shrestha, S.; Lary, D.J.; Ruwali, S.; Ahmad, F. The External Exposome and Life Expectancy: Formaldehyde as a Leading Predictor in U.S. Counties. Air 2026, 4, 10. [Google Scholar] [CrossRef]
  18. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
  19. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  20. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Proc. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, Vol. 30, 4765–4774. [Google Scholar]
  21. Lundberg, S.M.; Erion, G.; Chen, H.; et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef] [PubMed]
  22. Office of Information Technology–Cyberinfrastructure Research Computing. Juno High Performance Computing Cluster. The University of Texas at Dallas Supported in part by the TRECIS CC* Cyberteam under NSF grant No. 2019135. 2024. [Google Scholar]
  23. Wan, Z.; Hook, S.; Hulley, G. MOD11A2 MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1km SIN Grid V006; Technical report; NASA EOSDIS Land Processes DAAC, 2015. [Google Scholar] [CrossRef]
  24. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213. [Google Scholar] [CrossRef]
  25. Boryan, C.; Yang, Z.; Mueller, R.; Craig, M. Monitoring US agriculture: the US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 2011, 26, 341–358. [Google Scholar] [CrossRef]
  26. Pekel, J.F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422. [Google Scholar] [CrossRef]
  27. European Space Agency. Copernicus DEM: Copernicus Digital Elevation Model Product Handbook; Technical report; Airbus Defence and Space, 2021. [Google Scholar]
  28. Gruber, A.; Scanlon, T.; van der Schalie, R.; Wagner, W.; Dorigo, W. Evolution of the ESA CCI Soil Moisture climate data records and their underlying merging methodology. Earth Syst. Sci. Data 2019, 11, 717–739. [Google Scholar] [CrossRef]
  29. Gilbert, M.; Nicolas, G.; Cinardi, G.; et al. Global distribution data for cattle, buffaloes, horses, sheep, goats, pigs, chickens and ducks in 2010. Sci. Data 2018, 5, 180227. [Google Scholar] [CrossRef] [PubMed]
  30. Mann, H.B. Nonparametric tests against trend. Econometrica 1945, 13, 245–259. [Google Scholar] [CrossRef]
  31. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  32. Case, A.; Deaton, A. Deaths of Despair and the Future of Capitalism; Princeton University Press: Princeton, NJ, 2020. [Google Scholar]
  33. Obradovich, N.; Migliorini, R.; Mednick, S.C.; Fowler, J.H. Nighttime temperature and human sleep loss in a changing climate. Sci. Adv. 2017, 3, e1601555. [Google Scholar] [CrossRef]
  34. Heaviside, C.; Macintyre, H.; Vardoulakis, S. The urban heat island: implications for health in a changing environment. Curr. Environ. Health Rep. 2017, 4, 296–305. [Google Scholar] [CrossRef] [PubMed]
  35. Jones, D.S. The persistence of American Indian health disparities. Am. J. Public Health 2006, 96, 2122–2134. [Google Scholar] [CrossRef] [PubMed]
  36. Schwartz, S. The fallacy of the ecological fallacy: the potential misuse of a concept and the consequences. Am. J. Public Health 1994, 84, 819–824. [Google Scholar] [CrossRef] [PubMed]
  37. Ruwali, S.; Lary, D.J.; Shrestha, S.; Ahmad, F.; Aker, A.; Waczak, J. Multi-Modal Sensing for Estimating U.S. Life Expectancy: Identifying Environmental Determinants and Disparities (2003-2019). In SciProfiles; Preprint, 2026. [Google Scholar]
  38. Chen, T.; Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016; pp. 785–794. [Google Scholar] [CrossRef]
Figure 1. Schematic of the four-stage multimodal processing and machine learning workflow. Stage 1 performs distributed acquisition and cloud masking of Earth observation imagery on the Google Earth Engine platform. Stage 2 reduces pixel-level rasters to county-year summary statistics and constructs cross-modal engineered features. Stage 3 partitions the dataset using GroupKFold ( K = 5 ) keyed on Federal Information Processing Standards (FIPS) county codes, so that all years of any given county appear exclusively in either training or test sets. Stage 4 fits a Random Forest regressor and computes TreeSHAP attributions on the trained model.
Figure 1. Schematic of the four-stage multimodal processing and machine learning workflow. Stage 1 performs distributed acquisition and cloud masking of Earth observation imagery on the Google Earth Engine platform. Stage 2 reduces pixel-level rasters to county-year summary statistics and constructs cross-modal engineered features. Stage 3 partitions the dataset using GroupKFold ( K = 5 ) keyed on Federal Information Processing Standards (FIPS) county codes, so that all years of any given county appear exclusively in either training or test sets. Stage 4 fits a Random Forest regressor and computes TreeSHAP attributions on the trained model.
Preprints 214967 g001
Figure 2. Spatial prediction fidelity for year 2000. Panels compare observed and predicted spatial life expectancy patterns and associated residual structure.
Figure 2. Spatial prediction fidelity for year 2000. Panels compare observed and predicted spatial life expectancy patterns and associated residual structure.
Preprints 214967 g002
Figure 3. Spatial accuracy and bias direction. Top: county-level mean absolute error across the contiguous United States. Teal stars mark the five best-predicted counties; orange circles mark the five worst-predicted counties, which are concentrated in data-sparse Indigenous reservation counties. Bottom: county-level mean residual (observed minus predicted), showing low aggregate geographic bias and low spatial autocorrelation.
Figure 3. Spatial accuracy and bias direction. Top: county-level mean absolute error across the contiguous United States. Teal stars mark the five best-predicted counties; orange circles mark the five worst-predicted counties, which are concentrated in data-sparse Indigenous reservation counties. Bottom: county-level mean residual (observed minus predicted), showing low aggregate geographic bias and low spatial autocorrelation.
Preprints 214967 g003
Figure 4. Year-stratified performance metrics over the period 2000–2019. Trajectories of the coefficient of determination R 2 (blue), the mean absolute error MAE (orange), and the root mean squared error RMSE (purple) are shown on dual axes. Annotated linear trend slopes ( β ) are reported with their associated p-values.
Figure 4. Year-stratified performance metrics over the period 2000–2019. Trajectories of the coefficient of determination R 2 (blue), the mean absolute error MAE (orange), and the root mean squared error RMSE (purple) are shown on dual axes. Annotated linear trend slopes ( β ) are reported with their associated p-values.
Preprints 214967 g004
Figure 5. Ablation study comparing single-modality and multimodal models. Panel (A) shows the coefficient of determination R 2 for each model, with the IHME census reference shown as a dashed line. Panel (B) shows the corresponding mean absolute error. Error bars indicate the standard deviation across five GroupKFold folds.
Figure 5. Ablation study comparing single-modality and multimodal models. Panel (A) shows the coefficient of determination R 2 for each model, with the IHME census reference shown as a dashed line. Panel (B) shows the corresponding mean absolute error. Error bars indicate the standard deviation across five GroupKFold folds.
Preprints 214967 g005
Figure 6. Global feature importance hierarchy. The SHAP beeswarm plot displays the top 20 environmental predictors based on n = 5 , 000 stratified county-year observations sampled across all 20 years. Features are ranked by mean | SHAP | . Each point represents one county-year observation. The horizontal position of each point encodes the SHAP value, that is, the contribution of that feature, in years of LE, for that county-year; the colour of each point encodes the underlying feature value, scaled from low (blue) to high (red). The central black vertical line marks zero SHAP. Points to the right of the line indicate that the feature increased the model’s predicted LE for that county-year, and points to the left indicate that it decreased the predicted LE. Reading the colour and side together gives the directional effect of each feature: blue points to the right mean that low feature values increased predicted LE; blue points to the left mean that low feature values decreased predicted LE; red points to the right mean that high feature values increased predicted LE; and red points to the left mean that high feature values decreased predicted LE.
Figure 6. Global feature importance hierarchy. The SHAP beeswarm plot displays the top 20 environmental predictors based on n = 5 , 000 stratified county-year observations sampled across all 20 years. Features are ranked by mean | SHAP | . Each point represents one county-year observation. The horizontal position of each point encodes the SHAP value, that is, the contribution of that feature, in years of LE, for that county-year; the colour of each point encodes the underlying feature value, scaled from low (blue) to high (red). The central black vertical line marks zero SHAP. Points to the right of the line indicate that the feature increased the model’s predicted LE for that county-year, and points to the left indicate that it decreased the predicted LE. Reading the colour and side together gives the directional effect of each feature: blue points to the right mean that low feature values increased predicted LE; blue points to the left mean that low feature values decreased predicted LE; red points to the right mean that high feature values increased predicted LE; and red points to the left mean that high feature values decreased predicted LE.
Preprints 214967 g006
Figure 7. The nocturnal thermal pattern. Panel (A) ranks all 14 MODIS land surface temperature features by mean | SHAP | , showing complete stratification between nighttime (dark navy) and daytime (orange) variables. Panel (B) shows the cumulative SHAP attribution as features are added in rank order, separated by channel.
Figure 7. The nocturnal thermal pattern. Panel (A) ranks all 14 MODIS land surface temperature features by mean | SHAP | , showing complete stratification between nighttime (dark navy) and daytime (orange) variables. Panel (B) shows the cumulative SHAP attribution as features are added in rank order, separated by channel.
Preprints 214967 g007
Figure 8. Threshold structure of the leading predictors. The left column (panels A, C, E) shows raw bivariate density between each feature and observed LE. The right column (panels B, D, F) shows SHAP-attributed partial dependence within the fitted model. Locally Weighted Scatterplot Smoothing (LOWESS) trends overlay the SHAP scatter to isolate the model’s threshold structure.
Figure 8. Threshold structure of the leading predictors. The left column (panels A, C, E) shows raw bivariate density between each feature and observed LE. The right column (panels B, D, F) shows SHAP-attributed partial dependence within the fitted model. Locally Weighted Scatterplot Smoothing (LOWESS) trends overlay the SHAP scatter to isolate the model’s threshold structure.
Preprints 214967 g008
Figure 9. Nonlinear agricultural and development thresholds. The top row (panels A, B, C) shows SHAP dependence as a function of each feature value. The bottom row (panels D, E, F) shows the corresponding first derivatives, used to locate inflection points. The curves show transitions from initial positive association to saturation, and to a negative association in the case of cattle density.
Figure 9. Nonlinear agricultural and development thresholds. The top row (panels A, B, C) shows SHAP dependence as a function of each feature value. The bottom row (panels D, E, F) shows the corresponding first derivatives, used to locate inflection points. The curves show transitions from initial positive association to saturation, and to a negative association in the case of cattle density.
Preprints 214967 g009
Figure 10. Deciduous canopies and heat-associated attenuation. Panel (A) shows the SHAP response to daytime LST, stratified by local deciduous forest cover density. Panel (B) shows the absolute attenuation gap, by temperature quartile, between the lowest-forest and highest-forest county strata.
Figure 10. Deciduous canopies and heat-associated attenuation. Panel (A) shows the SHAP response to daytime LST, stratified by local deciduous forest cover density. Panel (B) shows the absolute attenuation gap, by temperature quartile, between the lowest-forest and highest-forest county strata.
Preprints 214967 g010
Table 1. Multimodal data sources, resolutions, and raw feature counts. Satellite radiometry was processed via the Google Earth Engine platform; gridded livestock and mortality records were processed via local pipelines. Per county-year, summary statistics (mean, SD, P10, P25, P50, P75, P90) were extracted for continuous rasters. Abbreviations: SR, surface reflectance; GLCM, grey-level co-occurrence matrix; SM, soil moisture; SAR, synthetic aperture radar; SDG, Sustainable Development Goal. The 450 raw features were reduced to 193 for the production model after importance-based feature pruning (Section 2.5).
Table 1. Multimodal data sources, resolutions, and raw feature counts. Satellite radiometry was processed via the Google Earth Engine platform; gridded livestock and mortality records were processed via local pipelines. Per county-year, summary statistics (mean, SD, P10, P25, P50, P75, P90) were extracted for continuous rasters. Abbreviations: SR, surface reflectance; GLCM, grey-level co-occurrence matrix; SM, soil moisture; SAR, synthetic aperture radar; SDG, Sustainable Development Goal. The 450 raw features were reduced to 193 for the production model after importance-based feature pruning (Section 2.5).
Source Product Spatial Temporal Features
MODIS Terra MOD11A2 LST day and night 1 km 8-day 14
MODIS Terra MOD13A1 NDVI and EVI 500 m 16-day 14
Landsat 8/9 Collection 2 Tier 1 SR (6 bands + 6 indices) 30 m 16-day 84
Sentinel-2 Level-2A SR (6 bands + 6 indices) 10–20 m 5-day 84
Sentinel-1 GRD VV, VH + 4 GLCM textures 10 m 6-day 42
USDA NASS Cropland Data Layer (>130 classes) 30 m Annual 135
JRC Global Surface Water v1.4 30 m Annual 6
Copernicus GLO-30 DEM 30 m Static 7
ESA CCI Combined SM (m3/m3 × 100) 0.25 Daily 7
FAO GLW3/GLW4 livestock (8 species) ∼10 km 3 epochs 32
Engineered Cross-modal composites 25
Total raw features 450
After importance-based pruning (production model) 193
Table 2. Fold-level cross-validation performance for the 193-feature production model under 5-fold county-grouped cross-validation. All 20 years of each county appear exclusively in either the training set or the test set. Two independent High-Performance Computing runs produced fold-level R 2 values within Δ R 2 < 0.001 .
Table 2. Fold-level cross-validation performance for the 193-feature production model under 5-fold county-grouped cross-validation. All 20 years of each county appear exclusively in either the training set or the test set. Two independent High-Performance Computing runs produced fold-level R 2 values within Δ R 2 < 0.001 .
Fold R 2 MAE (yr) RMSE (yr) MAPE (%)
1 0.624 1.081 1.514 1.41
2 0.618 1.101 1.500 1.43
3 0.632 1.070 1.464 1.39
4 0.654 1.045 1.397 1.36
5 0.626 1.091 1.550 1.42
Mean ± SD 0 . 631 ± 0 . 013 1 . 08 ± 0 . 02 1 . 48 ± 0 . 04 1 . 40 ± 0 . 03
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated