A Re-evaluation of Climate Sensitivity

Equilibrium climate sensitivity (ECS) is the change in global mean temperature expected to result from doubling atmospheric CO2 concentration from pre-industrial levels. Extensive research during the past 40 years has not reduced the uncertainty associated with ECS. Sherwood et al. [1] applied Bayesian statistics to evidence from climate-process physics, historical observations and paleoclimate proxies to reduce the range of ECS from 1.5-4.5 K to 2.6-4.1 K. This paper examines their methods and many of the assumptions they made. It also evaluates two additional periods in the Holocene to show that factors other than CO2 drove recent climate change. It identifies potential systematic errors resulting from adding non-equilibrium short-term adjustments to the radiative forcing of greenhouse gases and from underestimating the effects of solar irradiance, ocean currents and aerosols. These factors have resulted in estimates of the forcing by CO2 that far exceed its apparent effects in paleoclimate data.


Introduction
Greenhouse gases (GHGs) such as water vapour and CO2 warm the Earth by absorbing infrared radiation that would otherwise be lost to space. There are concerns that the increasing concentration of CO2 may cause unacceptably high global temperatures. The equilibrium climate sensitivity (ECS) provides a benchmark prediction of the change in near-surface air temperature when equilibrium is reached after an abrupt doubling of the CO2 concentration from the pre-industrial level. ECS can be estimated either by comparing the measured change in global mean temperature with the associated change in CO2 levels (if appropriate corrections are made for other factors), or by estimating the change in temperature from physics-based climate models. Davis [2] examined 68 climate transitions during the past 425 Ma and found no significant relationship between global temperature and CO2 in 75% of the transitions, and a positive relationship in only 10%. Climate models, by contrast, have produced a wide range of positive values.
In 1979 the National Research Council [3] attempted to standardize how ECS should be determined from climate models; to simplify the calculations, it assumed that ice sheets and vegetation did not change. It estimated that ECS would be within the range 1.5-4.5 K. Significant improvements to climate models since then have been accompanied by further standardization through the Coupled Model Intercomparison Project (CMIP). The Intergovernmental Panel on Climate Change (IPCC) summarized estimates of ECS made up to 2012 in its Fifth Assessment Report (AR5) [4]. These estimates were based on CMIP5 and earlier climate models, and on other calculations using actual historic measurements and prehistoric proxy data.
AR5 gave a detailed description of the difficulties experienced when attempting to estimate ECS using different lines of evidence [4]. It concluded that ECS is extremely unlikely to be less than 1 K and, with medium confidence, that ECS is likely to be between 1.5 K and 4.5 K and very unlikely to be greater than 6 K. Research since then has not narrowed these limits. One reason is that studies based on actual measurements and palaeoclimate proxies suggest ECS is in the lower part of the 'likely' range, while the climate models suggest it is in the upper part of that range [5]. This divergence seems to be increasing, as the newer CMIP6 climate models are producing values of ECS generally almost 1 K higher than CMIP5 model estimates [6].
The World Climate Research Programme commissioned Sherwood et al. [1] to examine ways to reduce this uncertainty before publication of the IPCC Sixth Assessment Report in 2022. Sherwood et al. [1 page 4] adopted effective climate sensitivity, S, as their reference measure. In contrast to ECS, which may take millennia to reach equilibrium in climate models, S is determined by extrapolation from a 150-year period during which equilibrium is unlikely to be achieved. Because changes due to slower temperature responses will not have been completed, S should be smaller than ECS. This is discussed in section 2. They calculated sensitivity from three independent lines of evidence: theoretical knowledge of the climate process; historic data; and paleoclimate data. After converting these estimates into probability density functions (PDFs), they used Bayesian statistics to combine them. Their review is likely to have a major influence on future climate change policies.
This paper examines whether the range of ECS proposed by Sherwood et al. [1] is more realistic than previous estimates, which appear to contradict much of the empirical paleoclimate data. First, it summarises how they calculated S from three separate lines of evidence and combined these estimates using Bayesian statistics. It provides an alternative calculation of their historic line of evidence to demonstrate how dependent such calculations are on the values selected for even relatively minor forcing agents. It also uses their method to calculate ECS for two additional intervals during the Holocene to show how dependent the results are on forcing agents other than GHGs and on the changing energy imbalance during these intervals. The paper then discusses possible issues relating to the addition of 'rapid adjustments' to the forcing from CO2 (section 4.1), the data selected by Sherwood et al. [1] for their three lines of evidence (section 4.2), various aspects of uncertainty, including their choice to evaluate S rather than ECS (section 4.3), and the assumptions related to use of the energy balance equation (sections 4.4 and 4.5).

Materials and Methods
This paper is based on the same sources of data and mathematical equations as those described by Sherwood et al. [1], unless a different source is specified. Sherwood et al. [1] acknowledged that expert knowledge was used to select data for their study and to convert the data into PDFs. Reasons for selecting different values for some variables are considered in the Discussion. The remainder of this section describes their methods.
Although they were commissioned to refine the range of ECS that was published in AR5, Sherwood et al. [1] adopted 'effective climate sensitivity', S, as their reference measure. They define S as half the surface temperature change required to restore the energy balance at the top of the atmosphere (TOA) after an abrupt quadrupling of CO2. S is determined by extrapolating the regression between temperature change and energy imbalance, established during the first 150 years of climate model runs, to the point where the energy imbalance reaches zero. The reasons they gave for choosing to estimate S were that its 150-year timeframe was relevant to predictions of warming this century; that it was easier to calculate and therefore readily available from many climate models; and that there was little difference between results for S and ECS. The two measures would be the same only if forcing remained constant during the whole period between the abrupt increase in CO2 content and the system reaching equilibrium, and if the forcing effect of CO2 were exactly logarithmic. Neither condition is met, and the short timeframe is likely to exclude some effects of slow feedback responses, in addition to changes in forcing due to changing ice sheets and vegetation that were excluded from ECS. Sherwood et al. [1] proposed a 6% correction to convert S to ECS, based on a study by Rugenstein et al. [7] of 27 simulations involving abrupt changes of 2x to 16x CO2. This correction was significantly less than the 17% difference actually found between the median long-term equilibrium temperature change (approximating ECS) and half the change obtained from the extrapolated 150-year regressions for 4x CO2 simulations (defined as S) from 15 climate models [7]. The lower value was chosen because the 4x CO2 long-term equilibrium estimates were 3 to 18% larger than twice the equilibrium responses to 2x CO2 (estimates of true ECS) from five models (and 94% larger in another) [7].
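The extrapolation that defines S can be illustrated with a short Python sketch. It fits a straight line to the TOA imbalance N against warming T from a synthetic abrupt-4xCO2 run, extrapolates to N = 0 and halves the result. The forcing (8.0 W m⁻²), feedback (-1.3 W m⁻² K⁻¹), warming curve and noise level are all assumed for illustration and are not taken from Rugenstein et al. [7] or any climate model; the function name `effective_sensitivity` is hypothetical.

```python
import numpy as np


def effective_sensitivity(temps, imbalances):
    """Estimate S from an abrupt-4xCO2 run (hypothetical helper).

    Fits N = a + b*T by least squares, extrapolates to N = 0 and
    halves the resulting 4xCO2 warming.
    """
    slope, intercept = np.polyfit(temps, imbalances, 1)
    warming_4x = -intercept / slope  # T at which the fitted N reaches zero
    return warming_4x / 2.0


# Synthetic 150-year run with constant forcing and constant feedback (assumed values)
F_4X = 8.0   # W m^-2, assumed 4xCO2 forcing
LAM = -1.3   # W m^-2 K^-1, assumed feedback parameter
years = np.arange(1, 151)
t_equilibrium = -F_4X / LAM
# Assumed two-timescale warming curve: a fast surface response and a slow deep-ocean response
temps = t_equilibrium * (1 - 0.6 * np.exp(-years / 4.0) - 0.4 * np.exp(-years / 300.0))
rng = np.random.default_rng(0)
imbalances = F_4X + LAM * temps + rng.normal(0.0, 0.3, years.size)  # N = F + lambda*T + noise

s_estimate = effective_sensitivity(temps, imbalances)
print(f"S = {s_estimate:.2f} K (true value {t_equilibrium / 2:.2f} K)")
```

Because F and λ are held constant in this sketch, the regression recovers half the equilibrium warming almost exactly, which is the condition under which S and ECS coincide. Making the feedback vary with temperature or time would open a gap between the 150-year extrapolation and the true equilibrium, which is what the 6% and 17% figures above quantify.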
Sherwood et al. [1] assumed an energy balance at the TOA between the radiative response, ΔR, of a warmer climate system (characterized by the forced change in surface temperature, ΔT) and the change in radiative forcing, ΔF, that caused the warming, so that the change in TOA energy imbalance is ΔN = ΔF + ΔR. They also assumed that ΔR was a linear function of ΔT, expressed as λΔT, where λ was the climate feedback parameter. This is the inverse of the climate sensitivity parameter, which is also represented by λ in AR5 [4]. Sherwood et al. [1] included additional terms to account for the dependence of feedback on global temperature (a multiplicative correction factor) and on regional patterns of warming (an additive pattern term in λ). Table 1 lists the main equations they used to evaluate S. All are variants of the energy balance equation, so there is an unstated reliance on equilibrium conditions, which conflicts with their stated aim of calculating S. The supporting equations were used to create the likelihood distributions. Sherwood et al. [1] justified Bayesian calculations because S depends on several variables and is expected to be a fixed quantity whose magnitude is uncertain. Traditional frequentist statistics cannot determine this uncertainty (but could provide a 'best' estimate with a realistic confidence interval for the error term). The Bayesian approach evaluates a posterior PDF for a parameter that is associated with measurable random variables by multiplying the PDF of the data (the likelihood function) for a given parameter value by a selected prior distribution for the parameter. This prior may either be noninformative (commonly a uniform prior) or a subjective representation of current knowledge of the parameter [8]. Sherwood et al. [1] chose uniform priors. This process depends on the prior and the measurements being independent [9,10].
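As a minimal sketch, the two simplest forms of the energy balance described above can be written as Python functions, assuming ΔN = ΔF + λΔT and the ∆F2xCO2 of 4.0 W m⁻² used by Sherwood et al. [1]. The function names are hypothetical, and the pattern and state-dependence terms are deliberately omitted, so these are not the full equations of their Table 1.

```python
DF_2X = 4.0  # W m^-2, effective forcing for doubled CO2 (value used in [1])


def s_process(lam, df_2x=DF_2X):
    """Process evidence: at equilibrium 0 = dF_2x + lam*S, so S = -dF_2x / lam (lam < 0)."""
    return -df_2x / lam


def s_energy_budget(d_t, d_f, d_n, df_2x=DF_2X):
    """Observed change: dN = dF + lam*dT gives lam = (dN - dF)/dT,
    hence S = -dF_2x/lam = dF_2x * dT / (dF - dN)."""
    return df_2x * d_t / (d_f - d_n)


s_proc_central = s_process(-1.30)                # mean feedback from the process evidence
s_hist_no_dn = s_energy_budget(1.03, 1.83, 0.0)  # historic values with dN set to zero
print(f"{s_proc_central:.2f} K, {s_hist_no_dn:.2f} K")  # prints: 3.08 K, 2.25 K
```

The second call reproduces the 2.25 K that the historic line of evidence gives when the change in energy imbalance is set to zero. Because ΔF - ΔN sits in the denominator, modest changes to either term move S disproportionately.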

Three lines of evidence used by Sherwood et al. [1]
The first line of evidence that Sherwood et al. [1] used to estimate S was current knowledge of the climate process. To obtain an estimate of the effective climate sensitivity from the process data (Sproc), they required values for the 'effective' radiative forcing caused by doubling the CO2 concentration in the atmosphere, ∆F2xCO2, and for the total climate feedback, λ. They determined that ∆F2xCO2 was 4.0 ± 0.3 W m⁻² by adding 'rapid' adjustments for changes in the stratosphere (0.9 W m⁻²), troposphere (0.0 W m⁻²) and surface albedo (0.2 W m⁻²) to the instantaneous radiative forcing (2.9 W m⁻²), which can be calculated accurately from line-by-line radiative transfer models [1]. Although the tropospheric adjustment would have subtracted 0.7 W m⁻², this was completely offset by additional forcing attributed to water vapour and clouds [1 Figure 3].
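The arithmetic behind ∆F2xCO2 can be checked in a few lines of Python. The component values are those quoted above; entering the tropospheric term as the net of the -0.7 W m⁻² adjustment and the +0.7 W m⁻² attributed to water vapour and clouds is an interpretation of the offset described in the text.

```python
IRF_2X = 2.9  # W m^-2, instantaneous radiative forcing for doubled CO2

# Rapid adjustments quoted by Sherwood et al. [1], W m^-2
adjustments = {
    "stratosphere": 0.9,
    "troposphere (-0.7, offset by water vapour and clouds +0.7)": -0.7 + 0.7,
    "surface albedo": 0.2,
}

erf_2x = IRF_2X + sum(adjustments.values())
relative_increase = (erf_2x - IRF_2X) / IRF_2X
print(f"ERF = {erf_2x:.1f} W m^-2, {relative_increase:.0%} above the IRF")
# prints: ERF = 4.0 W m^-2, 38% above the IRF
```

The 38% increase computed here is the same figure discussed in section 4.1.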
The feedback parameter, λ, was calculated as the linear sum of six components: the Planck effect, tropospheric water vapour and lapse rate, various cloud effects, surface albedo, stratospheric water vapour, and a term covering all other atmospheric variations, including changes in ozone, dust, salt and smoke concentrations. Sherwood et al. [1] discussed each component in detail, drawing attention to the complexity and uncertainties associated with cloud feedback. However, all the components have significant uncertainties, and most of the PDFs for the feedback terms [1 Table 1] depended to some extent on global climate models (GCMs). The sum of the values they chose for these terms resulted in a normally distributed PDF for λ with a mean and standard deviation of -1.30 ± 0.44 W m⁻² K⁻¹. Their Bayesian analysis, using the energy balance equation and a uniform prior for each element of λ, placed Sproc in the range 2.3-4.6 K, with a maximum likelihood value of 2.6 K and a median value of 3.1 K.
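The following Python sketch shows how a grid-based Bayesian calculation turns the process-based feedback estimate into a posterior for S. It assumes a Gaussian likelihood for λ with the mean and standard deviation quoted above, a uniform prior on λ, ∆F2xCO2 fixed at 4.0 W m⁻² (its uncertainty is ignored) and an arbitrary 0.01-20 K grid. It is a simplified illustration, not the procedure of Sherwood et al. [1].

```python
import numpy as np

DF_2X = 4.0                     # W m^-2, held fixed here (assumption)
LAM_MEAN, LAM_SD = -1.30, 0.44  # W m^-2 K^-1, process-based feedback estimate

s_grid = np.linspace(0.01, 20.0, 200_000)  # candidate S values (K); bounds are arbitrary
ds = s_grid[1] - s_grid[0]

# Likelihood of the process evidence for each candidate S, using lam = -DF_2X / S
lam_for_s = -DF_2X / s_grid
likelihood = np.exp(-0.5 * ((lam_for_s - LAM_MEAN) / LAM_SD) ** 2)

# A uniform prior on lambda, written as a density in S, is proportional to |d lam/dS| = DF_2X / S^2
prior = DF_2X / s_grid**2
posterior = likelihood * prior
posterior /= posterior.sum() * ds  # normalise so the density integrates to 1

mode_s = s_grid[np.argmax(posterior)]
cdf = np.cumsum(posterior) * ds
median_s = s_grid[np.searchsorted(cdf, 0.5)]
print(f"mode = {mode_s:.2f} K, median = {median_s:.2f} K")
```

Under these assumptions the mode is about 2.6 K and the median about 3.1 K, close to the values quoted above, and the posterior is skewed towards high S because S is inversely proportional to λ.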
The second line of evidence estimated Shist by comparing observations from a historical base period (1861-1889) with the present (2006-2018). Sherwood et al. [1] chose this base period because it had low CO2 levels, little volcanism and enough measurements to generate a reliable estimate of global mean temperature. They adopted the change in global temperature between these periods (ΔT = 1.03 ± 0.14 K) proposed by Cowtan & Way [11], based on measurements of surface air temperatures (SAT) and of sea surface temperatures (SST) with adjustments derived from GCMs. This estimate was chosen because it was corrected for missing data. However, Sherwood et al. [1] adopted a larger uncertainty than the original [11] because a wide range of estimates has been published. The change in energy imbalance (ΔN = 0.6 ± 0.3 W m⁻²) was taken from model estimates of ocean heating. They assumed it was independent of ΔF in their statistical calculations, but that is highly unlikely because the ocean heating is caused by the increased forcing.
Table 2 sources (partial): (e) Etminan et al. [14] using GHG values from (b) and (f); (f) Dlugokencky [15]; (g) Vieira et al. [16]; (h) Kobashi et al. [17].
A median value of F (1.83 Wm -2 ) was calculated by Sherwood et al. using Monte Carlo sampling from the effective radiative forcing (ERF) of GHGs (i.e. the instantaneous forcing plus 'rapid' adjustments), aerosols, ozone, land use albedo, volcanic activity, stratospheric water vapor, contrails, black carbon and solar irradiance [1 Table 4]. They indicated that these selected values produced a maximum likelihood value of approximately 2.25 K and a median value of 3.11 K [1 Figure 11(b) and Table 5]. This maximum likelihood value is inconsistent with the value of Shist shown in Table 2 as 3.59 K that was obtained by substituting their values into the energy balance equation shown in column 2 of Table  1 for the historic line of evidence. An estimate of 2.25 K would result if the change in energy imbalance was set at zero. Their initial Bayesian estimate of Shist produced a maximum likelihood value of 2.5 K which they combined with a uniform prior on S to produce a distribution with a median value of 4.3 K for Shist. In doing this they followed the common practice of choosing truncated uniform priors for their Bayesian analysis.
Annan and Hargreaves [9] indicated that this practice has no scientific basis and may be inappropriate, especially when the distributions are asymmetric or unbounded. They noted that this problem is commonly addressed by imposing an arbitrary upper bound, which directly affects the median and other percentiles of the posterior distribution. They provided an example in which assuming a uniform prior on S produced a posterior maximum likelihood value of S approximately 47% larger than the value produced by assuming a uniform prior on λ. They showed that a more complex prior could reduce the difference and produce estimates closer to the non-Bayesian calculation of S.
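The effect described by Annan and Hargreaves [9] can be illustrated with a Python sketch that uses a Gaussian likelihood for λ (mean -1.30 and standard deviation 0.44 W m⁻² K⁻¹, the process values given earlier in this paper) and ∆F2xCO2 = 4.0 W m⁻². It compares a uniform prior on S with a uniform prior on λ under two arbitrary upper bounds on S. It is not a reconstruction of their worked example, and the function name is hypothetical.

```python
import numpy as np

DF_2X = 4.0                     # W m^-2
LAM_MEAN, LAM_SD = -1.30, 0.44  # W m^-2 K^-1, Gaussian likelihood for lambda


def mode_and_median(upper_bound, prior_on):
    """Posterior mode and median of S (K) on a grid truncated at upper_bound.

    prior_on="S": uniform prior on S; prior_on="lambda": uniform prior on lambda,
    which in terms of S is proportional to |d lambda / dS| = DF_2X / S^2.
    """
    s = np.linspace(0.5, upper_bound, 400_000)
    ds = s[1] - s[0]
    likelihood = np.exp(-0.5 * ((-DF_2X / s - LAM_MEAN) / LAM_SD) ** 2)
    prior = np.ones_like(s) if prior_on == "S" else DF_2X / s**2
    post = likelihood * prior
    post /= post.sum() * ds
    cdf = np.cumsum(post) * ds
    return s[np.argmax(post)], s[np.searchsorted(cdf, 0.5)]


results = {
    (prior, bound): mode_and_median(bound, prior)
    for prior in ("S", "lambda")
    for bound in (10.0, 20.0)
}
for (prior, bound), (mode, median) in results.items():
    print(f"uniform on {prior:6s}, S <= {bound:4.1f} K: mode {mode:.2f} K, median {median:.2f} K")
```

With these inputs the uniform prior on S puts the mode about 19% higher than the uniform prior on λ. Its median also moves by several tenths of a kelvin when the arbitrary bound changes from 10 K to 20 K, whereas the median under the uniform prior on λ barely changes, because the likelihood does not vanish as λ approaches zero.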
To account for anomalies in the historic temperature data, Sherwood et al. [1] incorporated a correction to the feedback term for the pattern effect, which they attributed to ocean warming and cloud cover that are not in equilibrium with global warming. Their preferred value was 0.5 W m⁻² K⁻¹, near the extreme upper end of a wide range of estimates (and in addition to the change in the TOA energy imbalance from 0.2 W m⁻² to 0.8 W m⁻² over the selected time interval). The resulting maximum likelihood value of Shist then increased to 3.8 K, and the 5-95% range based on a truncated uniform prior on λ became 2.8-18.6 K, with a median value of 8.5 K.
The third line of evidence came from paleoclimate records that have been developed for the Last Glacial Maximum (LGM; 19 to 23 ka) and the mid-Pliocene Warm Period (mPWP; 3.3 to 3.0 Ma). These periods were selected because they were respectively several degrees cooler and warmer than the present, and their temperatures were relatively stable over periods measured in millennia. Sherwood et al. [1] also estimated S from the Paleocene-Eocene Thermal Maximum (PETM, about 56 Ma) but did not use the resulting lower value (2 K) in their determination of S.
To estimate SLGM, Sherwood et al. [1] calculated back in time from their pre-industrial base period to the LGM. The direction of each change has been reversed in this paper to show the actual temporal changes from the LGM to the base period. This gives a temperature increase, ΔT, of 5 ± 2 K and a forcing increase, ΔF, of 8.43 ± 2 W m⁻². ΔF was slightly larger than the sum of the ERFs for CO2 (which increased from 190 to 284 ppm), CH4 (375 to 808 ppb) and N2O (200 to 273 ppb) and the forcing related to changes in vegetation, dust and shrinking ice sheets (including sea level change), which were respectively 2.27, 0.57, 0.28, 1.1, 1.0 and 3.2 W m⁻². This small excess resulted from expressing the forcing by CO2 as a proportion of ∆F2xCO2 so that the appropriate uncertainty could be allocated.
The ERFs for the GHGs were the stratosphere-adjusted radiative forcings from Etminan et al. [14 Table 1] increased by 5% to account for rapid tropospheric and albedo adjustments. Sherwood et al. [1] increased the ERF for CH4 by an additional 45% to account for effects on stratospheric water vapour and ozone. They followed Hegerl et al. [18] in setting the radiative forcing from shrinking ice sheets at 3.2 ± 0.7 W m⁻², despite this being much smaller than more recent estimates (e.g. 3.6 to 5.2 W m⁻² by Braconnot & Kageyama [19]). Using the energy balance equation, Sherwood et al. [1] found that climate sensitivity was 2.4 K. They regarded this as a quasi-equilibrium estimate, 'ECS', and reduced it by applying their 6% correction. They applied another correction for non-linear feedback responses. Their Bayesian analysis proposed that 2.5 K was the most likely value of SLGM. The choice of a low value for the change in forcing since the LGM has inflated the magnitude of SLGM by approximately 15%.
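A short Python check of the LGM arithmetic reproduces the 2.4 K figure and shows how strongly the result depends on the ice-sheet forcing. It assumes the simple energy balance S = ∆F2xCO2 ΔT/ΔF with ΔN = 0 (quasi-equilibrium) and ignores the non-linear feedback correction. The alternative ice-sheet values are the ends and midpoint of the 3.6-5.2 W m⁻² range of Braconnot & Kageyama [19]; using the midpoint as a representative value is an assumption.

```python
DF_2X = 4.0          # W m^-2
D_T = 5.0            # K, warming from the LGM to the pre-industrial base period
D_F = 8.43           # W m^-2, total forcing change used by Sherwood et al. [1]
ICE_SHERWOOD = 3.2   # W m^-2, ice-sheet forcing following Hegerl et al. [18]

components = {  # W m^-2, LGM to pre-industrial
    "CO2": 2.27, "CH4": 0.57, "N2O": 0.28,
    "vegetation": 1.1, "dust": 1.0, "ice sheets": ICE_SHERWOOD,
}
component_sum = sum(components.values())  # about 8.42, slightly below D_F


def s_lgm(d_f, d_t=D_T, df_2x=DF_2X):
    """Quasi-equilibrium energy balance with dN = 0."""
    return df_2x * d_t / d_f


s_base = s_lgm(D_F)           # about 2.37 K
s_after_6pct = s_base / 1.06  # after the 6% ECS-to-S correction
print(f"sum of components {component_sum:.2f}, S {s_base:.2f} K, corrected {s_after_6pct:.2f} K")

inflation = {}
for ice in (3.6, 4.4, 5.2):
    s_alt = s_lgm(D_F - ICE_SHERWOOD + ice)
    inflation[ice] = s_base / s_alt - 1.0  # relative overstatement when 3.2 is used
    print(f"ice sheets {ice} W m^-2: S = {s_alt:.2f} K, inflation {inflation[ice]:.0%}")
```

The overstatement is about 5% at the low end of that range, about 14% at its midpoint and about 24% at the high end, so the figure of approximately 15% given above corresponds to a mid-range ice-sheet forcing.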
Sherwood et al. [1] used a similar time-reversed process to estimate SmPWP. When converted to temporal trends, their change in temperature, ΔT, becomes -3 ± 1 K, which has a high level of uncertainty, and ΔF becomes -2.2 ± 0.6 W m⁻². ΔF was based on CO2 decreasing from 375 to 284 ppm plus an additional 40% for the effects of other GHGs. They added a further 50% to cover all other unmeasured forcing agents, such as changes in ice sheets, vegetation, orbital cycles and tectonic events. They concluded that the maximum likelihood estimate of SmPWP was 3.2 K. They discussed how it could be larger if the CO2 concentration was lower or the temperature change was larger, but did not consider that it could have been smaller if factors other than CO2 contributed more of the forcing.
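The mPWP forcing can be reproduced in Python under the assumption that CO2 forcing scales with the base-2 logarithm of the concentration ratio, expressed as a fraction of ∆F2xCO2 = 4.0 W m⁻², consistent with the conversion described for the LGM. The 40% uplift for other GHGs is applied multiplicatively; how the further 50% for unmeasured agents enters their calculation is not modelled here.

```python
import math

DF_2X = 4.0        # W m^-2 per doubling of CO2
CO2_MPWP = 375.0   # ppm
CO2_PI = 284.0     # ppm, pre-industrial base period
OTHER_GHG = 0.40   # fractional uplift for non-CO2 GHGs

# Forcing change from the mPWP to the pre-industrial period (negative because CO2 fell)
df_co2 = DF_2X * math.log2(CO2_PI / CO2_MPWP)
df_total = df_co2 * (1.0 + OTHER_GHG)
print(f"CO2 only: {df_co2:.2f} W m^-2, with other GHGs: {df_total:.2f} W m^-2")
```

The result, about -2.25 W m⁻², matches the -2.2 W m⁻² quoted above to within rounding.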
Their estimate of SPETM was based on less reliable knowledge of conditions further in the past. The temperature rise of 5 ± 2 K was based on SST proxies covering the subsequent fall in temperature during the Early Eocene. Sherwood et al. [1] assumed the rise was caused by a rapid increase in CO2 concentration from 900 to 2400 ppm, inferred from a spike in the δ13C ratio. They did not consider that the ratio may have changed because of a sudden influx of CH4 (which has more than 20 times the global warming potential of CO2 [4]). They calculated the ERF for CO2 and applied the same 40% correction for other GHGs that they used for the mPWP. Despite acknowledging that paleogeography, global temperatures and vegetation were probably different, and identifying other uncertainties relating to the effects of surface warming, lags in ocean warming, clouds, aerosols and missing or varying feedback mechanisms, Sherwood et al. [1] did not specifically evaluate any of these factors. The feedback parameter was assumed to be the same as at present, but a climate state correction was added to the variance. After making a quasi-equilibrium adjustment, their maximum likelihood estimate of SPETM was 2 K.
Combining the three lines of evidence (but not SPETM) through Bayesian statistics, Sherwood et al. [1] estimated that the baseline PDF for S had a median value of 3.1 K and a 66% range from 2.6 to 3.9 K. They assumed a uniform prior on S but noted that this was only justifiable if feedback elements were small or negatively correlated. Annan & Hargreaves [9] analyzed this process and showed that choosing a uniform prior on S would significantly increase the median and upper bound of the estimated range of S.
Table 2 presents an alternative calculation of Shist using data from AR5 [4]; solar data from Vieira et al. [16]; and volcanic forcing from Kobashi et al. [17]. The values chosen for the changes in solar irradiance and volcanic forcing are, respectively, about seven and effectively three times larger than the values chosen by Sherwood et al. [1], who did not identify the sources of their values. These plausible alternative values reduce the estimate of S by almost 30% and show how susceptible such calculations are to changes in relatively minor parameters.

Alternative estimates from paleoclimates
This section presents estimates of S from two alternative periods that are more recent than the paleoclimate examples chosen by Sherwood et al. [1] and are therefore likely to have more reliable data and less risk of unidentified modifying factors. One was a period of cooling from the end of the mid-Holocene Warm period (mHW, about 7 ka) to the Little Ice Age (LIA, about 0.3 ka), and the other was a 150-year period of warming from the LIA to the pre-industrial base period [4]. During both periods the concentration of CO2 in the atmosphere rose [4,20]. Thus, during the earlier period there is an apparent conflict between the empirical observations and the assumption that CO2 drives temperature change. Both estimates are based on the energy balance equation, which is sensitive to the values chosen for changes in forcing and energy imbalance.
During the first period, the global temperature decrease is taken to be 0.75 K, based on an estimated change of 0.55 K from the mHW to the mean global temperature of the 19th century [12] and a rise of 0.2 K from the LIA to the 19th-century mean estimated from AR5 [4]. It should be noted that the change of 0.55 K is based on SST, which underestimates the current global mean temperature variation by a factor of 3 [21], so this choice of 0.75 K could underestimate the actual change by more than 1.0 K.
The changes in GHG concentrations are from ice core measurements reported in AR5 [4]. The GHG forcings were calculated using the equations in Etminan et al. [14]. The sources for the changes in other forcings are indicated in Table 2, except for aerosols, for which an arbitrary change of 0.1 W m⁻² is shown. A positive value was selected for this period of decreasing global temperature because the effects of clouds (a major factor in determining the aerosol forcing) would be expected to be the opposite of those observed more recently, when the temperature was rising [1]. The values of S obtained using this small positive value or zero are 6.45 K and 5.31 K respectively. These are considered to be unrealistically high. If the aerosol forcing were -0.35 W m⁻² or -1.0 W m⁻², the estimates of S would be 3.28 K or 1.67 K respectively. Although these estimates of S are more realistic, it is difficult to explain why the change in forcing by aerosols would be negative.
If a small positive change in aerosol forcing of approximately 0.1 W m⁻² is considered appropriate, the sensitivity of S to the change in N can be examined. If this energy imbalance is predominantly caused by lagging ocean heat uptake (or loss), N should be negative when SST is falling and positive when SST is rising [1 page 40]. Therefore, N should have been negative immediately after the mHW and positive after the LIA, so the change from the start of this period to the end would be positive. Its magnitude should relate to the rate of change in SST. If the rate of cooling after the mHW was similar to the rate of warming during pre-industrial times, a value of -0.2 W m⁻² would be appropriate immediately after the mHW. However, this results in an extremely high value for S of 46 K. The 'base case' shown in Table 2 assumes N in the mHW was -0.6 W m⁻², and this results in S being 6.45 K. For this approach to produce a value of S near 3 K, N would have to decrease by 1.335 W m⁻² between the mHW and LIA, implying that N during the mHW was 1.535 W m⁻². This also seems improbable.
A possible explanation is that one or more other factors had a negative effect on forcing between the mHW and the LIA. The most likely factor was the decreasing summer insolation in the northern hemisphere during this period, which was related to precession of the Earth's orbit and decreasing obliquity of its axis [22]. For reasons discussed in Section 4.2, a change in orbital forcing of at least -0.5 W m⁻² is considered plausible for this period and would result in S being 3.11 K. A larger decrease in forcing would be required to fit a more realistic estimate of the change in energy imbalance.
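The sensitivity of these results to small changes in the denominator can be checked with a Python sketch of S = ∆F2xCO2 ΔT/(ΔF - ΔN), assuming ∆F2xCO2 = 4.0 W m⁻², ΔT = -0.75 K and ΔN defined as N at the LIA minus N at the mHW. Table 2 is not reproduced here, so the net value of ΔF - ΔN for the base case is back-calculated from S = 6.45 K; that value is an inference, and each scenario is expressed as a shift from it.

```python
DF_2X = 4.0   # W m^-2
D_T = -0.75   # K, cooling from the mHW to the LIA


def s_energy_balance(net):
    """S (K) for a given net (dF - dN) in W m^-2."""
    return DF_2X * D_T / net


# Inferred base-case net value (aerosol +0.1, N at mHW = -0.6), not a Table 2 entry
net_base = DF_2X * D_T / 6.45  # about -0.465 W m^-2

scenarios = {  # shift in (dF - dN) relative to the base case, W m^-2
    "base case": 0.0,
    "aerosol change 0 instead of +0.1": -0.1,
    "aerosol change -0.35 instead of +0.1": -0.45,
    "N at mHW -0.2 instead of -0.6 (dN smaller by 0.4)": +0.4,
    "additional orbital forcing -0.5": -0.5,
}
results = {name: s_energy_balance(net_base + shift) for name, shift in scenarios.items()}
for name, s in results.items():
    print(f"{name}: S = {s:.2f} K")
```

Shifts of a few tenths of a W m⁻² move S from about 3 K to more than 40 K because ΔF and ΔN nearly cancel in the base case.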
The second alternative paleoclimate period evaluated in Table 2 was the 150-year period of warming from the LIA to the pre-industrial base period. Other changes in forcing that have been affected by industrial activities were improbable during this period and have been set to zero. Using assumptions similar to those made by Sherwood et al. [1] for the historic period, SLIA is estimated to be 1.08 K. A larger value of SLIA, more consistent with the other estimates, would result if the GHG forcings were smaller, if ΔN were larger, or if a significant negative forcing has been omitted. An additional negative forcing of -0.5 W m⁻² would result in SLIA being 3.33 K. Some of this negative forcing may relate to orbital effects.
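The same kind of check applies to SLIA. Assuming ∆F2xCO2 = 4.0 W m⁻² and the 0.2 K rise from the LIA to the 19th-century mean used earlier for ΔT, the net ΔF - ΔN consistent with SLIA = 1.08 K is about 0.74 W m⁻². That net value is back-calculated rather than taken from Table 2, and the Python sketch below shows why an extra -0.5 W m⁻² has such a large effect.

```python
DF_2X = 4.0   # W m^-2
D_T = 0.2     # K, assumed warming from the LIA to the base period


def s_energy_balance(net):
    """S (K) for a given net (dF - dN) in W m^-2."""
    return DF_2X * D_T / net


net_lia = DF_2X * D_T / 1.08                   # inferred from S_LIA = 1.08 K, about 0.74
s_lia = s_energy_balance(net_lia)              # 1.08 K by construction
s_lia_extra = s_energy_balance(net_lia - 0.5)  # with an additional -0.5 W m^-2 forcing
print(f"net {net_lia:.2f} W m^-2: S = {s_lia:.2f} K; with -0.5 W m^-2: S = {s_lia_extra:.2f} K")
```

The result, about 3.3 K, agrees with the 3.33 K given above to within the rounding of SLIA.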
The calculations in this section are not considered to be accurate estimates of S. The section is intended to emphasise the uncertainty associated with this method of estimating future climate. This is discussed in section 4.3.

Discussion
Sherwood et al. [1] acknowledged that their results were clearly influenced by the selection of data, but they regarded this as an advantage because of the expertise of their team. This applies equally to the data they included, the data they chose to exclude, and any other factors that have not been identified or thoroughly studied. It is unclear why this expertise would prevent or minimize the unavoidable risk of bias. An obvious weakness of using S instead of ECS is that it excludes the effects of deep-ocean processes as well as changes to the ice sheets. Other cyclic events in the oceans and atmosphere are also excluded, largely because of their short duration. In the interests of transparency, Sherwood et al. [1] refer to assumptions more than 100 times. Most were justified, but some were supported only by statements such as 'this "stratospheric adjustment" is well-understood'.

Rapid adjustments
The stratospheric adjustment is critical to their study because Table 1 shows that their calculation of S from all the lines of evidence depends directly on the radiative forcing attributed to doubling CO2, ∆F2xCO2. By including the stratospheric adjustment and other rapid adjustments, Sherwood et al. [1] increased the instantaneous forcing of CO2 (2.9 W m⁻²) by 38% to obtain an ERF of 4 W m⁻². These rapid adjustments may be irrelevant after 150 years, or later when equilibrium is reached, for the following reasons.
The rapid stratospheric adjustment accounts for the reduced radiation lost to space from the stratosphere soon after the CO2 concentration is doubled, because of local cooling. Observations show that CO2 mixes vertically through the troposphere within several days but takes several months to mix between hemispheres and through the stratosphere. As CO2 increases, both absorption and emission at all levels in the atmosphere increase. The net effect is that outgoing radiation at wavelengths within the CO2 absorption band appears to come from higher levels in the atmosphere. Near the centre of the band, the radiation appears to come from the stratosphere, where the positive temperature gradient produces a further increase in net upward radiation. The result is that the stratosphere radiates more energy than it absorbs, and therefore cools. It has been predicted that radiative balance in the stratosphere would be restored within 40 days [23] and that the Brewer-Dobson circulation [24] would remove any transient thermal anomalies within months.
The negative temperature gradient in the troposphere has the opposite effect. The warming in the troposphere would be more quickly redistributed by strong convective systems that would reestablish the lapse rate to conform with slightly warmer surface temperatures and any change in water vapour content. Near the new energy balance, most of the troposphere would have slightly raised temperatures, the tropopause would be higher and the stratosphere would have slightly lower temperatures, but the temperature gradients would be virtually the same as they originally were. The energy lost to space in the CO2 absorption band would appear to be radiating from higher altitudes than initially, resulting in less radiation loss from the troposphere across the wings of the absorption band and increased loss from the stratosphere from near the centre of the band. Water vapour will also affect the energy loss. The equilibrium radiation loss to space over all wavelengths must be equal to what it was before the CO2 increase. Therefore, there is reason to doubt that the rapid adjustments would be appropriate in the energy balance equation after several decades.
Sherwood et al. [1] did not include adjustments for slower changes in surface temperature related to changes in vegetation and sea ice (except in their mPWP calculations), despite these changes probably being relevant to all long-term equilibrium calculations [4]. Satellite measurements may eventually quantify the actual changes in infrared energy lost from the Earth, but early attempts have been frustrated by excessive instrumental noise at wavelengths >14 microns [25].

Data selection
In their first line of evidence, Sherwood et al. [1 Table 1] estimated that λ was -1.30 ± 0.44 W m⁻² K⁻¹ from six feedback components that they assumed were entirely independent. These components were Planck feedback, water vapour, lapse rate, surface albedo, clouds, and other changes in atmospheric composition, including ozone, aerosol-cloud interactions and stratospheric water vapour. Although Sherwood et al. [1] did not thoroughly assess co-dependence between the various feedback processes, they concluded that major errors or omissions were unlikely because interannual variability in globally averaged TOA net radiation implied that λ was in the range they derived from their data. However, that was inconsistent with their recognition that emergent constraints developed from the current climate system generally produced smaller values of λ [1 Table 2] and that the two methods were not entirely independent. They disregarded the emergent constraints because these lack specific physical interpretations and there was no procedure for combining them with their data to further refine the PDF of S.
In their second line of evidence, Sherwood et al. [1 Table 4] selected -1.179 W m⁻² as the preferred value for aerosol forcing from a wide range of published estimates. This was almost double the estimate used in AR5 [4]. They selected 0.017 W m⁻² as the change in solar irradiance forcing during this interval, approximately an order of magnitude less than the changes reported by Vieira et al. [16] or Lean [26]. This discrepancy is partially reduced because Sherwood et al. [1] allocated only a quarter as much forcing to stratospheric ozone as would have been allocated for ozone feedback based on interactive stratospheric photochemistry [27]. Sherwood et al. [1] acknowledged significant uncertainty regarding transient energy imbalances related to the oceans achieving thermal equilibrium and regarding the pattern effect in global warming. The PDF for Shist had a most likely value of 2.5 K and a median of 3.1 K for a non-Bayesian analysis, or a median of 4.3 K for a Bayesian analysis. The median increased because a uniform prior was selected [9,28]. Under these assumptions the mode provides an unbiased estimate of S.
Sherwood et al. [1] identified several unresolved issues. They suggested that Shist may underestimate S because radiative feedbacks become less negative as equilibrium is approached, owing to warming pattern effects related to surface temperature variations and cloud effects. They emphasized that neither the global energy budget approach nor fitted dynamical models provided a purely observational constraint on Shist. They attributed values of Shist in the range 1.6-2.1 K from atmosphere-only model simulations to unrealistic warming patterns. These lower estimates were disregarded despite agreeing well with several global energy budget calculations [1,4 page 883]. They applied sensitivity tests to assess the impact of using different aerosol forcing estimates, a different base period (1850-1900) and unadjusted SST, and found no major differences. They considered that the observed 1 K of historical warming provided strong evidence of S > 1.5 K. That inference inherently assumes that virtually all historic warming was caused by CO2.
The third line of evidence was based on less certain data describing feedback and state dependence, which required additional adjustments. Sherwood et al. [1] assumed that the apparent stability during the LGM and mPWP indicated equilibrium conditions, despite large centennial to millennial temperature variations during both periods [4,29] and longer variations that correlated with orbital parameters [30]. Sherwood et al. [1] disregarded global mean orbital effects in their paleoclimate assessments because they were approximately 0.1 W m⁻² (while noting that they could reach 9 W m⁻² locally). Others have shown annual variations of nearly 80 W m⁻² at high latitudes [22,31].
Orbital cycles were dismissed by Sherwood et al. [1 page 55] because global insolation remained relatively constant. However, empirical evidence shows that global mean monthly temperatures correlate strongly with the northern hemisphere variation [32]. The reason for this is discussed in section 4.4. Changes in solar output were also ignored during these comparatively long periods, despite changing by almost 2 W m⁻² in recent millennia [16,26]. By choosing a low value for the change in forcing since the LGM, Sherwood et al. [1] inflated the magnitude of SLGM by at least 15%.
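If SLGM is computed from the energy balance equation with no allowance for the imbalance, it is inversely proportional to the forcing change. The relation below is a sketch assuming that any omitted forcing (for example solar or orbital) has the same sign as the forcing used; it shows that a 15% inflation of SLGM corresponds to omitting forcing equal to 15% of the forcing used, or about 13% of the corrected total.

```latex
S_{\mathrm{LGM}} = \frac{F_{2\times}\,\Delta T}{\Delta F}
\quad\Longrightarrow\quad
\frac{S_{\mathrm{LGM}}^{\mathrm{used}}}{S_{\mathrm{LGM}}^{\mathrm{corr}}}
  = \frac{\Delta F_{\mathrm{used}} + \Delta F_{\mathrm{omit}}}{\Delta F_{\mathrm{used}}}
  = 1 + \frac{\Delta F_{\mathrm{omit}}}{\Delta F_{\mathrm{used}}}

1 + \frac{\Delta F_{\mathrm{omit}}}{\Delta F_{\mathrm{used}}} \ge 1.15
\;\Longleftrightarrow\;
\frac{\Delta F_{\mathrm{used}}}{\Delta F_{\mathrm{used}} + \Delta F_{\mathrm{omit}}} \le \frac{1}{1.15} \approx 0.87
```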

Dealing with uncertainty
Sherwood et al. [1] considered the data from the Paleocene-Eocene Thermal Maximum too uncertain to use in estimating S. Even for the more recent mPWP and LGM they had to make significant assumptions.
They acknowledged the increased uncertainty in changes since the mPWP, specifically mentioning GHG concentrations, ice sheets, vegetation and tectonic changes, but provided no justification for assuming that CO2 had dominated forcing in the mPWP. They determined the total GHG forcing (2.2 W m⁻²) by adding 40% to the forcing calculated for CO2. This seems inadequate, as the other gases were calculated to have contributed 56% as much forcing as CO2 for Shist, despite evidence of a large increase in CO2 over that period. Sherwood et al. [1] increased the forcing by another 50% to cover unmeasured forcing. However, that was only one third of the additional forcing used in their SLGM calculation, because they assumed less change in ice sheets. They did not consider either the collisional tectonism that closed the Isthmus of Panama [33], redirecting a warm east-west equatorial current northward to form the Gulf Stream [22], or changes in solar irradiance as factors that could have affected global temperatures during this period.
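The arithmetic behind comparing the 40% and 56% uplifts can be made explicit. The sketch below assumes the 40% was applied multiplicatively to the CO2 forcing (total = 1.4 × CO2 forcing) and, purely for illustration, re-applies the 56% non-CO2 share quoted for the historical period; neither step is a calculation made by Sherwood et al.

```python
# Illustrative arithmetic only (not a calculation from Sherwood et al.).
# ASSUMPTION: the 40% uplift was applied multiplicatively, i.e. total = CO2 * 1.40.

TOTAL_GHG_MPWP = 2.2       # W m^-2, total GHG forcing used for the mPWP
NON_CO2_SHARE_USED = 0.40  # non-CO2 forcing as a fraction of CO2 forcing (mPWP)
NON_CO2_SHARE_HIST = 0.56  # the same ratio as quoted for the historical period

co2_only = TOTAL_GHG_MPWP / (1 + NON_CO2_SHARE_USED)
total_with_hist_share = co2_only * (1 + NON_CO2_SHARE_HIST)
increase_pct = 100 * (total_with_hist_share / TOTAL_GHG_MPWP - 1)

print(f"implied CO2-only forcing:         {co2_only:.2f} W m^-2")
print(f"total GHG forcing with 56% share: {total_with_hist_share:.2f} W m^-2")
print(f"increase over the value used:     {increase_pct:.0f}%")
```

Under these assumptions the mPWP greenhouse-gas forcing would rise from 2.2 to about 2.45 W m⁻², roughly 11% more.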
In calculating the forcing from the LGM to the pre-industrial period, Sherwood et al. [1] added a 5% correction to the stratospheric-adjusted radiative forcing for the GHGs [14] to cover other rapid changes. As discussed above, this may not be appropriate. They also added 45% to the forcing calculated for CH4 increasing from 375 to 808 ppb to account for its effect on stratospheric water vapour and ozone. This 0.2 W m⁻² adjustment is much larger than the net 0.014 W m⁻² change in these variables that they used to calculate Shist, when CH4 increased from 808 to 1825 ppb. They added specific forcings for ice sheets, vegetation and dust, but disregarded solar irradiance and localized orbital effects. Their estimate of SLGM depends directly on these forcing estimates and on the estimated temperature change, and it could also be affected by any significant omissions.
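The contrast between the two CH4 adjustments can be checked with the figures quoted above. The sketch only compares the absolute adjustments and the concentration rises; because CH4 forcing is not linear in concentration, the concentration rise is a rough guide rather than a forcing estimate.

```python
# Figures quoted in the text; CH4 concentrations in ppb.
ADJ_LGM = 0.2        # W m^-2 added for CH4 effects on stratospheric water vapour and ozone (LGM)
ADJ_FRACTION = 0.45  # the 45% uplift applied to the CH4 forcing
ADJ_HIST = 0.014     # W m^-2 net change in the same variables used for Shist
CH4_LGM, CH4_PI, CH4_RECENT = 375, 808, 1825

implied_ch4_forcing = ADJ_LGM / ADJ_FRACTION  # CH4 forcing implied by the 45% uplift
adjustment_ratio = ADJ_LGM / ADJ_HIST         # LGM adjustment relative to the historical one
rise_lgm = CH4_PI - CH4_LGM                   # ppb rise, LGM to pre-industrial
rise_hist = CH4_RECENT - CH4_PI               # ppb rise, historical period

print(f"implied CH4 forcing (LGM to pre-industrial): {implied_ch4_forcing:.2f} W m^-2")
print(f"adjustment ratio (LGM / historical): {adjustment_ratio:.1f}")
print(f"CH4 rise: LGM {rise_lgm} ppb vs historical {rise_hist} ppb")
```

The LGM adjustment is about 14 times the historical one, although the historical CH4 rise (1017 ppb) is more than twice the LGM rise (433 ppb).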
The consistency of the results from the three lines of evidence could have resulted from co-dependence or data choice. Sherwood et al. [1] examined potential co-dependence in detail. They justified using GCMs in each line of evidence because the models were used for different purposes: to constrain feedback for the process line, to assess the historical pattern effect, and to estimate paleo-forcing. They also claimed that co-dependence would be mitigated because the multiple lines of evidence used to establish key feedback mechanisms have different relationships to CO2 concentration. Similar arguments were applied to the radiative transfer functions and cloud physics, whereas aerosol errors were thought to be buffered. Their statistical tests showed that their treatment of the warming pattern effects had not produced significant co-dependence in calculating Sproc and Shist.
The decision to focus on S rather than ECS potentially introduced unnecessary errors. The conversion from S to ECS assumes that two doublings of the CO2 concentration will produce twice the effect of one doubling. That would be the case if the relationship between CO2 concentration and forcing were logarithmic (as in the radiative forcing equation used by the IPCC [4,18]), but it is inconsistent with the equation proposed by Etminan et al. [14], which contains additional linear and quadratic terms that depend on the CO2 concentration. It is also inconsistent with results from climate models, which produce 4× CO2 equilibrium temperature changes between 3 and 94% larger than twice the 2× CO2 estimates [7]. The logarithmic relationship can only be derived from absorption band calculations, unless unrealistic absorption line arrangements are assumed [34], and such band calculations do not appear to account adequately for increasing radiation from wavelengths near the centre of the major absorption band [34].
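To make the difference concrete, the sketch below compares the ratio of quadrupling to doubling forcing under the simplified logarithmic expression, 5.35 ln(C/C0), and under an Etminan-type expression with linear and quadratic terms in (C - C0). The coefficients (a1 = -2.4×10⁻⁷, b1 = 7.2×10⁻⁴, c1 = -2.1×10⁻⁴, constant 5.36) are quoted from Etminan et al. [14] as an assumption that should be checked against that paper; the pre-industrial CO2 of 278 ppm and the N2O concentration of 270 ppb are also assumed values. The expression is believed to be fitted for CO2 between roughly 180 and 2000 ppm, so 4× pre-industrial (1112 ppm) falls inside that range.

```python
import math

# ASSUMED coefficients of the Etminan et al. [14] CO2 expression (verify before use).
A1, B1, C1 = -2.4e-7, 7.2e-4, -2.1e-4
C0 = 278.0   # assumed pre-industrial CO2, ppm
N2O = 270.0  # assumed N2O, ppb, held fixed so the mean N2O term is constant


def forcing_log(c: float, c0: float = C0) -> float:
    """Simplified logarithmic CO2 forcing, W m^-2."""
    return 5.35 * math.log(c / c0)


def forcing_etminan(c: float, c0: float = C0, n_bar: float = N2O) -> float:
    """Etminan-type CO2 forcing with quadratic and linear terms in (C - C0), W m^-2."""
    dc = c - c0
    return (A1 * dc ** 2 + B1 * abs(dc) + C1 * n_bar + 5.36) * math.log(c / c0)


for name, f in [("logarithmic", forcing_log), ("Etminan-type", forcing_etminan)]:
    f2, f4 = f(2 * C0), f(4 * C0)
    print(f"{name:13s} 2x: {f2:.2f}  4x: {f4:.2f}  ratio 4x/2x: {f4 / f2:.3f}")
```

With these assumed inputs the logarithmic form gives a ratio of exactly 2, while the Etminan-type form gives a ratio of about 2.09, so quadrupling yields roughly 5% more than twice the doubling forcing. Halving a 4× CO2 response is then an imperfect stand-in for the 2× CO2 response.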

Energy balance equation
The energy balance equation is fundamental to all three lines of evidence [1]. In reality, energy equilibrium is unlikely to occur because the forcing agents are either continuously changing (orbital, solar and cloud effects) or intermittent and unpredictable (volcanic activity and ocean current variability). Therefore, the magnitude and sign of the energy imbalance are of critical importance, and they are difficult to quantify for prehistoric times.
Despite claiming to have calculated a quantity, S, that does not depend on reaching equilibrium, Sherwood et al. [1] used variants of the energy balance equation in all of their calculations. No allowance for the energy imbalance N appears to have been made in their paleoclimate estimates, and N appears to have been omitted from their non-Bayesian calculation of Shist. All calculations shown in Table 2 include estimates of N, which are discussed in section 4.5.
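The energy balance relation used in these estimates can be written ΔN = ΔF + λΔT; combining it with S = -F2x/λ gives S = F2x ΔT/(ΔF - ΔN). The sketch below uses hypothetical inputs, not values from Table 2, and an assumed F2x of 4.0 W m⁻², to show how omitting the imbalance term shifts S: during warming, dropping a positive imbalance lowers S, and dropping a negative imbalance raises it.

```python
F2X = 4.0  # ASSUMED forcing from doubled CO2, W m^-2


def energy_budget_sensitivity(d_t: float, d_f: float, d_n: float = 0.0, f2x: float = F2X) -> float:
    """S = F2x * dT / (dF - dN).

    Follows from dN = dF + lambda * dT together with S = -F2x / lambda.
    """
    denominator = d_f - d_n
    if denominator == 0:
        raise ZeroDivisionError("dF equals dN, so S is undefined")
    return f2x * d_t / denominator


# HYPOTHETICAL warming-period inputs (K, W m^-2, W m^-2); not values from Table 2.
d_t, d_f, d_n = 1.0, 2.5, 0.6
with_n = energy_budget_sensitivity(d_t, d_f, d_n)
without_n = energy_budget_sensitivity(d_t, d_f)  # imbalance omitted (d_n = 0)

print(f"with imbalance:    S = {with_n:.2f} K")
print(f"imbalance omitted: S = {without_n:.2f} K")
```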
The energy balance equation also depends on a linear relationship between the radiative response and the surface temperature. That assumption is only appropriate for small temperature changes, of the order of one percent of the absolute temperature, because for larger changes the nonlinear relationships between temperature and several feedback components become significant, especially Planck radiation, which depends on the fourth power of T [34], and water vapour, which has an exponential relationship to T [22].
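The size of the Planck linearisation error can be estimated directly. The sketch compares the exact change in blackbody emission, σ[(T + ΔT)⁴ - T⁴], with its linearisation 4σT³ΔT, assuming an effective emission temperature of 255 K; it covers the Planck term only and ignores the water vapour nonlinearity.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
T_EMIT = 255.0          # ASSUMED effective emission temperature, K


def planck_linearisation_error(t: float, d_t: float) -> float:
    """Return exact/linear - 1 for the Planck response to a temperature change d_t."""
    exact = SIGMA * ((t + d_t) ** 4 - t ** 4)
    linear = 4 * SIGMA * t ** 3 * d_t
    return exact / linear - 1.0


for d_t in (1.0, 2.55, 5.0, -5.0):
    err = planck_linearisation_error(T_EMIT, d_t)
    print(f"dT = {d_t:+5.2f} K: exact/linear - 1 = {100 * err:+.2f}%")
```

For a 1% change (2.55 K) the linear form is about 1.5% too small, and for a 5 K warming about 3%; the error grows roughly as 1.5 ΔT/T and changes sign for cooling.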

Energy imbalance in the Holocene
The two new calculations presented here, based on paleoclimate reconstructions in the Holocene, are heavily dependent on assumed values of the global energy imbalance and aerosol forcing. If the imbalance is largely caused by slow heating or cooling of the oceans [1], it would be expected to be positive when global surface temperatures are rising and negative when they are falling. However, the absolute magnitude and variability of the imbalance are poorly constrained. The imbalance for the LIA in Table 2 is assumed to be the same as that used by Sherwood et al. [1] for the pre-industrial period, because global heat content was rising at a similar rate in both periods [13]. More recently, SST has risen more rapidly. This is consistent with the current imbalance shown in Table 2, which is four times the magnitude of the LIA and pre-industrial estimates.
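Because the imbalance is treated as mainly ocean heat uptake, it can be related to the rate of change of global heat content by dividing by the Earth's surface area and the length of a year. The sketch below does this for a hypothetical uptake rate of 10²² J per year; the area (about 5.1 × 10¹⁴ m²) is approximate, and the conversion is linear, so a fourfold increase in heat uptake corresponds to a fourfold increase in N.

```python
EARTH_AREA_M2 = 5.1e14             # approximate surface area of the Earth, m^2
SECONDS_PER_YEAR = 365.25 * 86400  # Julian year, s


def imbalance_from_heat_uptake(joules_per_year: float) -> float:
    """Global-mean energy imbalance (W m^-2) implied by a rate of heat-content gain."""
    return joules_per_year / (EARTH_AREA_M2 * SECONDS_PER_YEAR)


hypothetical_rate = 1e22  # J per year, HYPOTHETICAL illustrative value
n = imbalance_from_heat_uptake(hypothetical_rate)
print(f"N = {n:.2f} W m^-2")
print(f"4x the uptake rate gives N = {imbalance_from_heat_uptake(4 * hypothetical_rate):.2f} W m^-2")
```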
The empirical data show that from the mHW to the LIA, global mean temperature was negatively associated with CO2 concentration. This implies that some other factor had more impact on temperatures than CO2. This factor might have been one, or a combination, of solar irradiance, orbital effects, volcanic activity, aerosol concentration or some other factor [34].
Orbital cycles have their greatest influence at 65°N, where the insolation changes caused by the obliquity and precession of the Earth's axis [22] coincide with a high proportion of land. Recent instrumental observations show significantly more seasonal variation in land temperature than in SST [21]. As the northern hemisphere has more than twice as much land as the southern hemisphere, this difference explains why mean monthly temperatures in the northern hemisphere vary more than twice as much as those in the southern hemisphere, and why global mean temperatures are significantly influenced by the northern hemisphere temperature [32].
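The step from hemispheric amplitudes to the global mean can be illustrated with idealised seasonal cycles. Because the two hemispheres have equal area and opposite seasons, the global monthly mean is half the difference of the hemispheric cycles, so it follows whichever hemisphere has the larger amplitude. The amplitudes below are hypothetical and the cycles are pure sinusoids.

```python
import math
import statistics

# HYPOTHETICAL seasonal amplitudes (K); only their relative size matters here.
AMP_NH = 2.0  # larger amplitude: more land
AMP_SH = 0.8  # smaller amplitude: more ocean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


months = range(12)
nh = [AMP_NH * math.cos(2 * math.pi * m / 12) for m in months]
sh = [-AMP_SH * math.cos(2 * math.pi * m / 12) for m in months]  # opposite season
glob = [(a + b) / 2 for a, b in zip(nh, sh)]  # hemispheres have equal area

print(f"correlation(global, NH) = {pearson(glob, nh):+.3f}")
print(f"correlation(global, SH) = {pearson(glob, sh):+.3f}")
print(f"global seasonal range = {max(glob) - min(glob):.2f} K")
```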
This dominance of northern hemisphere temperature variations over the global trend is consistent with the data of Bova et al. [31], which show a strong correlation between proxy ocean temperature measurements and northern hemisphere seasonal insolation as far south as 40°S. Their suggestion that the temperature proxies were influenced by local solar irradiance variation associated with orbital cycles is not supported by the empirical evidence. They ignored the extensive sea surface temperature data [12] and Antarctic ice core data [35], and dismissed contradictory results as 'artifacts' or as an indication that alkenone production had a different relationship with the seasons in the southern hemisphere. This empirical evidence does not support their proposed corrections to the proxy data, which would remove the fall in global temperature during the Holocene that I have used to support a negative value for N at the mHW.

Conclusions
This paper addressed the discrepancy between climate model predictions and empirical data concerning the relationship between global temperature and atmospheric CO2 concentration. A major conclusion of this paper is that the subjective data selection and methodology used by Sherwood et al. [1] have affected their results. Because they chose to include rapid adjustments in the ERF of CO2, all of their calculated values of ECS were up to 38% above what would be expected in a long-term equilibrium. If these adjustments were removed, their calculated 66% probability range for ECS would be reduced from 2.6-4.1 K to 1.9-3.0 K. All of the estimates by Sherwood et al. [1] could be reduced by a further 10 to 20% (or even by more than 50%) by choosing alternative relevant data.
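The two ranges quoted above are consistent with a single scaling factor, as would be expected if S is proportional to the CO2 forcing used (S = -F2x/λ) and the rapid adjustments are simply removed from F2x. The sketch below checks this using the rounded bounds; the implied share of the forcing attributed to rapid adjustments is an inference from those bounds, not a figure stated by Sherwood et al.

```python
# Rounded 66% ranges quoted in the text (K).
ORIGINAL = (2.6, 4.1)  # Sherwood et al. [1]
REVISED = (1.9, 3.0)   # after removing rapid adjustments from the ERF of CO2

# If S is proportional to the CO2 forcing used, each bound scales by the same factor.
ratios = [o / r for o, r in zip(ORIGINAL, REVISED)]
# Share of the forcing that would have to be removed to produce that factor (an inference).
implied_fraction_removed = [1 - 1 / k for k in ratios]

for k, f in zip(ratios, implied_fraction_removed):
    print(f"ratio {k:.3f} -> {100 * (k - 1):.0f}% higher; implied share removed {100 * f:.0f}%")
```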
The estimate of Sproc by Sherwood et al. [1] depends on many assumptions regarding six specified feedback components and their independence. They indicated that cloud feedback effects were not well known, and it is considered unlikely that water vapour and cloud effects are completely independent. Their estimate of Shist depends heavily on the values selected for forcing by various factors. Sherwood et al. [1] selected values for solar activity, volcanic activity and aerosol forcing that were less positive (or more negative) than values in recent credible research papers. Table 2 shows that an estimate of Shist based on these alternative data could potentially be 28% smaller than their calculation. Similarly, including forcing due to orbital effects would potentially have decreased all of their estimates of S. They also ignored tectonic changes, such as the closing of the Isthmus of Panama, which affected ocean currents during the mPWP.
All the estimates of climate sensitivity discussed in this paper are based on the energy balance equation. This imposes significant uncertainty because many factors cannot be determined directly for prehistoric times and estimates of them depend on GCMs. Furthermore, there is no evidence supporting the assumption that unforced variability is zero during any period. These problems become more significant the further back in time the studies investigate. For example, in Table 2 the value of N during the mHW was set at -0.6 W m⁻² on the assumption that it would be negative because global temperature was falling. It had to be less than -0.135 W m⁻² for the estimate of SpHW to be positive, and values of N between -0.6 and -0.135 W m⁻² produce unrealistically large values of S if the other parameters are accurate. This suggests that some important factors, such as orbital forcing, have not been included in these calculations.
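The sensitivity of SpHW to the assumed mHW imbalance can be illustrated with the energy budget form S = F2x ΔT/(ΔF - ΔN), taking ΔN = N_end - N_start and treating the start-of-period (mHW) imbalance as the free parameter. All inputs below are hypothetical, chosen only so that the denominator vanishes at N_start = -0.135 W m⁻² as in the text; they are not the Table 2 values, and the exact form of the Table 2 calculation is assumed.

```python
# HYPOTHETICAL inputs chosen so the denominator vanishes at N_start = -0.135 W m^-2;
# these are not the Table 2 values.
F2X = 4.0       # assumed forcing from doubled CO2, W m^-2
D_T = -0.5      # hypothetical cooling between the two periods, K
OFFSET = 0.135  # hypothetical value of (dF - N_end), W m^-2


def s_from_start_imbalance(n_start: float) -> float:
    """S = F2x * dT / (dF - (N_end - n_start)) with the start-period imbalance as input."""
    denominator = OFFSET + n_start
    return F2X * D_T / denominator


for n_start in (-0.6, -0.3, -0.2, -0.15, -0.14, -0.1):
    print(f"N_start = {n_start:+.3f} W m^-2 -> S = {s_from_start_imbalance(n_start):+.1f} K")
```

In this sketch S stays moderate for strongly negative N_start, grows without bound as N_start approaches the threshold from below, and becomes negative once the threshold is crossed.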
Other  36]. Their practice of increasing the variance of many variables did nothing to improve the precision of their estimates and contributed to the breadth of their predicted range of ECS. Further reductions in the uncertainty of ECS will require improved understanding of changes in solar irradiance, ice albedo, orbital effects, cloud cover and CO2 solubility in the oceans. Empirical evidence of a weak or even inverse relationship between global temperature and CO2 levels was described by Davis [2]. Just as a positive association between variables does not prove causality, the absence of an association may have various explanations: confounding or modifying relationships may exist. In particular, reverse causation is plausible, because rising temperature reduces the solubility of CO2 and drives it out of the oceans [2]. The evidence that changes in CO2 levels lag behind temperature changes in ice cores and other proxies is consistent with that possibility.
The example examined in this paper from the Holocene, when global temperature fell while CO2 levels were rising, confirms that factors other than CO2, such as solar irradiance, orbital effects, aerosol effects and changing ocean circulation cycles, may have been responsible for this and other observed inverse relationships. More research is needed to identify these factors and to establish their current and potential future effects, so that the necessary improvements can be made to climate models.