Preprint (Review), Version 1. Preserved in Portico. This version is not peer-reviewed.

A Systematic Review of the Basis for WHO's New Recommendation for Limiting Aircraft Noise Annoyance

Version 1: Received: 6 November 2018 / Approved: 7 November 2018 / Online: 7 November 2018 (15:21:07 CET)

A peer-reviewed article of this Preprint also exists.

Gjestland, T. A Systematic Review of the Basis for WHO’s New Recommendation for Limiting Aircraft Noise Annoyance. Int. J. Environ. Res. Public Health 2018, 15, 2717.


The new WHO Environmental Noise Guidelines for the European Region contain recommendations for limiting noise exposure associated with adverse health effects. The limits are said to be based on a systematic review of the existing evidence. This paper gives a systematic assessment of the presented evidence with respect to aircraft noise annoyance and demonstrates that the new guidelines are based on an arbitrary selection of existing studies comprising an imperfect and faulty set of data not representative of the general airport population.


aircraft noise; annoyance; dose-response; environment; WHO Guidelines


Public Health and Healthcare, Health Policy and Services

Comments (3)

Comment 1
Received: 22 November 2018
The commenter has declared there is no conflict of interests.
Comment: Comments by Guski, Schuemer and Schreckenberg:

Truls Gjestland opens his critique by stating that the WHO recommendation on aircraft noise "is based on a very imperfect and faulty set of data" (p. 1). We agree that the data set is not perfect; what data set is? But we disagree with the more general view that it is not only "very imperfect" but even "faulty".

We would like to comment on three major points:

1. The selection of studies was not restricted to studies "conducted from 2000 and onwards" (critique p. 2). Instead, "eligible were published studies (2000–2014)" (Guski et al. 2017, abstract). We would have been glad to incorporate further studies, e.g., the three Norwegian studies mentioned in the critique, the British nine-airport study (SoNA 2014; see Ipsos-Mori 2015), our own four-airport study in the NORAH project (Umwelthaus 2015; Guski & Schreckenberg 2015), or the SiRENE study (Brink et al. 2016), but all of these studies were published after 2014.

The exclusion of three studies from the aircraft noise data set used for the exposure-response curves is explained on p. 10 of our review: “For 12 of the 15 aircraft noise studies, ERF of the relation between Lden and modelled %HA were available”. That is, the authors of the three excluded studies did not provide exposure-response equations when asked to do so. The Norwegian authors declared that they had used the community tolerance level (CTL) method (letter of 2015/06/25), which cannot be directly transformed into equations comparable to ours. There is no need to raise suspicion of manipulation.

2. The HYENA study provided annoyance data from respondents aged 45-70 years, because the main goal of HYENA was to investigate cardiovascular effects of environmental noise. In view of the sometimes reported relation between age and annoyance, the inclusion of this study may be seen as a weakness. However, it should be noted that the relation between age and environmental annoyance is curvilinear, if there is one at all: respondents about 45 years of age report somewhat higher annoyance than younger or older respondents. The paper by Van Gerven et al. (2009) shows drastic effects, but other studies report rather small effects. For instance, the NORAH study reports a very weak nonlinear effect of age (eta-squared between 0.01 and 0.03, see Schreckenberg et al. 2015, figure A-20). Okokon et al. (2015) found age effects on annoyance only in a special bivariate model (for moderate to extreme annoyance, not for high to extreme annoyance), which is interesting because it may reflect age effects on scaling habits. However, we do not see why the exclusion of respondents younger than 45 in the HYENA studies should disqualify these studies from an analysis of the exposure-response relation between environmental noise and high annoyance. In the context of age, Gjestland maintains that our inclusion of the HYENA respondents violates our eligibility criterion “member of the general population”. He certainly does not mean that people aged 45 to 70 are not members of the general population, but rather that restricting the sample to this age group may lead to results that are not comparable with those for a population aged 18 to 80. True, but where does a severe age restriction start, and where does it end?
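For readers less familiar with the statistic: eta-squared is the share of the total variance in annoyance scores that is explained by group membership (here, age group), so values of 0.01 to 0.03 mean that age explains only about 1 to 3 % of the variance. A minimal sketch of the one-way ANOVA definition follows; the grouping into age bands and the function name are illustrative assumptions, and the NORAH analysis may have used a different model.

```python
from statistics import fmean


def eta_squared(groups: list[list[float]]) -> float:
    """One-way ANOVA effect size: SS_between / SS_total.

    Each inner list holds the annoyance scores of one (hypothetical) age group.
    """
    scores = [x for group in groups for x in group]
    grand_mean = fmean(scores)
    # Total variation of all scores around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    if ss_total == 0:
        raise ValueError("all scores are identical; eta-squared is undefined")
    # Variation explained by differences between the group means.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total
```

With toy numbers (not survey data), eta_squared([[1, 2, 3], [4, 5, 6]]) returns about 0.77, because the group means differ strongly relative to the spread within each group; a value of 0.01 to 0.03 indicates that the group means barely differ.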

In addition, Gjestland criticizes our selection of the HYENA annoyance data because we equated the annoyance question related to the daytime with the ICBEN question, which does not specify any time of day. Of course, this decision may be questioned. However, we followed Wirth et al. (2006) as well as Babisch et al. (2009, p. 1174) in assuming that “the overall annoyance (day+ night) is mostly determined by the annoyance during the daytime”.

We cannot fully follow the critique of our selection of the Athens-Eleftherios sub-study of HYENA. It is true that Babisch et al. (2009, p. 1170) write that they selected people “who had been living for at least 5 years, near one of six major European airports”, and we reported this description in our Table 1 without further checking. However, we did not explicitly use the 5-year period as a participant selection criterion.

The Milan-Malpensa sub-study of HYENA may be regarded as a high-rate-change (HRC) study, because aircraft movements increased in 1998, i.e., 4-6 years before the survey (2003-2005). However, without a clear record of when each interview took place, deciding whether an interview took place in an HRC situation or not is a matter of guessing and weighing. That is why we refrained from an HRC/LRC (high- vs. low-rate-change) decision on Malpensa. We also refrained from speculating about the effects of reports of a severe aircraft crash in the neighborhood (or somewhere in the world?) 2-4 years before an annoyance survey.

Another point is that we included these two HYENA sub-studies at all, even though Babisch et al. (2009) excluded them from a pooled analysis. Babisch et al. tried to explain the higher level of aircraft noise annoyance at almost all aircraft noise levels in the Greek and Italian sub-studies by means of sensitivity analyses, and found that the percentage of women decreases with increasing noise level at Athens airport and that noise sensitivity scores increase with noise level at Milan airport. Both circumstances were seen as instances of selection bias, without any indication of the criteria used to define bias. We did not want to follow such intuitive decisions.

3. Weighting studies is a common procedure in meta-analysis (Borenstein et al., 2009). The weighting scheme depends on the goals of the analysis. For instance, if we want to estimate a reliable mean effect size from studies differing in effect size, we can put more weight on larger studies (with many participants) than on small ones. If we want to show the diversity and spread of effect sizes, we would rather refrain from weighting studies. If we want to estimate a common effect size with the highest possible precision and validity, we would weight studies according to study quality. Recall that we compared annoyance effect size estimates between groups of low- and high-quality studies, using our own (self-made) quality criteria. At least one of us would have liked to go further and weight studies in the exposure-response analysis according to study quality. However, since the noise effects community shares no common list of study quality criteria, we limited ourselves to the goal of estimating a reliable mean effect size, and weighted the studies according to the square root of the total sample size. It should be noted that this is a non-linear form of weighting. For instance, the large Amsterdam 2002 study (N = 5873) got a weight of 76.6, and the small Da Nang study (N = 528) got a weight of 23.0.
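The square-root weights quoted above can be checked directly, and the same few lines show what "non-linear" means here: an 11-fold difference in sample size becomes roughly a 3.3-fold difference in weight. This is a minimal sketch using only the two sample sizes given in the comment; the function name is illustrative and not taken from the authors' analysis code.

```python
import math

# Sample sizes quoted in the comment.
sample_sizes = {"Amsterdam 2002": 5873, "Da Nang": 528}


def sqrt_n_weight(n: int) -> float:
    """Study weight proportional to the square root of the total sample size."""
    return math.sqrt(n)


weights = {study: sqrt_n_weight(n) for study, n in sample_sizes.items()}
for study, w in weights.items():
    print(f"{study}: N = {sample_sizes[study]}, weight = {w:.1f}")

# Ratio of the two studies' influence under linear vs. square-root weighting.
linear_ratio = sample_sizes["Amsterdam 2002"] / sample_sizes["Da Nang"]
sqrt_ratio = weights["Amsterdam 2002"] / weights["Da Nang"]
print(f"N ratio: {linear_ratio:.1f}, sqrt(N) weight ratio: {sqrt_ratio:.1f}")
```

With linear N weighting, Amsterdam would carry about 11 times the weight of Da Nang; with square-root weighting, about 3.3 times. In a pooled estimate over many studies, each study's share is its weight divided by the sum of all weights.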

Gjestland (2018, p. 5) calls weighting studies according to sample size “a pitfall”: “This may make perfect sense when studying a one-dimensional problem. However, this is not the case when analyzing aircraft noise annoyance.” He continues in the next paragraph: “The annoyance response to aircraft noise is governed not only by the cumulative noise level but also by several other factors, both acoustical and non-acoustical”. Of course, this last observation is true, but similar statements hold for any other effect of environmental noise, e.g., cardiovascular effects, sleep quality, children’s learning and house prices, to name but a few. All of these noise effects depend on acoustic and non-acoustic factors. It may be that the relative importance, type and degree of the non-acoustic factors differ between airports, so that the exposure-annoyance relation found in the largest studies should not automatically be transferred to all airports. This seems especially true for studies performed in residential areas with different rates and degrees of change.

However, the acoustic factors, mainly continuous sound levels, are the major factors to which residents are exposed, which can be changed at the noise source, and which play a major role in court. If studies are not weighted, the uncertainty in estimating an exposure-response relationship may be unnecessarily increased. In that case, a small study based on short-term acoustic measurements and opportunity samples of participants gets the same weight as a large-scale study with long-term acoustic data based on radar tracks and noise propagation models, a clear stratification plan for selecting respondents, and an independent scientific quality control group.
If studies cannot be weighted according to quality (or precision) criteria, the next best “guess” is weighting according to sample size, because larger samples usually imply more scientific staff and independent scientific quality control than small samples. Sample size weighting does not guarantee higher data quality, but it makes better quality and higher precision more likely.

Babisch, W., Houthuijs, D., Pershagen, G., Cadum, E., Katsouyanni, K., Velonakis, M., Dudley, M.L., Marohn, H.D., Swart, W., Breugelmans, O., Bluhm, G., Selander, J., Vigna-Taglianti, F., Pisani, S., Haralabidis, A., Dimakopoulou, K., Zachos, I., Jarup, L., for the HYENA team. (2009). Annoyance due to aircraft noise has increased over the years - results of the HYENA study. Environment International, 35, 1169-1176.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2009). Introduction to Meta-Analysis. Chichester: Wiley.
Brink, M., Pieren, R., Foraster, M., Vienneau, D., Eze, I., Schaffner, E., Heritier, H., Cajochen, C., Probst-Hensch, N., Röösli, M., Wunderli, J.-M. (2016). Do short-term temporal variations of noise exposure explain variance of noise annoyance? Paper presented at the Inter-Noise 2016, Hamburg, Germany.
Gjestland, T. (2018). A systematic review of the basis for WHO's new recommendation for limiting aircraft noise annoyance. Preprints, posted 7th November (not peer-reviewed). doi:10.20944/preprints201811.0178.v1
Guski, R., & Schreckenberg, D. (Eds.). (2015). Transportation noise effects in the environs of the airport (Vol. 7). Kelsterbach: Gemeinnützige Umwelthaus GmbH.
Guski, R., Schreckenberg, D., & Schuemer, R. (2017). Review: WHO Environmental Noise Guidelines for the European Region: A Systematic Review on Environmental Noise and Annoyance. International Journal of Environmental Research and Public Health (IJERPH), 14(12), 1-41. doi:10.3390/ijerph14121539.
IPSOS-Mori. (2015). The 2014 Survey of Noise Attitudes (SoNA). Final Report. London, UK.
Okokon, E., Turunen, A., Ung-Lanki, S., Vartiainen, A.-K., Tiittanen, P., & Lanki, T. (2015). Road-Traffic Noise: Annoyance, Risk Perception, and Noise Sensitivity in the Finnish Adult Population. International Journal of Environmental Research and Public Health, 12(6), 5712. doi:10.3390/ijerph120605712
Schreckenberg, D., Faulbaum, F., Guski, R., Ninke, L., Peschel, C., Spilski, J., & Wothge, J. (2015). Wirkungen von Verkehrslärm auf die Belästigung und Lebensqualität. Kelsterbach, Germany: Umwelthaus gGmbH. Online:
Umwelthaus-gGmbH. (2015). NORAH Knowledge No. 13. NORAH Noise Impact Study. Traffic-noise-related annoyance and quality of life. Results. Kelsterbach, Germany: Umwelthaus.
Van Gerven, P. W. M., Vos, H., Van Boxtel, M., Janssen, S., & Miedema, H. (2009). Annoyance from environmental noise across the lifespan. Journal of the Acoustical Society of America, 126(1), 187-194.
Wirth, K., Brink, M., & Schierz, C. (2006). Lärmstudie 2000. Schlussbericht zur 2. Befragungsstudie vom August 2003. Zürich, Switzerland: ETH (, retrieved 2018/11/21).

Bochum and Hagen, Rainer Guski, Rudolf Schuemer, Dirk Schreckenberg, 2018/11/22.
Response 1 to Comment 1
Received: 26 November 2018
The commenter has declared there is no conflict of interests.
Comment: I appreciate the comments by Guski, Schreckenberg and Schuemer, the authors of the article that was the main target of my critical review. They have taken the time and effort to give detailed feedback so that mistakes and misunderstandings in my critical review can be corrected.
In the abstract I claim that the new WHO guidelines are based on an arbitrary selection of studies (Webster: arbitrary = "coming about seemingly at random"). Note that I refer to the WHO Guidelines and not the evidence article by Guski et al. These authors have followed their protocol and presented one set of data from a very short time period, 5 years. In my paper I show that another selection among other existing surveys yields different results. I therefore consider the selection made by the Guidelines Development Group (GDG) arbitrary.

I go on to write that the selection of surveys comprises an imperfect and faulty set of data not representative of the general airport population (Webster: faulty = "imperfect"). I use two adjectives with the same meaning for stylistic reasons, but I did not write "very imperfect". The grading of imperfection is done by Guski et al. in their rebuttal. Note that my critique at this point is directed towards the new guidelines and the development process. The GDG should never have based their recommendations on one set of survey results only.

My specification of which studies were eligible for inclusion by Guski et al. in the WHO dataset is inaccurate, and the text will be corrected accordingly. It is quite obvious that studies published after the WHO review had started could not be included by Guski et al. I also regret having overlooked the reason for excluding the last three surveys. However, at least one of these studies, Trondheim/Værnes, had sufficient data in the original report to estimate the prevalence of HA at any noise exposure level, as the authors requested (cf. their inclusion criterion #4). The procedure is described in detail in the international standard ISO 1996, Annex E, which one would assume was known to the authors.
I note, however, that only studies reporting results in the exact format preferred by the authors, i.e., statistical regression, were included. My comments will be rephrased, since they were unfortunately interpreted as an accusation of manipulation.
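To illustrate why a CTL result can still yield %HA at any exposure level: a CTL analysis summarizes a survey by a single number, the community tolerance level L_ct (the level at which 50 % of respondents are predicted to be highly annoyed), and a fixed-slope function then gives %HA at any level. The sketch below assumes the functional form usually published for the CTL method, %HA = 100·exp(-1/10^(0.03·(L - L_ct + 5.306))); it has not been checked against the ISO 1996 annex cited above, the method was originally formulated for the day-night level rather than Lden, and the L_ct value used is hypothetical, not the Trondheim/Værnes result.

```python
import math

# Offset (dB) that puts exactly 50 % highly annoyed at L = L_ct,
# since 0.03 * 5.306 is approximately log10(1 / ln 2).
CTL_OFFSET_DB = 5.306
SLOPE = 0.03  # fixed slope of the assumed CTL function (per dB)


def percent_highly_annoyed(level_db: float, ctl_db: float) -> float:
    """%HA predicted by the assumed CTL function at a given noise level."""
    exponent = SLOPE * (level_db - ctl_db + CTL_OFFSET_DB)
    return 100.0 * math.exp(-1.0 / 10.0 ** exponent)


def ctl_from_point(level_db: float, pct_ha: float) -> float:
    """Invert the function for a single (level, %HA) pair.

    A real CTL analysis fits L_ct to all data points of a survey; this
    single-point inversion is for illustration only.
    """
    p = pct_ha / 100.0
    if not 0.0 < p < 1.0:
        raise ValueError("%HA must lie strictly between 0 and 100")
    return level_db + CTL_OFFSET_DB + math.log10(-math.log(p)) / SLOPE


# Hypothetical L_ct, chosen only to show the shape of the curve.
ctl = 73.0
for level in range(45, 71, 5):
    print(f"L = {level} dB: %HA = {percent_highly_annoyed(level, ctl):.1f}")
```

Because the slope is fixed, one L_ct per survey is enough to evaluate the function at any level; this is one way, under the stated assumptions, that a prevalence estimate comparable to a regression-based ERF could be produced.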

Regarding the selection of respondents in the HYENA study, I never said that these respondents were not members of the general population. My critique was that they could not be considered representative of the general population. Van Gerven et al. found a large age dependency regarding HA. Guski et al. in their comments refer to other studies where the age dependency is smaller. We seem to agree, however, that more likely than not, the special age group in the HYENA study will yield a higher prevalence of highly annoyed residents than a wider age group. This is unfortunate when the objective is to find results representative of airport populations in general.

I criticized the use of a non-standardized annoyance question in the HYENA study, which does not comply with inclusion criterion #4. Guski et al. cite Wirth et al. and Babisch et al. to defend their choice. Both of these teams of researchers assume that overall annoyance is mostly determined by annoyance during the daytime. In my opinion, an assumption is not good enough to be used as evidence by WHO.

Five years of residency was not a selection criterion defined by Guski et al., so this statement will be modified. Still, I maintain that residents who have experienced several years of construction work and subsequently been exposed to two years of a completely new aircraft noise situation can hardly be considered representative of a general airport population. The extraordinarily high annoyance scores at Athens airport seem to support this view.

Guski et al. did not follow the arguments by Babisch et al. for excluding the results from Athens and Milan from their analysis. They refer to this as an intuitive decision by Babisch et al. that lacks a specific exclusion criterion. One could demand the same standard of evidence for the assumption by Babisch et al. that non-standardized annoyance questions would yield standardized responses.

The proposal by Janssen and Guski to differentiate between HRC and LRC airports is commendable, but this categorization still lacks a strict and well-proven definition. In the meantime, we will have to accept different opinions.

As stated in the review article, I am fully aware that weighting the responses according to sample size is a standard procedure among many analysts. This does not, however, prove that the method is beyond question. Guski et al. claim that acoustic factors, mainly continuous sound levels, are the major factors to which residents are exposed. Still, non-acoustic factors play a bigger role in assessing noise annoyance. When trying to find an "average response" for situations dominated by very different non-acoustic factors, weighting by sample size can give unintended weight to special factors. This was illustrated by the example in the review article. There are no definite answers to this dilemma, but when looking for an EER that can be applied to airports in general, it is unfortunate that one airport governed by possibly very special non-acoustic factors is given a dominating position, as is the case with Amsterdam airport in the WHO study.

I will await the comments by other reviewers as well and incorporate the comments by Guski et al. in a new version of the article.
Response 2 to Comment 1
Received: 26 November 2018
The commenter has declared there is no conflict of interests.
Comment: I see now that I did indeed use the phrase "very imperfect" when characterizing the dataset, not in the abstract but further down in the text. This is an unnecessary grading of imperfection that I should not have used. It will be rephrased in the next version of the paper.
