Analyzing the Contribution of Road Traffic to Changes of Air Pollutants Using Random Forest: Insights from COVID-19 lockdown in Wuhan

During the COVID-19 lockdown in Wuhan, transportation, industrial production and other human activities declined significantly, as did the NO2 concentration. To assess the relative contributions of different factors to the reduction of air pollutants, sensitivity experiments were carried out with random forest (RF) models, comparing the contributions of meteorology, road traffic, and emission sources across periods. In addition, an emulator was run to suggest an appropriate limit for the control of transportation. The RF models revealed different mechanisms for air pollutants in different periods. The Within-City Migration Index (WMI) was more important in the normal, pre-lockdown and post-pandemic models, while the Out-Migration Index (OMI) was emphasized in the lockdown model. In the COVID-19 lockdown period, 73.3% of the reduction could be attributed to decreased road traffic, showing the massive impact of road traffic on air quality. In the post-pandemic period, meteorology accounted for about 42.2% of the decrease and emissions from industry and household for 40.0%, while road traffic contributed only 17.8%. The results suggest that restrictions should give priority to road traffic within the city. Limiting the reduction of road traffic to less than 40% gives a better effect, especially for cities with severe traffic pollution.


Introduction
In recent years, severe air pollution has caused many deaths and has aroused wide concern [1]. Air pollution is the combined result of meteorology and human activities. Road traffic is regarded as an important source of NO2 and PM2.5 [2][3][4], and it also has an important impact on the concentration of O3.
After COVID-19 broke out in Wuhan in December 2019 [5][6], China activated the first-level public health emergency response in Wuhan on 23 January 2020, taking measures such as cancelling mass gatherings, reducing the frequency of bus services in the city and halting long-distance buses [7]. As a result, transportation, industrial production and other urban activities fell sharply [8][9]. Moreover, reports of air pollution below the normal level emerged, not only in China [10][11][12] but also in many other areas such as South Korea [13]. Methods including the Nonlinear Autoregressive Distributed Lag (NARDL) model have been employed to explain the connection between the COVID-19 lockdown and the drop in air pollution [18][19][20][21].
Random forest (RF) is an ensemble learning method that consists of many simple decision trees [22]. It samples the training data with replacement to construct several subsets, grows a weak tree on each subset, and outputs a prediction by averaging the individual predictions of the trees [23]. Two parameters should be considered carefully in the random forest algorithm [24]. mtry is the number of variables randomly sampled as candidates at each split within a tree. ntree specifies the number of decision trees contained in the random forest, which is 500 by default. Thanks to its out-of-bag estimation, the random forest algorithm generalizes well and ingests heterogeneous data with little effort [25][26]. Furthermore, its estimates of variable importance make the model straightforward to interpret.
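To make the sampling-with-replacement and out-of-bag ideas concrete, the following minimal Python sketch draws one bootstrap sample per tree and records which rows are left out. It is an illustration only and not the implementation used in this study; the function names and the row count of 200 are hypothetical, and no trees are actually grown.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def out_of_bag(n_rows, sampled):
    """Rows never drawn for a tree; they act as that tree's held-out set."""
    drawn = set(sampled)
    return [i for i in range(n_rows) if i not in drawn]

def forest_predict(tree_predictions):
    """A regression forest averages the predictions of its individual trees."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(0)       # fixed seed, only for reproducibility
n_rows, ntree = 200, 500     # n_rows is hypothetical; 500 mirrors the default ntree
oob_fractions = []
for _ in range(ntree):
    sample = bootstrap_indices(n_rows, rng)
    oob_fractions.append(len(out_of_bag(n_rows, sample)) / n_rows)

mean_oob_fraction = sum(oob_fractions) / ntree
print(round(mean_oob_fraction, 2))
```

On average a little over one third of the rows are out of bag for any given tree, since the chance that a row is never drawn is (1 - 1/n)^n, which is why out-of-bag rows can serve as a built-in validation set.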
Previous research mainly focuses on the changes of air pollutants and their sources during the lockdown period, while few studies examine the contribution rates of meteorology and different sources [27], and they seldom analyze the relationship between the change of pollutants and the extent of road traffic changes. In general, the relative variable importance of a random forest can hardly distinguish the real contribution of influencing factors, so the actual effect of road traffic on pollutant concentration remains unanalyzed. Although some studies have quantified the influence of transportation, they emphasize the periods before and after the COVID-19 lockdown and seldom compare them with the same period in 2021 [18,28], when the government implemented regular epidemic prevention and control measures. Moreover, no trade-off analysis between the decrease of pollutant concentration and the decrease of traffic volume is available, which leaves actual road traffic control without sufficient guidance.
Taking these into consideration, this paper puts forward three questions. First, how did the COVID-19 lockdown affect air pollutants, and to what extent did their concentrations change compared with the same period in 2019 and 2021? Second, what roles did meteorology, road traffic and emissions from industry and household play in the change of the NO2 concentration during different periods? Third, what is the relationship between the decrease of air pollutants and the reduction of road traffic, and is there a priority and an optimal value for road traffic control? To answer these questions, Exploratory Data Analysis (EDA) was used to examine the variation in air pollutants and related variables in Wuhan from 1 January to 16 March 2020 and in the same period of the lunar calendar in 2019 and 2021. In addition, RF models were employed to calculate variable importance and analyze the impact of meteorological factors and human activities on the NO2 concentration. Nine sensitivity experiments were then run to evaluate the contributions of meteorology, road traffic and emissions from industry and household to the change of NO2 concentration. Finally, an emulator relating the change in road traffic to the change in NO2 concentration helped identify the relationship between them and thus find an effective way to reduce air pollution.

Research area
Wuhan is an important industrial city in central China, with a population of 11 million. With the development of the economy, many large-scale air pollution events have occurred in Wuhan, posing a significant threat to air quality [29]. Compared to other cities, it has relatively severe NO2 pollution and adopted stricter epidemic prevention and control.

The dataset of hourly concentrations of air pollutants is downloaded from the website https://quotsoft.net/air/, a third party that publishes air quality data crawled from the China National Environmental Monitoring Centre (http://www.cnemc.cn/). There are ten air quality monitoring sites in Wuhan, as depicted in Figure 1. The 24 hourly concentrations of each day are averaged to give the daily concentration at each site, and the daily values of the ten monitoring sites are then averaged to give the concentration of air pollutants in Wuhan.
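The two-step averaging described above (hours to days at each site, then sites to city) can be sketched as follows. This is a minimal illustration in Python rather than the processing actually used; the record layout, the site names and the choice to skip missing hourly values are assumptions that the text does not specify.

```python
from collections import defaultdict
from statistics import mean

def city_daily_mean(records):
    """records: iterable of (site, date, hour, concentration) tuples.
    Returns {date: city-wide daily concentration}."""
    hourly = defaultdict(list)               # (site, date) -> hourly values
    for site, date, _hour, value in records:
        if value is not None:                # assumption: missing hours are skipped
            hourly[(site, date)].append(value)
    site_daily = defaultdict(list)           # date -> list of per-site daily means
    for (_site, date), values in hourly.items():
        site_daily[date].append(mean(values))
    return {date: mean(values) for date, values in site_daily.items()}

# Hypothetical records for two sites and two hours of one day.
records = [
    ("site_A", "2020-01-01", 0, 40.0), ("site_A", "2020-01-01", 1, 44.0),
    ("site_B", "2020-01-01", 0, 50.0), ("site_B", "2020-01-01", 1, None),
]
daily = city_daily_mean(records)
print(daily)
```

Averaging per site first keeps a site with many missing hours from being under-weighted in the city-wide value.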

Road traffic
Compared with other air pollutants, the NO2 concentration is more directly connected to anthropogenic factors, among which emissions from transportation are an important contributor [30][31]. Daily migration data are obtained from Baidu Migration to provide real-time information about road traffic [32]. The migration indexes do not represent the absolute number of people; they are relative indexes related to road traffic. The daily In-Migration Index (IMI) and Out-Migration Index (OMI) are chosen to represent the inflow and outflow traffic volume of Wuhan, and the daily Within-City Migration Index (WMI) to represent road traffic within Wuhan [20]. IMI and OMI are indexed from the ratios of the number of people who moved into and out of Wuhan, respectively, to the total number of residents in Wuhan. WMI is indexed from the ratio of the number of people traveling within Wuhan to the total resident population of Wuhan [27].

Meteorological data
The daily surface climate dataset of Wuhan is downloaded from the China meteorological science data sharing service network (http://data.cma.cn). Six meteorological variables are considered in the models: precipitation (prep), average air pressure (pressure), average temperature (temp), average relative humidity (RH), maximum wind speed (ws) and the direction of the maximum wind (wd), all of which have been shown to be closely linked with the NO2 concentration [33].

Methods
The flowchart of the research methodology is shown in Figure 2. First, the correlation between changes of pollutant concentration and the variables is analyzed through EDA for four periods: the normal period (January 12 to March 28, 2019), the pre-lockdown period (January 1 to January 22, 2020), the lockdown period (January 24 to March 16, 2020), and the post-pandemic period (January 19 to April 4, 2021). Second, the NO2 concentration in each period is fitted by RF and the importance of the variables is calculated. Then, the relative contributions of meteorology, road traffic, and emissions from industrial and household sources (represented by the concentration of SO2) to the change of NO2 concentration are calculated through three groups of nine sensitivity experiments. Finally, the curve of pollutant concentration against the decrease of road traffic is simulated by RF.

Random Forest models
Random forest models are employed to conduct regression analysis and fit the NO2 concentrations in the four periods. 10-fold cross-validation is performed for the model of each year, and the most appropriate values of mtry and ntree are chosen by grid search to maximize the cross-validated R-squared of the model. Additionally, variable importance is calculated to explore the different mechanisms for air pollutants in the three years. Apart from the variables on meteorology and road traffic, time variables and emission variables are also included in the models. The variables week (ranging from 1 to 7) and Julian are used to indicate emission patterns within a week and within a year, respectively. As the Spring Festival effect is an important issue in the study of air pollution, the variable lunar, the number of days after the first day of the Lunar New Year holiday, is used to reflect changes in air pollutants before and after the Spring Festival holiday [18,34]. For example, the Spring Festival in 2019 is on 5 February, so lunar for 3, 4 and 5 February is -1, 0 and 1, respectively. According to the Bulletin of the second national survey of pollution sources in Wuhan, nitrogen oxides mainly come from industry, households and road traffic, while SO2 mainly comes from industry and households. Therefore, SO2 is incorporated into the random forest model as a proxy for the amount of emissions from industry and household.
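The lunar variable can be reproduced with simple date arithmetic. The sketch below is a minimal Python illustration that follows the worked example in the text (lunar is 0 on the eve of the Spring Festival and 1 on the festival day itself). Only the 2019 festival date is given in the text; the 2020 and 2021 dates are calendar facts added here, and the function name is hypothetical.

```python
from datetime import date

# Spring Festival (Lunar New Year's Day). The 2019 date is given in the text;
# the 2020 and 2021 dates are calendar facts added for illustration.
SPRING_FESTIVAL = {
    2019: date(2019, 2, 5),
    2020: date(2020, 1, 25),
    2021: date(2021, 2, 12),
}

def lunar_index(day):
    """Days relative to the holiday start: 0 on the eve, 1 on Spring Festival."""
    return (day - SPRING_FESTIVAL[day.year]).days + 1

print([lunar_index(date(2019, 2, d)) for d in (3, 4, 5)])  # [-1, 0, 1]
print(lunar_index(date(2020, 1, 23)))                       # -1 (lockdown day)
```

Under this convention the normal, pre-lockdown and post-pandemic windows all begin at lunar = -23 (12 January 2019, 1 January 2020 and 19 January 2021), which is what makes observations from different years comparable on the lunar axis.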

Sensitivity experiments
Nine experiments are performed to quantify the contributions of meteorology (EXPMet,i), road traffic (EXPMob,i) and emissions from industry and household (EXPEmi,i) to the changes of NO2 during the pre-lockdown, lockdown and post-pandemic periods, respectively (see Table 1) [35]. The experiments assume that, without the change in a specific group of variables, the NO2 level would be similar to that in 2019. For example, when evaluating the contribution of meteorology, the meteorological variables take their values in period i while all other variables are set to their values in the same lunar period of 2019. The models are run in R version 3.6.1. The contributions and the normalization process are calculated as follows, in which i=1 denotes the pre-lockdown period, i=2 the lockdown period and i=3 the post-pandemic period.
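As an illustration of how one such experiment and the normalization could be set up, the following Python sketch swaps one group of variables from period i into the 2019 inputs and compares the prediction with the 2019 baseline. It rests on two assumptions that are not confirmed by the text: that each contribution is the difference between the counterfactual prediction and the baseline prediction, and that the three contributions are scaled to sum to 100%. The toy_predict function is a hypothetical linear stand-in for a fitted RF model, and all input values are invented.

```python
MET = ("prep", "pressure", "temp", "RH", "ws", "wd")
MOB = ("WMI", "IMI", "OMI")
EMI = ("SO2",)

def counterfactual(baseline_row, period_row, group):
    """2019 (baseline) inputs with one variable group taken from period i."""
    row = dict(baseline_row)
    row.update({name: period_row[name] for name in group})
    return row

def normalized_contributions(predict, baseline_row, period_row):
    """Assumed definition: change in prediction per group, scaled to sum to 100%."""
    base = predict(baseline_row)
    delta = {
        label: predict(counterfactual(baseline_row, period_row, group)) - base
        for label, group in (("Met", MET), ("Mob", MOB), ("Emi", EMI))
    }
    total = sum(delta.values())
    if total == 0:
        raise ValueError("contributions cancel out; shares are undefined")
    return {label: 100.0 * value / total for label, value in delta.items()}

def toy_predict(row):
    """Hypothetical stand-in for a fitted RF model (linear, illustration only)."""
    return 20.0 + 4.0 * row["WMI"] + 1.0 * row["SO2"] - 2.0 * row["ws"]

# Invented inputs: one baseline day and the matching lunar day in period i.
baseline = {"prep": 0.0, "pressure": 1020.0, "temp": 5.0, "RH": 70.0, "ws": 3.0,
            "wd": 8.0, "WMI": 5.0, "IMI": 4.0, "OMI": 4.0, "SO2": 10.0}
lockdown = dict(baseline, ws=4.0, WMI=1.0, SO2=8.0)

shares = normalized_contributions(toy_predict, baseline, lockdown)
print(shares)
```

Under this assumed definition, a day on which the groups push the concentration in opposite directions and nearly cancel yields shares far above 100% or below zero, which is one way daily contributions of several hundred percent can arise.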

Experiment | Description
EXPMet,i | RF model run with meteorological variables in i period and other variables in

Scenario analysis
An emulator is run to analyze the relationship between the reduction of the NO2 concentration and the control of road traffic [28]. The level of road traffic in 2019 is taken as the normal level, and ten scenarios are set, each reducing road traffic by a further ten percent. Random forest is employed to predict the corresponding NO2 concentration.

Results
As shown in Figure 3, the NO2 and PM2.5 concentrations dropped sharply after the lockdown began in Wuhan in 2020 (January 23, when lunar was -1), while the O3 concentration clearly increased. During the first week after the Spring Festival (when lunar was from 1 to 7), the CO, NO2, PM2.5 and SO2 concentrations decreased significantly. In the normal period in 2019, they rebounded slowly after the Spring Festival. However, the NO2 and PM2.5 concentrations in 2020 stayed at a low level after the Spring Festival because of the lockdown. Meanwhile, the concentrations of the two pollutants in 2021 were still lower than those in 2019. The change of NO2 concentration was the most prominent: it was 42.04 μg/m3 in the pre-lockdown period, 24.9% lower than in the normal period, and 19.75 μg/m3 in the lockdown period, 53.9% lower. In the post-pandemic period in 2021, the NO2 concentration was 39.31 μg/m3, still 15.4% lower than in the normal period.

Changes of related variables
The meteorological conditions in the three years are listed in Table 2 and the changes in road traffic are illustrated in Figure 4. The meteorological conditions changed slightly over the three years, without a consistent overall trend. Precipitation (prep) and temperature (temp) showed an upward trend over the three years. The average pressure (pressure), relative humidity (RH) and wind direction (wd) were largest in 2020, while the average wind speed (ws) was smallest.
The Spring Festival migration in 2020 peaked on January 12, and it can be inferred that more people than usual left Wuhan in the few days before the lockdown. At the beginning of the Spring Festival migration (when lunar was -14), the values of WMI, IMI and OMI in 2020 were generally higher than those in 2019. Even on January 23, 2020 (when lunar was -1 and the lockdown took effect at 10:00 a.m.), the number of people leaving Wuhan was higher than on the same day of the lunar calendar in 2019. At the same time, the decrease of human activities lagged in the first few days of the COVID-19 lockdown, and the degree of the decrease gradually grew. Before the COVID-19 lockdown, Wuhan had a large population flow. Unlike in 2019, when the indexes rebounded after the first week of the Spring Festival holiday, the concentration in 2020 remained stably at a low level without rebounding, owing to the low level of human activities. The difference in road traffic between 2019 and 2020 remained stable from the third week of the Spring Festival holiday onward.
In addition, the average WMI in 2021 was 5.23, larger than the 3.85 in 2019, while the average IMI and OMI were 3.57 and 3.44, less than the 5.03 and 4.50 in 2019, respectively. This means that in 2021, when epidemic prevention and control had become regular, the travel intensity of residents within the city increased while movement into and out of the city decreased, with most residents travelling within the city. Notably, WMI was high on April 3, 2021, the first day of the Tomb Sweeping Day holiday.

Different mechanisms in models
The RF models of the four periods all had high accuracy, with the smallest cross-validation R-squared (CV-R2) being 0.66. Other goodness-of-fit indicators and the hyperparameters are shown in Table 3. Overall, all the simulation results were acceptable and the simulated concentrations agreed well with the observed data, with correlation coefficients (CORR) of 0.98, 0.97, 0.98 and 0.98 for the four periods. The normalized mean biases (NMB) were 0.04, 0.75, -0.02 and 0.17, and the normalized mean errors (NME) were 7.82, 6.47, 6.88 and 7.64. However, the four periods revealed different mechanisms. As shown in Figure 5, although SO2, which represented emissions from industry and household, ranked first in all four periods, the normal model and the lockdown model were more influenced by lunar and Julian, which indicated a temporal trend. ws ranked second in the pre-lockdown and post-pandemic models, showing the impact of meteorological factors. Road traffic variables such as WMI and OMI also played an important role in the four periods. WMI was more important in the 2019, pre-lockdown and post-pandemic models, while OMI was emphasized in the lockdown model. This meant that during the lockdown inter-city migration had more effect on air pollution than travel within the city, unlike in the other three periods.
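For reference, the goodness-of-fit indicators named above can be computed as in the following Python sketch. It uses the common definitions of NMB and NME as sums of (absolute) errors divided by the sum of observations, expressed in percent; whether the reported values use exactly this scaling is an assumption, and the observed and simulated values below are invented.

```python
from math import sqrt

def nmb(obs, sim):
    """Normalized mean bias in percent: sum(sim - obs) / sum(obs)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

def nme(obs, sim):
    """Normalized mean error in percent: sum(|sim - obs|) / sum(obs)."""
    return 100.0 * sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)

def corr(obs, sim):
    """Pearson correlation coefficient between observations and simulations."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / sqrt(var_o * var_s)

def r_squared(obs, sim):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Invented daily NO2 values (observed, simulated) for illustration.
obs = [40.0, 20.0, 30.0, 10.0]
sim = [38.0, 22.0, 33.0, 9.0]
print(nmb(obs, sim), nme(obs, sim), round(corr(obs, sim), 3), round(r_squared(obs, sim), 3))
```

NMB lets positive and negative errors cancel, so a value near zero alongside a larger NME indicates an unbiased model with day-to-day scatter.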

Relative contributions of meteorology and human activities
As shown in Figure 6 and Table 4, the contributions of meteorology and human activities varied from day to day. In the pre-lockdown period, the NO2 concentration simulated with EXPMob,1 was closer to that of the normal period than the concentrations simulated with EXPMet,1 and EXPEmi,1, which meant that road traffic contributed the least to the change of the air pollutant and the changed emissions contributed the most. The normalized contributions of meteorology, road traffic, and emissions from industry and household were 35.2%, 13.8% and 51.0%, respectively. During the first week of the pre-lockdown period, changes of meteorology made the greatest contributions to the daily reductions of the concentration. During the last two weeks of this period, the reduction of emissions from industry and household played the most important role in the reduction of the concentration.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 August 2021 doi:10.20944/preprints202108.0024.v1
During the lockdown period, the observed NO2 concentrations were far lower than those in the same period in 2019, except on the days when lunar was between 4 and 12. The average NO2 concentration simulated with EXPMet,2 was larger than the observed NO2 concentration in the same period in 2019, indicating that the meteorology in the lockdown period was essentially unfavorable to the reduction of the air pollutant. The average NO2 concentration simulated with EXPMob,2 was the smallest, which meant that road traffic dominated the reduction of the NO2 concentration in the lockdown period. The average normalized contributions of meteorology, road traffic, and emissions from industry and household were 10.0%, 73.3% and 16.7%, respectively. During the first two weeks of the lockdown period, the decrease of the concentration was mostly attributed to favorable meteorological conditions. The changes of road traffic dominated the changes of the concentration during the third, fourth and last weeks of the lockdown period. Most of the time, road traffic was beneficial to the decrease of the concentration, except on the days when lunar was 16, 17, 29 and 47.
During the post-pandemic period, the observed NO2 concentration was lower than in the same period in 2019, and the reduction was mostly due to emissions from industry and household. The NO2 concentration simulated with EXPMet,3 was higher than that in 2019, which showed that meteorology in 2021 favored an increase of the NO2 concentration. Moreover, road traffic recovered to the normal level, with the NO2 concentration simulated with EXPMob,3 similar to that in 2019. In summary, according to the normalization process, the meteorological conditions accounted for about 42.2% of the decrease and the reduced emissions from industry and household for 40.0%, while the emissions caused by road traffic contributed only 17.8%.
As shown in Figure 6 (f), in the post-pandemic period the NO2 concentration did not show a consistent downward trend compared with the same period in 2019, and the contribution rates of the variables varied greatly from day to day. When lunar was -6, meteorology led to an increase of 490.4% in the pollutant concentration, while road traffic led to a decrease of 569.4%. When lunar was 32, 33 and 47, road traffic led to increases of 799.5%, 117.2% and 126.0% in the concentration, while meteorology was conducive to the reduction of the pollutant.

Different scenarios of road traffic control
The emulator demonstrated the linkage between NO2 concentrations and road traffic by predicting the concentrations under different reductions of road traffic relative to the 2019 level. As shown in Figure 7, high concentrations generally decreased as the level of road traffic was reduced. In scenarios a and b, when IMI and OMI decreased, low concentrations increased instead. However, in scenario c, when IMI and OMI were fixed, the concentration consistently decreased as WMI was reduced. Moreover, in terms of the downward trend, changing WMI in scenario c had a more obvious effect on the reduction of pollutant concentration than scenario b. In addition, in all the scenarios, the same reduction ratio had a stronger effect at the beginning, with a more obvious downward trend. When the level of road traffic was below 60% of the normal level, the pollutant concentration remained at a stable level, which meant that further changes of road traffic had a smaller impact.

Discussion

The changes of air pollutants
The Spring Festival holiday usually results in a decrease of PM2.5 and NO2 and an increase in ozone in megacities [36] because many people return to their hometowns [37][38]. The changes of air pollutants are the combined result of the holiday effect and the pandemic control policies [39].
During the pre-lockdown period, the Spring Festival migration in 2020 starts on January 10. In contrast to 2019, the NO2 concentration in 2020 fluctuates many times, rising from January 10 to January 12 and then decreasing. After reaching the minimum on January 16, it rises sharply. On January 13, Wuhan strengthens epidemic prevention and control and emphasizes disinfection, ventilation and body temperature monitoring in public areas. After peaking on January 20, the NO2 concentration continues to decrease.
During the lockdown period, as in 2019, the NO2 concentration drops to the minimum in the first week after the Spring Festival (January 25). When Wuhan adopts the COVID-19 lockdown policy on January 23, spontaneous personal travel is cancelled and the IMI decreases. During the first week of the Spring Festival holiday, the difference from 2019 is generally large, which may be related to the restriction and cancellation of the traditional custom of visiting relatives and friends. Nevertheless, the difference during the week after January 30 becomes relatively small as the enterprises offering protective products and emergency supplies resume work and production. Road transport, industries, and thermal power plants may cause large amounts of NOx emissions [40], and thus the NO2 concentration begins to rise slowly on January 26 and remains at a relatively high level after the Spring Festival from January 26 to February 5. One reason may be that medical workers from all over the country rush to assist Wuhan, which increases the IMI. Another reason is that medical materials enterprises operate at full capacity, which enhances industrial activities and transportation. The NO2 concentration begins to decline after February 5 and remains at a low level with no obvious rebound, even during what used to be the Spring Festival travel rush, which corresponds closely with the COVID-19 lockdown. On February 16, Wuhan opens the necessary public places, and the WMI increases.

Contributions of meteorological conditions and emission during different periods
Compared with the validation parameters reported in other research, the RF models are as accurate as atmospheric physical transport models [35]. Assessing meteorological differences is important because similar meteorological conditions usually accompany similar NO2 levels [41]. As for the contributions of meteorology, the importance of prep is relatively high in the normal period, that of ws, prep and wd in the pre-lockdown period, that of RH in the lockdown period, and that of temp in the post-pandemic period. The average values of pressure, RH and wd are largest in 2020, while ws is smallest. High relative humidity helps to eliminate NO2, while high temperature is conducive to the persistence of NO2 [42][43]. For example, when lunar is -19 in 2021, the temperature is 6.6 ℃, lower than the 8 ℃ of the day before and the 8.5 ℃ of the day after, and the normalized contribution of meteorology is 148.7% while ( ) is -47.0%, which means that meteorology accounts for much of the reduction of the NO2 concentration.
The NO2 concentration is closely tied to traffic volumes and to fossil fuel use and emissions by industrial activities. During the pre-lockdown period, when lunar is -23, -9, -8 and 22, the level of road traffic increases (WMI is larger than the normal level), which is not beneficial to the reduction of the air pollutant, and the normalized contribution of road traffic is below zero. During the lockdown period, road traffic accounts for about 73.3% of the reduction of the NO2 concentration on average. In contrast, when lunar is -6 in 2021, IMI is only 1.88 (compared with 4.23 in the lockdown period and 4.41 in the normal period) and the normalized contribution of road traffic is as high as 569.4%.

Insights for the control of road traffic
Since traffic plays an important role in NO2 pollution, actions should be taken to reduce traffic congestion. Policies such as the creation of a low-emission zone (LEZ) have been proven effective [45]. Setting an appropriate limit on the level of mobility is also supported by our study [41]. Enterprises with low energy consumption and high technologies should be supported. Moreover, great attention should be paid to the role of the public in environmental governance [46]. The random forest model shows a strong connection between road traffic and the NO2 concentration. It is therefore feasible to reduce air pollution by controlling road traffic, but the control needs to be carefully evaluated, since the effect of reducing the pollutant level by controlling traffic has a certain threshold. Overall, the control of road traffic within the city has an obvious and efficient effect on the NO2 concentration.
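The threshold behaviour noted above can be explored with a scenario sweep of the kind the emulator uses: the traffic indexes are cut from the 2019 level in ten steps of ten percent and the model is queried for each step. The Python sketch below is illustrative only. The toy_predict function is a hypothetical saturating response standing in for a fitted RF model, its flattening point is chosen to mimic the behaviour reported for traffic below 60% of the normal level rather than derived from data, and the baseline index values are invented.

```python
def scenario_rows(baseline_row, variables):
    """Yield (reduction in %, inputs) for ten scenarios in 10% steps."""
    for cut in range(10, 101, 10):
        keep = 1.0 - cut / 100.0          # fraction of the 2019 level retained
        row = dict(baseline_row)
        for name in variables:
            row[name] = baseline_row[name] * keep
        yield cut, row

def sweep(predict, baseline_row, variables):
    """Predicted concentration for each reduction scenario."""
    return [(cut, predict(row)) for cut, row in scenario_rows(baseline_row, variables)]

def toy_predict(row):
    """Hypothetical stand-in for a fitted RF model: responds to WMI above 3.0 only."""
    return 20.0 + 5.0 * max(row["WMI"] - 3.0, 0.0)

baseline = {"WMI": 5.0, "IMI": 4.0, "OMI": 4.0}   # invented 2019-level indexes
curve = sweep(toy_predict, baseline, ("WMI",))     # vary WMI, keep IMI and OMI fixed
for cut, value in curve:
    print(f"{cut:3d}% reduction -> {value:.2f}")
```

Passing ("IMI", "OMI") or all three index names instead of ("WMI",) gives the other kinds of scenario; comparing where each curve flattens is what indicates how much restriction is worthwhile.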

Limitations of this study
It should be noted that in the past few years, China has implemented new environmental regulations and actively promoted terminal treatment and industrial structure optimization [31]. Therefore, the actual value in the absence of the COVID-19 lockdown may differ slightly from the value predicted with the 2019 model. Future research can better account for this trend by using the relevant data from 2018 or even earlier. In addition, the relationship between anthropogenic activities with different travel purposes and air pollutants could be further analyzed with Google mobility data [47]. Although it is difficult to find the same contribution rates of pollution sources in different cities [27], a comparison between many cities in different areas would be meaningful.

Conclusions
Datasets on air pollutants, meteorology and road traffic were collected to study the contributions of relevant factors to the changes of air pollution before, during and after the COVID-19 lockdown. Using random forest models, sensitivity experiments were set up to estimate quantitatively the relative contributions of the meteorological conditions and different emission sources. An emulator was also implemented to look for an effective way to reduce air pollution by controlling road traffic. The conclusions are as follows:
1. The COVID-19 lockdown led to a significant decrease in the NO2 concentration. The change of NO2 concentration was the most prominent, with drops of 24.9% in the pre-lockdown period, 53.9% in the lockdown period and 15.4% in the post-pandemic period. The average PM2.5 and SO2 concentrations were also lowest in 2020 and highest in 2019. In contrast, the average O3 concentration increased in 2020 and dropped in 2021.
2. The results of the RF models were basically in agreement with reality. Different mechanisms for air pollutants were implied in the four periods. In the pre-lockdown and post-pandemic periods, meteorology and emissions from industry and household played a more important role in the changes of air pollutants than road traffic. However, road traffic had a great impact on the decrease of NO2 concentration in the lockdown period, especially through the variable WMI.
3. The reductions of NO2 during the COVID-19 lockdown period were mostly attributed to decreased road traffic, with 10.0% caused by meteorology, 73.3% by road traffic and 16.7% by emissions from industry and household. Road traffic dominated the reduction of the NO2 concentration in the lockdown period, as shown by the smallest average NO2 concentration being the one simulated with EXPMob,2.
4. When it comes to making policies for reducing air pollution, transportation offers great potential and great importance should be attached to the role of the public. Placing an appropriate restriction on road traffic within the city is an effective way to reduce air pollution, especially for cities with severe NO2 pollution.