An ERA-based study of built environment factors affecting lung cancer incidence rate among Chinese women

Objective: To apply exploratory regression analysis (ERA) to investigate the air pollution and built environment factors influencing the lung cancer incidence rate among Chinese women. Methods: Lung cancer incidence rates among Chinese women at 339 cancer registries were obtained from the China Cancer Registry Annual Report 2017; air quality and built environment data were obtained from Greenpeace and the China Construction Yearbook. After collinear variables were eliminated, an exploratory regression analysis was performed with the world-standardized population incidence rate as the dependent variable and air quality and built environment factors as the independent variables. Results: The Shandong Peninsula, Hebei and Liaoning are high-incidence areas for female lung cancer in China, with significant regional clustering. In addition to air quality factors such as industrial smoke emissions, built environment factors such as urbanization rate, land use intensity (LUI), population density and greening coverage of built-up areas are statistically significantly associated with the female lung cancer incidence rate. Conclusion: In addition to air quality factors, urban spatial factors can also significantly affect respiratory health. LUI is positively correlated, while urbanization rate and population density are negatively correlated, with the incidence rate of lung cancer. A protective role of green space for respiratory health has not been proven. In addition, there is little relationship between income and health, and similar findings hold for indicators such as public transportation and the road network.


Introduction
Lung cancer is a malignant tumour that arises in the lungs. Tumours of the respiratory organs (trachea, bronchus and lung), of which lung cancer is the main representative, are among the most common cancers in China at present (He Jie and Chen Wanqing, 2017). The incidence and mortality rates of lung cancer in both men and women are increasing year by year. The development of lung cancer is a complex, multi-stage process with many causative factors and a complex aetiology. The risk of lung cancer is related to tobacco use, but air pollution is a risk factor that cannot be ignored. However, there are very few studies on air pollution and lung cancer in China, and few of them have addressed the spatial dimension of air pollution. The objective of this study was to determine the effect of air pollution and built environment factors on the lung cancer incidence rate.
The smoking rate in China is 50.2% for males over 15 years of age. By contrast, the smoking rate among females is 2.8%, which is much lower than the international level and lower than that of developed countries. Yet the incidence rate of lung cancer among females in China is high by world standards, suggesting that risk factors other than smoking may exist (Ma Guangsheng et al., 2005).

Scope of the study
This paper focuses on the causal link between built environment factors and lung cancer and its significance. In order to minimize the influence of personal lifestyle factors such as smoking and alcohol abuse, only the incidence rate of lung cancer in Chinese women was included in this study.

Incidence rate of lung cancer among women by province
We used the most recent incidence data from the China Cancer Registry Annual Report 2017, published by the Chinese National Cancer Centre. To eliminate the effect of differing age structures between regions, we selected the age-standardized (Segi population) incidence rate as the study indicator.
The data in the China Cancer Registry Annual Report 2017 were reported by 339 qualified cancer registries (129 urban and 210 rural) covering a population of 288,243,347. Because registries are unevenly distributed and China's administrative divisions are complicated (some registries belong to cities or counties, while others belong to prefecture-level cities), the registries are inconsistent in scale and overlap with one another, which makes them unsuitable for comparison at the city scale. Therefore, this study averaged the registry data by province to obtain provincial incidence rates of lung cancer among Chinese women (Figure 1).
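As an illustration of what an age-standardized rate is, the following minimal sketch applies direct standardization: age-specific rates are averaged using the age distribution of a standard population as weights. The function name, the three age groups and all numbers are hypothetical; they are not the Segi weights and not registry data.

```python
def age_standardized_rate(age_specific_rates, standard_weights):
    """Directly standardized rate: weighted mean of age-specific rates."""
    if len(age_specific_rates) != len(standard_weights):
        raise ValueError("rates and weights must cover the same age groups")
    total_weight = sum(standard_weights)
    weighted_sum = sum(r * w for r, w in zip(age_specific_rates, standard_weights))
    return weighted_sum / total_weight


# Hypothetical three-group example (NOT the real Segi weights or registry data).
rates_per_100k = [2.0, 40.0, 150.0]             # young, middle-aged, elderly
standard_population = [50_000, 35_000, 15_000]  # assumed standard age structure

asr = age_standardized_rate(rates_per_100k, standard_population)
print(round(asr, 2))
```

Because every region is weighted by the same standard population, two regions with identical age-specific rates obtain the same standardized rate even if one has a much older population.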
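The provincial aggregation step can be sketched as follows. This is only an illustration under the assumption that registries within a province are averaged with equal weight (the text does not state whether any population weighting was applied); the record layout and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def provincial_means(registry_records):
    """Average registry-level rates within each province (unweighted)."""
    by_province = defaultdict(list)
    for province, rate in registry_records:
        by_province[province].append(rate)
    return {province: mean(rates) for province, rates in by_province.items()}


# Hypothetical (province, registry-level rate) pairs, for illustration only.
records = [
    ("Shandong", 30.0),
    ("Shandong", 34.0),
    ("Hebei", 28.0),
    ("Tibet", 5.0),
]

province_rates = provincial_means(records)
print(province_rates)
```

An unweighted mean treats a small county registry and a large city registry alike; a population-weighted mean would be an alternative design choice if registry populations were available.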

Air quality
In this study, two indicators were selected to characterize air quality: the air quality index (AQI) and industrial dust emissions. The AQI is a quantitative description of air quality based on six major pollutants: PM2.5, PM10, SO2, CO, O3 and NO2. Chinese PM2.5 data were first released in 2014, and the early data are unstable and limited in volume. Considering that air quality tends to be stable over the short term and that cancer registration lags behind the actual time of diagnosis, this study uses the 2015 average daily air quality data for 366 key cities published by Greenpeace, an international environmental protection organization, which comprise a total of 133,436 records. For each province, the annual average over all of its cities is taken as the air quality indicator.

Built environment factors
The built environment indicators selected include land use intensity (LUI), greening coverage ratio, total number of buses per 10,000 persons, road area per capita and population density.

Data normalization
The incidence rate of lung cancer is age-standardized (Segi population, 1960). For convenience, the age-standardized (Segi population) incidence rate of lung cancer is referred to below simply as the incidence rate of lung cancer. Because industrial smog and dust emissions, per capita income and population density take very large values and differ from the other indicators by orders of magnitude, they are log-transformed.
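A minimal sketch of the logarithmic normalization is given below. The base of the logarithm is not stated in the text, so base 10 is an assumption here, and the density values are hypothetical.

```python
import math


def log10_transform(values):
    """Return log10 of each value; all values must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical population densities spanning several orders of magnitude.
population_density = [100.0, 1_000.0, 10_000.0]

log_density = log10_transform(population_density)
print(log_density)
```

After the transform, values that differed by a factor of 100 differ by only 2 units, which brings these indicators onto a scale comparable with percentage-type indicators such as urbanization rate.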

Model construction
Evidence Review on the Spatial Determinants of Health in Urban Settings, a WHO Regional Office for Europe review of the healthy cities literature published in 2005-2009, found that urban planning influences health mainly through four aspects: land use, transportation, green space and urban design (Grant and Braubach, 2010). In this study, the incidence rate of lung cancer among women in each province was selected as the dependent variable, and LUI, road area per capita and the greening coverage of built-up areas were selected as the independent variables corresponding to the land use, transportation and green space factors. To avoid the subjectivity involved in assessing urban design, an alternative approach was adopted: urbanization rate and the total number of buses per 10,000 persons were used as more readily observable independent variables. Because atmospheric pollution has a direct impact on respiratory health, the AQI and industrial smog and dust emissions were included as a separate group of influencing factors. Socioeconomic status also has an impact on health (Mitchell and Popham, 2008; Zhang, 2013), so this study includes regional per capita income in the model (Figure 2).

Multi-collinearity diagnosis
To exclude the effect of multi-collinearity, the data were imported into SPSS 22.0 and regressed with the female lung cancer incidence rate as the dependent variable and the air quality indicators as the independent variables. The adjusted R-square reached 0.412 with a significance test P-value <0.01, rejecting the null hypothesis of no association and indicating a strong correlation between lung cancer incidence in women and air quality. However, the variance inflation factors (VIF) of AQI, PM2.5 and PM10 were all >25, so only one of them could be retained. Considering that PM2.5 particles can penetrate deep into the alveoli and that their harm is greater and better known (Mitchell and Popham, 2008; Zhang, 2013), the PM2.5 indicator was retained. Other variables with P-values greater than 0.05 were also excluded. In this way only PM2.5, the more representative indicator, was retained as the air quality variable.
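The variance inflation factor used in this diagnosis is conventionally defined as VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-square obtained by regressing predictor j on all the other predictors. The sketch below computes it in plain Python; it is an illustration of the definition, not a reproduction of the SPSS output, and the function names and the eight data points are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * col[t] for b, col in zip(beta, cols)) for t in range(n)]
    y_mean = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on all the others."""
    result = []
    for j, target in enumerate(predictors):
        others = [p for i, p in enumerate(predictors) if i != j]
        result.append(1.0 / (1.0 - r_squared(target, others)))
    return result


# Hypothetical data: pm25 is roughly half of aqi, green is unrelated to both.
aqi = [80.0, 95.0, 110.0, 70.0, 120.0, 60.0, 100.0, 90.0]
pm25 = [41.0, 47.0, 56.0, 34.0, 61.0, 31.0, 49.0, 46.0]
green = [38.0, 41.0, 35.0, 36.0, 42.0, 37.0, 40.0, 39.0]

vifs = vif([aqi, pm25, green])
print([round(v, 1) for v in vifs])
```

In this toy data the two nearly proportional columns receive very large VIF values while the unrelated column stays close to 1, which is the pattern that motivates keeping only one of a group of collinear indicators.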

Exploratory regression analysis
Exploratory regression analysis (ERA) is a data mining tool for identifying which models pass all the necessary ordinary least squares (OLS) diagnostics. ERA performs an exhaustive analysis of all combinations of candidate explanatory variables to find the OLS models that meet the set conditions and best explain the dependent variable, and to determine whether each influencing factor is significant and how large its effect is. ERA has an advantage over methods that evaluate model performance based solely on adjusted R-square values: by evaluating all possible combinations of candidate explanatory variables, it greatly increases the chance of finding the best model.
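The exhaustive search at the heart of this approach can be sketched in a few lines. The example below enumerates every subset of candidate variables, fits an OLS model to each, and keeps those whose adjusted R-square meets a threshold. It is a simplified illustration: the other diagnostics the tool applies (VIF, Jarque-Bera, Moran's I, coefficient significance) are omitted, and the function names, the variables x1 to x3 and the synthetic data are all assumptions made for the example.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def adjusted_r2(y, predictors):
    """Adjusted R^2 of an OLS fit of y on the predictors plus an intercept."""
    n, k = len(y), len(predictors)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * col[t] for b, col in zip(beta, cols)) for t in range(n)]
    y_mean = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def exhaustive_search(y, candidates, max_vars, min_adj_r2):
    """Fit every subset of up to max_vars candidates; return passing models, best first."""
    results = []
    for size in range(1, max_vars + 1):
        for subset in combinations(sorted(candidates), size):
            score = adjusted_r2(y, [candidates[name] for name in subset])
            if score >= min_adj_r2:
                results.append((score, subset))
    return sorted(results, reverse=True)


# Synthetic data: y depends on x1 and x2 plus small noise; x3 is irrelevant.
candidates = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "x2": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 10.0, 6.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 3.0, 9.0, 4.0, 6.0, 5.0, 10.0],
}
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]

ranked = exhaustive_search(y, candidates, max_vars=3, min_adj_r2=0.5)
best_score, best_subset = ranked[0]
print(best_subset, round(best_score, 4))
```

With p candidate variables and no cap on model size the search fits 2^p - 1 models, which is why such tools usually expose a maximum number of explanatory variables as a parameter.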

Statistical characteristics of the incidence rate of lung cancer among women in China
The statistics show that the national average incidence rate of lung cancer is 171.06 per 100,000 for women and 360.72 per 100,000 for men, the latter being 2.1 times the former. The incidence rate of lung cancer among Chinese women ranged from a minimum of 0.34 per 100,000 (Tibet) to a maximum of 423.95 per 100,000 (Shandong), with a mean of 119.63 and a standard deviation of 119.40 per 100,000 persons. In 2014, the five provinces with the highest incidence rates of lung cancer among women in China were Shandong, Hebei, Liaoning, Henan and Anhui (Figure 3).

Spatial distribution of lung cancer incidence rate among women in China
Comparing the spatial distributions of lung cancer incidence among men and women in China in 2014 shows that the high-incidence areas for women are basically the same as those for men: the Shandong Peninsula, Hebei and Liaoning are high-incidence areas, while lung cancer is relatively rare in Xinjiang and Tibet, indicating clear spatial and regional concentration (Figure 4).
Figure 4 Spatial distribution of lung cancer incidence rate among Chinese women

Results of exploratory regression analysis
The data were imported into ArcGIS 10.2 and analysed using the "exploratory regression analysis" tool. Female lung cancer incidence was the dependent variable, and PM2.5, greening coverage of built-up areas, urbanization rate, LUI, road area per capita, total number of buses per 10,000 persons, log population density, log smog and dust emissions and log per capita income were selected as the explanatory variables. A total of 466 models were evaluated: 100% passed the VIF test, 409 passed the Jarque-Bera test, 21 passed the spatial autocorrelation test (Moran's I), and 4 passed the model significance test (95% confidence level), but only 2 models passed all the set thresholds, and only 1 model included all the independent variables. In order to show the effect of every variable, that model is also included in the table below.
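For readers unfamiliar with the spatial autocorrelation check, global Moran's I is commonly defined as I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where w_ij are spatial weights and S0 is their sum. The sketch below computes it for a toy example. The binary contiguity weights, the four regions arranged in a line and the values are all assumptions; the weighting scheme actually used in the analysis is not stated in the text.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical regions in a line; neighbours share a border (weight 1).
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 2.0, 3.0, 4.0], contiguity)    # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], contiguity)  # dissimilar neighbours
print(round(clustered, 4), round(alternating, 4))
```

Positive values indicate that similar values sit next to each other and negative values indicate alternation. When the statistic is applied to regression residuals, a value that is not significantly different from its expectation under spatial randomness suggests that the model has not left spatial structure unexplained.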

Discussion
It has been shown that the main risk factors for lung cancer come from autosomal recessive heredity and lifestyle habits (e.g. smoking, alcoholism). According to the results of model 2, industrial smog and dust emissions are highly correlated with the incidence rate of lung cancer among Chinese women (P<0.01; significant in 100% of the models), and PM2.5 is also significant, although less stable than industrial smog and dust emissions. Air pollution levels are thus strongly correlated with the incidence rate of lung cancer among Chinese women, supporting the existing theoretical hypothesis (Cardoso et al., 2019).
Among built environment factors, previous studies have concluded that in high-density built environments the distance between origin and destination is short, so people tend to travel on foot or by bicycle, which is good for their health. Green spaces not only increase walkability, but green plants can also absorb atmospheric pollutants. The incidence rate of respiratory diseases has been reported to be significantly lower in greener regions than in less green ones.

Implications for cancer prevention in China from the view of built environment
The results of our model showed a positive correlation between LUI and lung cancer incidence rate, which is consistent with the results of existing studies. A high LUI means that building density within the site is high and high-rise buildings are numerous, which makes living more convenient. However, it also means that buildings occupy a large amount of land, squeezing out green and public space. Overcrowding has been shown to be an important built environment factor for stress-induced chronic disease. Urbanization rate and population density are negatively correlated with the incidence rate of lung cancer among Chinese women, which differs from the findings of some existing studies (Dong Chongya and Kang Xiaoping, 2014) but is consistent with a paper from the National Cancer Centre. That paper concluded that residents of rural areas had significantly higher incidence and mortality rates for all cancers combined than urban residents (213.6 vs 191.5 per 100,000 for incidence; 149.0 vs 109.5 per 100,000 for mortality). We believe this is also related to the larger sample size. The provinces with higher urbanization rates in China are located in the southeastern coastal areas. These areas are more urbanized and economically developed, so public services are better and residents are more educated. According to a study by Huazhong University of Science and Technology, the smoking rate in the eastern region has declined somewhat in the past decade, while the smoking rate in the central and western regions has remained constant. Furthermore, smoking rates are consistently higher in rural areas than in urban areas (Minghuan et al., 2018), which may partly explain the result.
In addition, the greening coverage ratio of built-up areas was positively correlated with the incidence rate of lung cancer in women, which also differs somewhat from the general perception and from some existing studies. There are two possible reasons. The first is that the relationship between the amount of green space and health outcomes is not significant. A study in the UK (2017) showed that green space in low-income suburbs is less accessible and of poorer quality, and that health outcomes in these deprived areas tend to be worse than average. The reason may be that poor-quality green spaces, although numerous, are not sufficient to offset the health problems of the low-income population, or that poor-quality green spaces are actually harmful to health (Mitchell and Popham, 2007). A study in the Netherlands also found no significant link between the amount of green space in the living environment and health. People who had more green space in their living environment spent more time on gardening but engaged in other physical activity less often and for less time, which in turn was detrimental to health (Maas et al., 2008). The second is that there may be a problem with our choice of green space indicator. The greening coverage ratio of built-up areas is calculated in a relatively rough and broad way. It reflects the ecological and environmental protection status of a region, and it includes protective green space, production green space and roof greening. In addition, the greening coverage ratio of urban built-up areas in each province is only an average, so the difference between rural and urban living environments is not reflected. As a result, the greening coverage ratios of built-up areas do not differ much between provinces, and the indicator may not be suitable as a measure of the greenness of living environments.
The total number of buses per 10,000 persons and road area per capita are not significant. This may imply that the pathogenesis of lung cancer differs from that of chronic conditions such as obesity and that physical inactivity is not a risk factor for respiratory disease. Per capita income is the least significant of the explanatory variables, indicating that the incidence rate of lung cancer among Chinese women is also not associated with socio-economic status.

Limitations
Although the relationship between the incidence rate of lung cancer and built environment factors is discussed to some extent, the analysis is not in-depth. The study only confirmed that built environment factors influence the incidence rate of lung cancer and identified which factors have a greater influence and which have not yet been found to have a definite effect. Although some weak causal links were found, the explanatory power of the spatial econometric model may be low by the usual validity criteria. On the one hand, this reflects the limits of the author's expertise; on the other hand, research on healthy human settlements is still a fairly young field. There are many nonlinear, compound relationships between urban space and respiratory health, which a general linear model may not capture well, so the research methods need improvement.
Even though big data and mathematical modelling techniques have improved dramatically, healthy human settlements research remains a considerable challenge. This is largely due to the systemic complexity inherent in public health and healthy human settlements. Analytical techniques based on multiple linear regression may not be the best approach for the complex mega-system of healthy human settlements.
Third, the greening coverage ratio of built-up areas may not be a suitable indicator for our study because it is too coarse. However, the publicly available statistics all use the "greening coverage ratio of built-up areas" as their statistical definition. For the time being, because of limited data availability, the study can only use the greening coverage ratio of built-up areas as a substitute indicator. The robustness of this conclusion therefore remains to be proven, and this gap will be addressed in a follow-up study.

Conclusion
The results show that industrial smog and dust emissions and PM2.5 are significant risk factors for respiratory health and must be combated and controlled. In addition to air quality factors, built environment factors are also significant respiratory health risk factors. LUI is positively correlated with the incidence rate of lung cancer in women, which suggests that high LUI is detrimental to residents' health. Urbanization rate and population density are negatively correlated with the incidence rate of lung cancer, perhaps because high-quality medical services help to deter lung cancer. A protective role of green space for respiratory health has not been proven. In addition, there is little relationship between income and health, and similar findings hold for indicators such as public transportation and the road network. However, existing research is far from deep or comprehensive, and further study with improved research methods is needed.