TY - JOUR
T1 - Using a land use regression model with machine learning to estimate ground level PM2.5
AU - Wong, Pei Yi
AU - Lee, Hsiao Yun
AU - Zeng, Yu Ting
AU - Chern, Yinq Rong
AU - Chen, Nai Tzu
AU - Candice Lung, Shih Chun
AU - Su, Huey Jen
AU - Wu, Chih Da
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2021/5/15
Y1 - 2021/5/15
N2 - Ambient fine particulate matter (PM2.5) has been ranked as the sixth leading risk factor globally for death and disability. Modelling methods that rely on a limited number of monitoring stations are required to capture continuous spatial and temporal variations in PM2.5 at sufficient resolution. This study utilized a land use regression (LUR) model with machine learning to assess the spatial-temporal variability of PM2.5. Daily average PM2.5 data were collected from 73 fixed air quality monitoring stations belonging to the Taiwan EPA on the main island of Taiwan. Nearly 280,000 observations from 2006 to 2016 were used for the analysis. Several datasets were collected to determine spatial predictor variables, including the EPA environmental resources dataset, a meteorological dataset, a land-use inventory, a landmark dataset, a digital road network map, a digital terrain model, the MODIS Normalized Difference Vegetation Index (NDVI) database, and a power plant distribution dataset. First, conventional LUR and Hybrid Kriging-LUR were utilized to identify the important predictor variables. Then, deep neural network, random forest, and XGBoost algorithms were used to fit the prediction model based on the variables selected by the LUR models. Data splitting, 10-fold cross validation, external data verification, and seasonal-based and county-based validation methods were used to verify the robustness of the developed models. The results demonstrated that the proposed conventional LUR and Hybrid Kriging-LUR models captured 58% and 89% of PM2.5 variations, respectively. When the XGBoost algorithm was incorporated, the explanatory power of the models increased to 73% and 94%, respectively. The Hybrid Kriging-LUR model with the XGBoost algorithm outperformed the other integrated methods. This study demonstrates the value of combining a Hybrid Kriging-LUR model with an XGBoost algorithm for estimating the spatial-temporal variability of PM2.5 exposures.
AB - Ambient fine particulate matter (PM2.5) has been ranked as the sixth leading risk factor globally for death and disability. Modelling methods that rely on a limited number of monitoring stations are required to capture continuous spatial and temporal variations in PM2.5 at sufficient resolution. This study utilized a land use regression (LUR) model with machine learning to assess the spatial-temporal variability of PM2.5. Daily average PM2.5 data were collected from 73 fixed air quality monitoring stations belonging to the Taiwan EPA on the main island of Taiwan. Nearly 280,000 observations from 2006 to 2016 were used for the analysis. Several datasets were collected to determine spatial predictor variables, including the EPA environmental resources dataset, a meteorological dataset, a land-use inventory, a landmark dataset, a digital road network map, a digital terrain model, the MODIS Normalized Difference Vegetation Index (NDVI) database, and a power plant distribution dataset. First, conventional LUR and Hybrid Kriging-LUR were utilized to identify the important predictor variables. Then, deep neural network, random forest, and XGBoost algorithms were used to fit the prediction model based on the variables selected by the LUR models. Data splitting, 10-fold cross validation, external data verification, and seasonal-based and county-based validation methods were used to verify the robustness of the developed models. The results demonstrated that the proposed conventional LUR and Hybrid Kriging-LUR models captured 58% and 89% of PM2.5 variations, respectively. When the XGBoost algorithm was incorporated, the explanatory power of the models increased to 73% and 94%, respectively. The Hybrid Kriging-LUR model with the XGBoost algorithm outperformed the other integrated methods. This study demonstrates the value of combining a Hybrid Kriging-LUR model with an XGBoost algorithm for estimating the spatial-temporal variability of PM2.5 exposures.
UR - http://www.scopus.com/inward/record.url?scp=85102590408&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102590408&partnerID=8YFLogxK
U2 - 10.1016/j.envpol.2021.116846
DO - 10.1016/j.envpol.2021.116846
M3 - Article
C2 - 33735646
AN - SCOPUS:85102590408
SN - 0269-7491
VL - 277
JO - Environmental Pollution
JF - Environmental Pollution
M1 - 116846
ER -