Using land-use machine learning models to estimate daily NO2 concentration variations in Taiwan

Pei Yi Wong, Huey Jen Su, Hsiao Yun Lee, Yu Cheng Chen, Ya Ping Hsiao, Jen Wei Huang, Tee Ann Teo, Chih Da Wu, John D. Spengler

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)


Exposure surrogates derived from monitoring stations have various limitations and are likely insufficient for epidemiological studies covering large areas. Moreover, the spatiotemporal resolution of air pollution modelling approaches must be improved to achieve more accurate estimates; otherwise, the resulting exposure assessments will not be applicable in future health risk assessments. To address this challenge, this study developed Land-Use Regression (LUR) models combined with machine learning to assess the spatial-temporal variability of nitrogen dioxide (NO2). Daily average NO2 data were collected from 70 fixed air quality monitoring stations operated by the Taiwan EPA on the main island of Taiwan, yielding approximately 0.41 million observations from 2000 to 2016. Several datasets were used to derive spatial predictor variables: the EPA environmental resources dataset, a meteorological dataset, the land-use inventory, a landmark dataset, a digital road network map, a digital terrain model, the MODIS Normalized Difference Vegetation Index (NDVI) database, and a power plant distribution dataset. Conventional LUR and Hybrid Kriging-LUR models were first developed to identify important predictor variables. Deep Neural Network (DNN), Random Forest (RF), and XGBoost algorithms were then used to fit prediction models based on the variables selected by the LUR models. Finally, data splitting, 10-fold cross validation, external data verification, and seasonal-based and county-based validation were applied to verify the robustness of the developed models. The conventional LUR and Hybrid Kriging-LUR models explained 65% and 78% of the NO2 variation, respectively. When the XGBoost algorithm was incorporated into the conventional LUR and Hybrid Kriging-LUR models, the explanatory power increased to 84% and 91%, respectively. The Hybrid Kriging-LUR model with the XGBoost algorithm outperformed all other integrated methods.
This study demonstrates the value of combining the Hybrid Kriging-LUR model with the XGBoost algorithm to estimate the spatial-temporal variability of NO2 exposure. In practice, the associations between NO2 and the specific land-use/land-cover types selected in the final model can inform land-use management and the planning of emission reduction strategies.
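The 10-fold cross validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional LUR form, a multiple linear regression fitted by ordinary least squares, and pools the out-of-fold predictions to compute R². The helper names (`fit_ols`, `kfold_r2`), the two predictors (road length in a buffer, NDVI) and the synthetic data are hypothetical illustrations, not the study's actual variables, code or results; the study's Hybrid Kriging-LUR and XGBoost steps are not reproduced here.

```python
import random
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan
    elimination, where A is X with a leading column of ones.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, p = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
        + [sum(A[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: swap the row with the largest entry into place.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, p + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(b: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction: intercept plus weighted predictors."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_r2(X, y, k: int = 10, seed: int = 0) -> float:
    """k-fold CV: each point is predicted by a model that never saw it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(b, X[i])
    # Pool all out-of-fold predictions into a single R^2.
    return r_squared(y, preds)


if __name__ == "__main__":
    # Hypothetical synthetic data: road length (0-10) and NDVI (0-1)
    # with a noisy linear NO2 response.
    rng = random.Random(42)
    X = [[rng.uniform(0, 10), rng.uniform(0, 1)] for _ in range(500)]
    y = [10 + 2 * road - 5 * ndvi + rng.gauss(0, 2) for road, ndvi in X]
    print(f"10-fold CV R^2: {kfold_r2(X, y, k=10):.3f}")
```

Pooling out-of-fold predictions before computing R² is one common convention; averaging per-fold R² is another. A tree ensemble such as XGBoost could be swapped in for `fit_ols` inside the same loop.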

Original language: English
Article number: 128411
Journal: Journal of Cleaner Production
Publication status: Published - 2021 Oct 1

All Science Journal Classification (ASJC) codes

  • Renewable Energy, Sustainability and the Environment
  • General Environmental Science
  • Strategy and Management
  • Industrial and Manufacturing Engineering

