Background: For indoor air modelling, indoor parameters such as daily activity patterns and building characteristics are difficult to collect in a large-area study. Land-use/land cover information, which is easier to obtain, could serve as a surrogate for emission sources when assessing indoor air quality. Moreover, low-cost sensors and machine learning can help improve model accuracy. Objectives: This study proposed an alternative approach for estimating daily PM2.5 concentrations in the indoor environments of schools across a large area by integrating low-cost sensors, land-use/land cover predictors, and machine learning-based modelling. Methods: Indoor PM2.5 data were collected from 145 indoor AirBox sensors in Kaohsiung and Pingtung Counties of Taiwan. Geospatial predictors were extracted from circular buffers surrounding each AirBox sensor. Spearman correlation analysis and stepwise variable selection were used to select variables for land-use regression (LUR), which was then integrated with XGBoost, Random Forest (RF), and LightGBM (LGBM) machine learning models. Results: Outdoor PM2.5 and distance to the nearest thermal power plant were the main determinants of variation in estimated indoor PM2.5 when there were no indoor sources. When machine learning was incorporated, the R² increased from 0.59 for LUR to 0.85 for LUR-XGBoost, while the RMSE decreased from 8.63 to 5.27 μg/m³; LUR-XGBoost performed better than both LUR-RF and LUR-LGBM. Conclusions: This study demonstrates the value of the proposed alternative approach, which combines low-cost sensor data with an LUR model and machine learning algorithms, for estimating the spatiotemporal variability of indoor PM2.5 over a large area.
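
For readers less familiar with the two performance measures reported in the Results, the following is a minimal sketch of how R² and RMSE are commonly computed from paired observed and predicted concentrations. The function names and the sample values are illustrative assumptions and are not data from the study; the abstract does not state which exact definition of R² (for example, cross-validated or squared correlation) the authors used, so the coefficient-of-determination form shown here may differ from theirs.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. ug/m3)."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily indoor PM2.5 values (ug/m3); NOT measurements from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, predicted):.2f} ug/m3")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

A lower RMSE means predictions sit closer to the sensor readings in absolute terms, while a higher R² means the model explains a larger share of the variance in the observations, which is why the abstract reports the two measures together.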