It is well known benzene negatively impacts human health. This study is the first to predict spatial-temporal variations in benzene concentrations for the entirety of Taiwan by using a mixed spatial prediction model integrating multiple machine learning algorithms and predictor variables selected by Land-use Regression (LUR). Monthly benzene concentrations from 2003 to 2019 were utilized for model development, and monthly benzene concentration data from 2020, as well as mobile monitoring vehicle data from 2009 to 2019, served as external data for verifying model reliability. Benzene concentrations were estimated by running six LUR-based machine learning algorithms; these algorithms, which include random forest (RF), deep neural network (DNN), gradient boosting (GBoost), light gradient boosting (LightGBM), CatBoost, extreme gradient boosting (XGBoost), and ensemble algorithms (a combination of the three best performing models), can capture how nonlinear observations and predictions are related. The results indicated conventional LUR captured 79% of the variability in benzene concentrations. Notably, the LUR with ensemble algorithm (GBoost, CatBoost, and XGBoost) surpassed all other integrated methods, increasing the explanatory power to 92%. This study establishes the value of the proposed ensemble-based model for estimating spatiotemporal variation in benzene exposure.
All Science Journal Classification (ASJC) codes
- Environmental Engineering
- Environmental Chemistry
- Public Health, Environmental and Occupational Health
- Health, Toxicology and Mutagenesis