Spatiotemporal continuous estimates of PM2.5 concentrations in China, 2000-2016: A machine learning method with inputs from satellites, chemical transport model, and ground observations. Environ Int. 2019 02; 123:345-357.
Ambient exposure to fine particulate matter (PM2.5) is known to harm public health in China. Satellite remote sensing measurements of aerosol optical depth (AOD) have been statistically associated with in-situ observations made after 2013 to predict PM2.5 concentrations nationwide, but the lack of surface monitoring data before 2013 has made historical PM2.5 exposure estimates difficult. Hindcast approaches using statistical models or chemical transport models (CTMs) were developed to overcome this limitation, but they still suffer from incomplete daily coverage due to missing AOD data or from limited accuracy due to uncertainties in the CTMs. Here we developed a new machine learning (ML) model with high-dimensional expansion (HD-expansion) of numerous predictors (including AOD and other satellite covariates, meteorological variables, and CTM simulations). By comprehensively characterizing the nonlinear effects of, and interactions among, different predictors, the HD-expansion parameterized the association between PM2.5 and AOD as a nonlinear function of space and time covariates (e.g., planetary boundary layer height and relative humidity). In this way, the PM2.5-AOD association can vary spatiotemporally. We trained the model with data from 2013 to 2016 and evaluated its performance using annually-iterated cross-validation, which iteratively held out the in-situ observations for a whole calendar year (as testing data) to examine the predictions from a model trained on the rest of the observations. Our estimates were in good agreement with in-situ observations, with coefficients of determination (R2) of 0.61, 0.68, and 0.75 for daily, monthly, and annual averages, respectively. To fill in the predictions missing because of incomplete AOD data, we incorporated a generalized additive model into the ML model.
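The annually-iterated cross-validation described above can be illustrated with a minimal sketch. It is not the study's implementation: the learner is a placeholder ordinary least squares fit standing in for the ML model, the data are synthetic, the function name `annual_cv_r2` is hypothetical, and R2 is computed here as 1 - SS_res/SS_tot, which is an assumption about the metric.

```python
import numpy as np

def annual_cv_r2(X, y, years):
    """Hold out each calendar year in turn and score the held-out predictions.

    X: (n_samples, n_features) predictors; y: (n_samples,) observed values;
    years: (n_samples,) calendar year of each observation.
    Returns a dict mapping held-out year -> R^2 on that year.
    """
    scores = {}
    for yr in np.unique(years):
        test = years == yr
        train = ~test
        # Placeholder learner (assumption): least squares with an intercept.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        pred = A_test @ coef
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores[int(yr)] = 1.0 - ss_res / ss_tot
    return scores

# Synthetic stand-in data (not from the study).
rng = np.random.default_rng(0)
n = 400
years = rng.integers(2013, 2017, size=n)  # years 2013..2016
X = rng.normal(size=(n, 3))
y = 50 + X @ np.array([10.0, -5.0, 2.0]) + rng.normal(scale=5.0, size=n)

cv_scores = annual_cv_r2(X, y, years)
for yr, r2 in sorted(cv_scores.items()):
    print(yr, round(r2, 3))
```

Holding out a whole year, rather than random samples, tests the situation a hindcast actually faces: predicting a period for which no ground observations entered the training set.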
The two-stage estimates of PM2.5 sacrificed some prediction accuracy at the daily timescale (R2 = 0.55) but achieved complete spatiotemporal coverage and improved the accuracy of monthly (R2 = 0.71) and annual (R2 = 0.77) averages. The model was then used to predict daily PM2.5 concentrations across China during 2000-2016 and to estimate long-term trends in PM2.5 over that period. We found that population-weighted concentrations of PM2.5 increased significantly, by 2.10 (95% confidence interval (CI): 1.74, 2.46) μg/m3/year during 2000-2007, and decreased rapidly, by 4.51 (95% CI: 3.12, 5.90) μg/m3/year during 2013-2016. In this study, we produced AOD-based estimates of historical PM2.5 with complete spatiotemporal coverage, which proved accurate, particularly over the medium and long term. The products could support large-scale epidemiological studies and risk assessments of ambient PM2.5 in China and can be accessed via the website (http://www.meicmodel.org/dataset-phd.html).
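As a rough illustration of how a population-weighted concentration and its linear trend might be computed, the sketch below uses toy numbers, not the study's data. The function names, the two-cell grid, and the use of an ordinary least squares slope are all assumptions; the abstract does not state which trend estimator was used.

```python
import numpy as np

def population_weighted_mean(pm25, population):
    """Population-weighted mean over grid cells for each time step.

    pm25: (n_times, n_cells) concentrations; population: (n_cells,) weights.
    """
    pm25 = np.asarray(pm25, dtype=float)
    w = np.asarray(population, dtype=float)
    return (pm25 * w).sum(axis=1) / w.sum()

def linear_trend(years, values):
    """Ordinary least squares slope (per year) and its standard error."""
    t = np.asarray(years, dtype=float)
    v = np.asarray(values, dtype=float)
    tc = t - t.mean()
    slope = (tc * (v - v.mean())).sum() / (tc ** 2).sum()
    intercept = v.mean() - slope * t.mean()
    resid = v - (intercept + slope * t)
    se = np.sqrt((resid ** 2).sum() / (len(t) - 2) / (tc ** 2).sum())
    return slope, se

# Toy example (hypothetical values): two grid cells, eight annual means.
years = np.arange(2000, 2008)
pm25 = np.array([[40.0 + 2.0 * i, 20.0 + 1.0 * i] for i in range(len(years))])
population = np.array([3.0, 1.0])  # the first cell holds three times the people

pw = population_weighted_mean(pm25, population)
slope, se = linear_trend(years, pw)
print(pw[0], slope)  # 35.0 and a slope of 1.75 for these toy values
```

A 95% confidence interval would then follow as the slope plus or minus a critical t value (with n - 2 degrees of freedom) times the standard error. Weighting by population shifts the average toward the concentrations where people actually live, which is why it is the quantity of interest for exposure assessment.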