A feature selection and multi-model fusion-based approach of predicting air quality. ISA Trans 2019
With the rapid industrialization of China, air pollution is becoming increasingly serious. Predicting air quality is vital for deciding on further preventive measures against the disasters it brings. In this paper, we propose an approach to predicting air quality that draws on multiple data features and fuses multiple machine learning models. The approach takes the meteorological and air quality data from the past six days as one batch of input (the whole data set covers 46 days) and employs multi-model fusion to provide an improved 24-hour prediction of PM2.5 concentration across Beijing. In this process, two focal feature groups are constructed. The first focal feature group contains the historical meteorological data, while the second includes statistical information, date information, and polynomial variations. Beyond these two groups, we generate one million additional data items by applying time-sliding means. From the supplementary data, we select the 500 most critical features with the Light Gradient Boosting Machine (LightGBM) model and feed them as input to the Gradient Boosting Decision Tree (GBDT) and LightGBM models. Meanwhile, we screen the 300 most critical features with the eXtreme Gradient Boosting (XGBoost) model and feed them as input to the three prediction models. For each model, we obtain the optimal parameters through grid search and then fuse the models' contributions by linear weighting. Experiments indicate that the proposed weighted-fusion approach outperforms a single-model scheme, achieving a loss of 0.4158 under the SMAPE metric.
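The pipeline summarized above (sliding-mean feature generation, importance-based feature screening, linear weighted fusion, and SMAPE evaluation) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' implementation. All helper names (`sliding_means`, `top_k_features`, `fuse`, `grid_search_weights`) are hypothetical. The window sizes are placeholders. SMAPE is assumed to take the common form 2|F − A| / (|A| + |F|) averaged over samples, without a ×100 factor, because the abstract does not state its variant. Choosing the fusion weights by a grid search over a validation set is also an assumption: the abstract says only that grid search tunes each model's parameters and that fusion is linear. The GBDT, LightGBM, and XGBoost models themselves are not included. Their importance scores (for example, the `feature_importances_` attribute of the LightGBM and XGBoost scikit-learn wrappers) and their predictions would be passed in as arrays.

```python
import itertools

import numpy as np


def smape(y_true, y_pred):
    """SMAPE, assumed variant: mean of 2|F - A| / (|A| + |F|), no x100 factor.

    Terms where actual and predicted are both zero are counted as 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    diff = 2.0 * np.abs(y_pred - y_true)
    # Divide only where the denominator is non-zero; elsewhere keep 0.
    terms = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return float(terms.mean())


def sliding_means(series, windows=(3, 6, 12, 24)):
    """Hypothetical helper: trailing moving averages of an hourly series.

    Returns {window: array}; the first window-1 entries are NaN because
    there is not yet enough history to average over.
    """
    series = np.asarray(series, dtype=float)
    features = {}
    for w in windows:
        out = np.full(series.shape, np.nan)
        if len(series) >= w:
            kernel = np.ones(w) / w
            out[w - 1:] = np.convolve(series, kernel, mode="valid")
        features[w] = out
    return features


def top_k_features(importances, names, k):
    """Hypothetical helper: keep the k names with the highest importance scores."""
    order = np.argsort(-np.asarray(importances, dtype=float), kind="stable")
    return [names[i] for i in order[:k]]


def fuse(predictions, weights):
    """Linear weighted fusion; predictions has shape (n_models, n_samples)."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ predictions


def grid_search_weights(predictions, y_true, step=0.1):
    """Try weight vectors on a grid that sum to 1; keep the lowest-SMAPE one."""
    n_models = len(predictions)
    ticks = np.round(np.arange(0.0, 1.0 + 1e-9, step), 10)
    best_w, best_loss = None, np.inf
    for w in itertools.product(ticks, repeat=n_models - 1):
        last = 1.0 - sum(w)
        if last < -1e-9:
            continue  # these weights would exceed a total of 1
        weights = np.array(list(w) + [max(last, 0.0)])
        loss = smape(y_true, fuse(predictions, weights))
        if loss < best_loss:
            best_w, best_loss = weights, loss
    return best_w, best_loss
```

Constraining the weights to sum to 1 keeps the fused output on the same scale as the individual PM2.5 forecasts. With three models and `step=0.1`, the search evaluates only a few dozen weight combinations, so its cost is negligible next to training the models.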