Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.
Anal Chim Acta. 2011 Apr 29; 692(1-2):63-72.

Abstract

During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in fields ranging from the petroleum to the biomedical sector. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify the variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Gasoline, ethanol-gasoline (bioethanol), and diesel fuel data are also discussed. Results obtained with other spectroscopic techniques, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopy, can also be greatly improved by an appropriate choice of feature selection method.
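
As a rough illustration of the interval-based idea behind iPLS, the sketch below splits a spectrum into equal-width wavelength intervals, fits a separate model on each interval, and ranks the intervals by cross-validated error. It is not the authors' implementation: ordinary least squares stands in for PLS to keep the example short, the data are synthetic and uncorrelated (real NIR spectra are strongly collinear, which is why PLS is used in practice), and the function name `interval_rmsecv` and its parameters are hypothetical.

```python
import numpy as np


def interval_rmsecv(X, y, n_intervals=10, n_folds=5):
    """Score equal-width variable intervals by cross-validated RMSE.

    Assumption: OLS on each interval replaces the PLS model used by iPLS.
    Returns the interval edges and one RMSECV value per interval.
    """
    n_samples, n_vars = X.shape
    edges = np.linspace(0, n_vars, n_intervals + 1, dtype=int)
    folds = np.arange(n_samples) % n_folds  # deterministic fold labels
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Design matrix: intercept column plus the variables of this interval.
        Xi = np.column_stack([np.ones(n_samples), X[:, lo:hi]])
        sq_err = 0.0
        for k in range(n_folds):
            test = folds == k
            coef, *_ = np.linalg.lstsq(Xi[~test], y[~test], rcond=None)
            sq_err += np.sum((Xi[test] @ coef - y[test]) ** 2)
        scores.append(np.sqrt(sq_err / n_samples))
    return edges, np.array(scores)


# Synthetic "spectra": 60 samples, 50 variables; only columns 20..24 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))
w = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
y = X[:, 20:25] @ w + 0.01 * rng.normal(size=60)

edges, scores = interval_rmsecv(X, y)
best = int(np.argmin(scores))
print("best interval:", best, "covering columns", edges[best], "to", edges[best + 1] - 1)
```

Forward and backward variants (FiPLS, BiPLS) build on the same per-interval scoring by adding or removing intervals one at a time rather than choosing a single best interval.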

Authors

Roman M. Balabin: Department of Chemistry and Applied Biosciences, ETH Zurich, Switzerland. balabin@org.chem.ethz.ch
Sergey V. Smirnov: No affiliation info available

Pub Type(s)

Journal Article

Language

eng

PubMed ID

21501713

Citation

Balabin, Roman M., and Sergey V. Smirnov. "Variable Selection in Near-infrared Spectroscopy: Benchmarking of Feature Selection Methods On Biodiesel Data." Analytica Chimica Acta, vol. 692, no. 1-2, 2011, pp. 63-72.
Balabin RM, Smirnov SV. Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data. Anal Chim Acta. 2011;692(1-2):63-72.
Balabin, R. M., & Smirnov, S. V. (2011). Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data. Analytica Chimica Acta, 692(1-2), 63-72. https://doi.org/10.1016/j.aca.2011.03.006
Balabin RM, Smirnov SV. Variable Selection in Near-infrared Spectroscopy: Benchmarking of Feature Selection Methods On Biodiesel Data. Anal Chim Acta. 2011 Apr 29;692(1-2):63-72. PubMed PMID: 21501713.
TY - JOUR
T1 - Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.
AU - Balabin,Roman M,
AU - Smirnov,Sergey V,
Y1 - 2011/03/08/
PY - 2010/11/10/received
PY - 2011/02/21/revised
PY - 2011/03/01/accepted
PY - 2011/4/20/entrez
PY - 2011/4/20/pubmed
PY - 2011/9/2/medline
SP - 63
EP - 72
JF - Analytica chimica acta
JO - Anal Chim Acta
VL - 692
IS - 1-2
SN - 1873-4324
UR - https://www.unboundmedicine.com/medline/citation/21501713/Variable_selection_in_near_infrared_spectroscopy:_benchmarking_of_feature_selection_methods_on_biodiesel_data_
L2 - https://linkinghub.elsevier.com/retrieve/pii/S0003-2670(11)00353-9
DB - PRIME
DP - Unbound Medicine
ER -