Abstract #658

ADSA®-EAAP Speaker Exchange Presentation: Comparison of 3 different variable selection strategies to improve the predictions of fatty acid profile in bovine milk by mid-infrared spectrometry.
Hélène Soyeurt*1, Yves Brostaux1, Frédéric Dehareng2, Nicolas Gengler1, Pierre Dardenne2, 1University of Liège-Gembloux Agro-Bio Tech, Gembloux, Belgium, 2Walloon Agricultural Research Centre, Gembloux, Belgium.

Mid-infrared (MIR) spectrometry is used to provide phenotypes related to milk composition. A Foss spectrum contains 1,060 data points. The number of reference values available to build a calibration equation is often lower than the number of spectral variables, mainly because of the cost of chemical analysis. Problems of collinearity and overfitting appear when such a high-dimensional data set is used. This study examined the value of applying a variable selection (VS) step before partial least squares regression (PLS). The data set included 1,236 milk spectra with their corresponding fatty acid (FA) contents. Saturated (SFA), monounsaturated (MUFA), polyunsaturated (PUFA), short-chain (SCFA), medium-chain (MCFA), and long-chain FA (LCFA) were studied. The data set was randomly divided into 3 groups, which were used to create 3 calibration and validation data sets. Three different VS methods were compared. The first strategy was based on the part of trait variability explained by each spectral variable considered (R2VS). The second method was based on the regression coefficient estimated after the PLS procedure divided by the standard deviation of the spectral variable considered (BSVS). The third strategy identified the uninformative variables, defined as those having the lowest ratio of average regression coefficient to its corresponding standard deviation, both estimated from a leave-one-out cross-validation (UVEVS). For UVEVS and BSVS, the cutoff was determined from the known uninformative region of the MIR milk spectrum. The cutoff for R2VS was determined by testing different thresholds ranging from 5 to 40%. The most interesting cutoff for R2VS was 25%. The worst results in terms of validation root mean square error of prediction (RMSEPv) were obtained using a full PLS (i.e., without VS). The maximum differences in RMSEPv (g/dL of milk) between the full PLS and the PLS using selected variables were 0.156 for SFA, 0.139 for MUFA, 0.011 for PUFA, 0.025 for SCFA, 0.164 for MCFA, and 0.188 for LCFA. R2VS gave the best results for all studied traits, followed by UVEVS and then BSVS. In conclusion, the use of VS significantly improved the performance of FA MIR equations.
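The R2VS step and the RMSEP criterion described above can be illustrated with a short Python sketch. This is not the authors' implementation: it assumes that the "part of trait variability explained" by a spectral variable is the R2 of a univariate regression of the trait on that variable (the squared Pearson correlation). The function names and the toy data are hypothetical, and the PLS fit that would follow the selection is omitted.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between one spectral variable and the trait."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column explains none of the trait variability
    return sxy ** 2 / (sxx * syy)


def select_by_r2(spectra, trait, cutoff=0.25):
    """Indices of spectral variables whose univariate R2 reaches the cutoff.

    spectra: list of rows, one per milk sample; trait: reference FA values.
    The default of 0.25 mirrors the 25% cutoff reported in the abstract.
    """
    keep = []
    for j in range(len(spectra[0])):
        column = [row[j] for row in spectra]
        if r_squared(column, trait) >= cutoff:
            keep.append(j)
    return keep


def rmsep(observed, predicted):
    """Root mean square error of prediction on a validation set."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Invented toy data: 4 samples, 3 spectral variables (not real MIR spectra).
spectra = [
    [0.10, 0.50, 0.5],
    [0.20, 0.40, 0.5],
    [0.30, 0.60, 0.5],
    [0.40, 0.45, 0.5],
]
trait = [1.0, 2.0, 3.0, 4.0]

selected = select_by_r2(spectra, trait)  # variables passed on to the PLS step
```

In a workflow of this kind, only the columns listed in `selected` would be passed to the PLS calibration, and `rmsep` would then be computed on the validation set to compare the reduced model against the full PLS.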

Key Words: milk, fatty acid, infrared