Abstract #T368

# T368
Identification and removal of outliers in feed databases for beef cattle.
Huyen Tran*1, William Weiss2, Galen Erickson3, Phillip S. Miller3, 1National Animal Nutrition Program, University of Kentucky, Lexington, KY, 2The Ohio State University, Wooster, OH, 3University of Nebraska, Lincoln, NE.

Accurate feed composition data are critical for diet formulation and for determining the nutrient requirements of animals. Large feed databases are available; however, they often contain misidentified feed names and can have biased nutritive values. The first 2 objectives of this project were to identify and characterize outliers in feed databases and to develop feed composition tables for beef cattle. Approximately 1.5 million feed composition records provided by 3 commercial laboratories were sorted, screened, and reclassified. Histograms were used to visualize sample distributions. For most forages, feeds were classified as haylage when DM was <70% and as hay when DM was ≥70%. Grains were classified as high-moisture (DM <80%) or dry (DM ≥80%). Any nutrient value outside the mean ± 3.5 SD was removed (Method A). Data were analyzed by laboratory, and the individual laboratory means and variances were then weighted by sample size to calculate the overall mean and SD. The third objective was to compare the performance of Method A with a combination of univariate and multivariate approaches for identifying outliers. Fifteen feeds representing grains, forages, byproducts, and oilseeds were randomly selected and screened for outliers. Feeds with missing key nutrients were removed. Principal component and cluster analyses in SAS were used in the multivariate approach. Of the 1.5 million records, initially classified into 352 feeds, 45.7% appeared to be misidentified, leaving 196 feeds for analysis. Outliers were characterized as inaccurate DM classification, data transformation, decimal point issues, erroneous data, or inconsistent terminology. For CP, Method A removed 1.4% of samples and decreased the mean by 0.9% and the SD by 15.2%. The multivariate analysis removed a larger percentage of samples (33.3%) and decreased the mean by 1.6% and the SD by 38.8% for CP. Cluster analysis identified more than 1 cluster in 8 of the 15 feeds.
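The univariate screening steps described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' SAS workflow: the function names (`classify_forage`, `classify_grain`, `method_a`, `pool_labs`, `screen_and_pool`) are invented here, the 3.5-SD screen is assumed to be a single pass using the sample SD, and, because the abstract does not give the pooling formula, the overall SD is assumed to combine within-laboratory variances with the spread of laboratory means around the grand mean, each weighted by sample size.

```python
from math import sqrt
from statistics import mean, stdev


def classify_forage(dm_pct):
    """Forage class from DM (%): hay if DM >= 70, otherwise haylage."""
    return "hay" if dm_pct >= 70 else "haylage"


def classify_grain(dm_pct):
    """Grain class from DM (%): dry if DM >= 80, otherwise high moisture."""
    return "dry" if dm_pct >= 80 else "high moisture"


def method_a(values, k=3.5):
    """Method A (assumed single pass): drop values outside mean +/- k * sample SD."""
    m, s = mean(values), stdev(values)
    return [v for v in values if m - k * s <= v <= m + k * s]


def pool_labs(summaries):
    """Combine per-lab (n, mean, SD) tuples into an overall mean and SD.

    Assumption: the overall variance counts within-lab spread plus the
    spread of lab means around the grand mean, each weighted by lab n.
    """
    n_total = sum(n for n, _, _ in summaries)
    grand = sum(n * m for n, m, _ in summaries) / n_total
    ss = sum((n - 1) * s ** 2 + n * (m - grand) ** 2 for n, m, s in summaries)
    return grand, sqrt(ss / (n_total - 1))


def screen_and_pool(values_by_lab, k=3.5):
    """Apply Method A within each lab, then pool the per-lab summaries."""
    summaries = []
    for values in values_by_lab.values():
        kept = method_a(values, k)
        summaries.append((len(kept), mean(kept), stdev(kept)))
    return pool_labs(summaries)
```

For example, a CP value entered as 95 instead of 9.5 (a decimal point issue) among 20 otherwise typical records is removed by `method_a`. With a single pass and the sample SD, however, no value can fall outside the mean ± 3.5 SD when a group has 14 or fewer records, because the largest possible |z| is (n − 1)/√n; small laboratory-by-feed subsets therefore pass this screen untouched.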
The multivariate method was powerful for reducing the SD and for clustering feeds. Removing outliers based on 3.5 SD (Method A) was simple to apply, but this method was ineffective at clustering feeds classified by economic value or maturity. This work was a National Research Support Project supported by USDA-NIFA and the State Agricultural Experiment Stations.
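The multivariate approach can be outlined in the same style. The abstract says only that principal component and cluster analyses in SAS were used, so every specific below is an assumption: principal components are extracted from the correlation matrix by power iteration, records are dropped when a Hotelling-style T² on the first 2 components exceeds a placeholder `t2_cutoff` (an F- or chi-square-based control limit would be more principled), and the remaining component scores are grouped with k-means using k = 2. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(rows):
    """Center and scale each nutrient column to mean 0 and SD 1."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def pca(z, n_comp=2, iters=500):
    """Leading eigenpairs of the correlation matrix of standardized data z,
    found by power iteration with deflation."""
    n, p = len(z), len(z[0])
    c = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    vals, vecs = [], []
    for _ in range(min(n_comp, p)):
        v = [1.0 / (i + 1) for i in range(p)]  # asymmetric start vector
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to extract
                return vals, vecs
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        vals.append(lam)
        vecs.append(v)
        # Deflate: subtract this component so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vals, vecs


def scores(z, vecs):
    """Project each standardized record onto the retained components."""
    return [[sum(a * b for a, b in zip(row, v)) for v in vecs] for row in z]


def t2(score_rows, vals):
    """Hotelling-style T^2: squared scores scaled by component variances."""
    return [sum(s * s / lam for s, lam in zip(row, vals)) for row in score_rows]


def kmeans(points, k=2, iters=50):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def multivariate_screen(rows, t2_cutoff=12.0, k=2):
    """Drop high-T^2 records (placeholder cutoff), then cluster the rest."""
    z = standardize(rows)
    vals, vecs = pca(z)
    sc = scores(z, vecs)
    keep = [i for i, t in enumerate(t2(sc, vals)) if t <= t2_cutoff]
    labels = kmeans([sc[i] for i in keep], k)
    return keep, labels
```

Running `multivariate_screen` on hypothetical CP and NDF values from 2 well-separated maturity groups, for instance, returns one label per group, which is the kind of structure a single-nutrient 3.5-SD rule does not reveal.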

Key Words: feed composition, data processing, outlier mining