Year of Award
2021
Document Type
Thesis
Degree Type
Master of Science (MS)
Degree Name
Computer Science
Department or School/College
Computer Science
Committee Chair
Oliver Serang
Committee Members
Douglas Brinkerhoff, Eric Chesebro
Keywords
Protein inference, bioinformatics, statistics, machine learning, proteomics, protein standard
Subject Categories
Applied Statistics | Bioinformatics | Biostatistics | Computational Biology | Data Science | Numerical Analysis and Scientific Computing | Other Computer Sciences
Abstract
Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and the analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case: the target set is typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in that they have been carefully prepared to contain only the proteins specified in the target set. Though this helps, it is still unclear which metrics most adequately capture all the important aspects of a good protein inference method. This manuscript presents a novel protein standard dataset, several novel protein inference methods, and an ensemble protein inference engine that utilizes several metrics and protein standard datasets to evaluate the performance of inference methods.
Recommended Citation
Lucke, Kyle Lee, "ENSEMBLE PROTEIN INFERENCE EVALUATION" (2021). Graduate Student Theses, Dissertations, & Professional Papers. 11845.
https://scholarworks.umt.edu/etd/11845
© Copyright 2021 Kyle Lee Lucke