Year of Award

2021

Document Type

Thesis

Degree Type

Master of Science (MS)

Degree Name

Computer Science

Department or School/College

Computer Science

Committee Chair

Oliver Serang

Committee Members

Douglas Brinkerhoff, Eric Chesebro

Keywords

Protein inference, bioinformatics, statistics, machine learning, proteomics, protein standard

Subject Categories

Applied Statistics | Bioinformatics | Biostatistics | Computational Biology | Data Science | Numerical Analysis and Scientific Computing | Other Computer Sciences

Abstract

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and the analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the metrics for evaluation (such as AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case: target sets typically contain a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in that they have been carefully prepared to contain only the proteins specified in the target set. Though this helps, it is still unclear which metrics most adequately capture all the important aspects of a good protein inference method. This manuscript presents three contributions: a novel protein standard dataset; an ensemble protein inference engine that uses several metrics and protein standard datasets to evaluate the performance of inference methods; and several novel protein inference methods.

© Copyright 2021 Kyle Lee Lucke