Year of Award

2019

Document Type

Thesis

Degree Type

Master of Science (MS)

Degree Name

Computer Science

Department or School/College

Computer Science

Committee Chair

Oliver Serang

Committee Members

Oliver Serang, Rob Smith, J. Stephen Lodmell

Keywords

de novo, small molecules, algorithms, mass spectrometry, graph isomorphism, glycomics

Subject Categories

Computer Sciences

Abstract

In the analysis of mass spectra, if a superset of the molecules thought to be in a sample is known a priori, then there are well-established techniques for identifying the molecules, such as database search and spectral libraries. Linear molecules are chains of subunits. For example, a peptide is a linear molecule with an “alphabet” of 20 possible amino acid subunits, so a peptide of length six has 20⁶ = 64,000,000 possible outcomes. Small molecules, such as sugars and metabolites, are not constrained to linear structures and may branch. These molecules are encoded as undirected graphs rather than simply as linear chains. An undirected graph with six subunits (each of which has 20 possible outcomes) has 20⁶ · 2^(6 choose 2) = 20⁶ · 2¹⁵ = 2,097,152,000,000 possible outcomes, since each of the 15 possible pairs of subunits may or may not be joined by an edge. The vast number of complex graphs that small molecules can form can make databases and spectral libraries either impossibly large to use or incomplete, as many metabolites may still be unidentified. In the absence of a usable database or spectral library, the alphabet of subunits may be used to connect peaks in the fragmentation spectra; each connection represents a neutral loss of an alphabet mass. This technique is called “de novo sequencing” and relies on the alphabet being known in advance. Often, the alphabet of m/z difference values allowed by de novo analysis is not known or is incomplete. A method is proposed that, given fragmentation mass spectra, identifies an alphabet of m/z differences that can build large connected graphs from many intense peaks in each spectrum from a collection. Once an alphabet is obtained, it is informative to find common substructures among the peaks connected by the alphabet. This amounts to finding the largest isomorphic subgraphs of the de novo graphs from all pairs of fragmentation spectra.
This maximal subgraph isomorphism problem is a generalization of the subgraph isomorphism problem, which asks whether a graph G₁ has a subgraph isomorphic to a graph G₂. Subgraph isomorphism is NP-complete. A novel method is proposed for efficiently finding common substructures among the subspectra induced by the alphabet. This method is then combined with a novel form of hashing that avoids evaluating all pairs of fragmentation spectra. These methods are generalized to Euclidean graphs embedded in ℤⁿ.
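The counting argument in the abstract and the basic de novo step (connecting peaks whose m/z difference equals an alphabet mass) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions: the function names, the 0.01 m/z tolerance, and the `top_k`, `decimals`, `min_mass`, and `size` parameters are hypothetical. In particular, `candidate_alphabet` is a naive stand-in that ranks rounded pairwise m/z differences among each spectrum's most intense peaks by how many spectra contain them. It is not the thesis's alphabet-discovery method.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_linear(alphabet_size, length):
    """Number of linear chains of `length` subunits."""
    return alphabet_size ** length


def count_graphs(alphabet_size, n):
    """Subunit labelings times every possible edge set on n labeled vertices."""
    return alphabet_size ** n * 2 ** comb(n, 2)


def de_novo_edges(peaks, alphabet, tol=0.01):
    """Connect two peaks when their m/z difference matches an alphabet mass.

    `tol` is an assumed absolute m/z tolerance, not a value from the thesis.
    """
    edges = []
    for a, b in combinations(sorted(peaks), 2):  # a <= b, so b - a >= 0
        for mass in alphabet:
            if abs((b - a) - mass) <= tol:
                edges.append((a, b, mass))
                break  # one alphabet label per edge is enough for this sketch
    return edges


def candidate_alphabet(spectra, top_k=20, decimals=2, min_mass=10.0, size=5):
    """Hypothetical, naive stand-in for alphabet discovery (NOT the thesis's method).

    Each spectrum is a list of (mz, intensity) pairs. A difference scores one
    point per spectrum in which it occurs among the top_k most intense peaks.
    """
    support = Counter()
    for spectrum in spectra:
        intense = sorted(spectrum, key=lambda p: p[1], reverse=True)[:top_k]
        mzs = sorted(mz for mz, _ in intense)
        support.update({round(b - a, decimals)
                        for a, b in combinations(mzs, 2) if b - a >= min_mass})
    return [diff for diff, _ in support.most_common(size)]
```

For example, `count_linear(20, 6)` returns 64,000,000 and `count_graphs(20, 6)` returns 2,097,152,000,000, matching the figures in the abstract. Scoring by spectrum support is only a crude proxy for the stated goal of building large connected graphs from many intense peaks.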

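For Euclidean graphs embedded in ℤⁿ, one simple way to look for a shared substructure between two graphs is to try every translation that maps a vertex of one graph onto a vertex of the other and keep the shift that preserves the most edges. Restricting matches to translations keeps the search polynomial (at most |V₁|·|V₂| candidate shifts), unlike general subgraph isomorphism. The Python sketch below does this and adds a hypothetical translation-invariant hash of connected components, so that only graphs sharing a component are compared. Both the shift search and the hashing scheme are illustrative stand-ins, not the thesis's algorithms. The graph representation (a set of integer tuples plus a set of two-element `frozenset` edges) and all function names are assumptions.

```python
from collections import defaultdict


def translate(point, shift):
    """Shift an integer point in Z^n by an integer vector."""
    return tuple(a + s for a, s in zip(point, shift))


def best_shift(g1, g2):
    """Translation that maps the most edges of g1 onto edges of g2.

    A graph is (vertices, edges): vertices is a set of integer tuples and
    edges is a set of frozenset({u, v}) pairs (no self-loops assumed).
    Returns (matched_edge_count, shift, matched_edges_of_g1).
    """
    v1, e1 = g1
    v2, e2 = g2
    # Any useful shift must map some vertex of g1 onto some vertex of g2.
    shifts = {tuple(b - a for a, b in zip(u, v)) for u in v1 for v in v2}
    best = (0, None, set())
    for shift in shifts:
        matched = {e for e in e1
                   if frozenset(translate(p, shift) for p in e) in e2}
        if len(matched) > best[0]:
            best = (len(matched), shift, matched)
    return best


def component_keys(graph):
    """Hypothetical hash keys: each connected component, translated so that
    its lexicographically smallest vertex sits at the origin."""
    _, edges = graph
    adj = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    seen, keys = set(), set()
    for start in adj:  # isolated vertices carry no substructure
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative DFS collects one component
            p = stack.pop()
            comp.add(p)
            for q in adj[p]:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        to_origin = tuple(-c for c in min(comp))
        keys.add(frozenset(frozenset(translate(p, to_origin) for p in edge)
                           for edge in edges if edge <= comp))
    return keys


def candidate_pairs(graphs):
    """Compare only graphs that share at least one component key."""
    buckets = defaultdict(set)
    for idx, graph in enumerate(graphs):
        for key in component_keys(graph):
            buckets[key].add(idx)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        pairs.update((a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:])
    return pairs
```

This exact-component hash is only a coarse filter. Two graphs that share part of a component, but no whole component up to translation, land in different buckets and are never passed to `best_shift`.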
Recommended Citation

Kreitzberg, Patrick Anthony, "ZERO-KNOWLEDGE DE NOVO ALGORITHMS FOR ANALYZING SMALL MOLECULES USING MASS SPECTROMETRY" (2019). Graduate Student Theses, Dissertations, & Professional Papers. 11396.
https://scholarworks.umt.edu/etd/11396

Included in

Computer Sciences Commons

ScholarWorks at University of Montana

Graduate Student Theses, Dissertations, & Professional Papers

ZERO-KNOWLEDGE DE NOVO ALGORITHMS FOR ANALYZING SMALL MOLECULES USING MASS SPECTROMETRY