# The Alphabet Projection of Mass Spectrometry Data

## Presentation Type

Oral Presentation

## Abstract/Artist Statement

My presentation is about finding small molecules in mass spectrometry (MS) data. This technique has a wide range of potential applications, but the most impactful may be in drug testing: the method can be used to find what a drug has been metabolized into (which could be a harmful poison or a therapeutic chemical) after it has interacted with a patient's physiology. MS is a technique used to find the mass of objects (e.g. molecules and amino acids) that are too small to be weighed through conventional means. The data is recorded as mass versus intensity, where intensity can be thought of as the abundance of that mass in the sample. The data looks like a series of peaks: a peak is present wherever a mass is found at that value, and its height is proportional to the intensity of that mass. We use the mass differences between peaks to find molecules that are either too small to be found by MS or have disappeared from the sample before the MS process began. We identify a set of the most important mass differentials, which we call an alphabet, that connects many masses in the MS data. Existing methods use an already known alphabet to connect a graph, but such an alphabet may be so large as to be unusable for certain data sets, such as those from urine analysis. We are the first to propose a method that finds such an alphabet without any a priori knowledge of the data.

In order to find the most important masses in the MS data, we represent the masses as vertices in a graph. We connect two vertices if there is a mass in the alphabet equal to the difference between them. The larger the graph an alphabet builds, the more important the masses in that alphabet are. This comes from the idea that chain reactions are important. If many masses all lose a sugar mass, then the sugar must be important to the sample. If those smaller masses then lose a water molecule, water and sugar combined are important. Millions of mass differentials may be calculated from the MS data; we project these millions down to the most important ones, usually between 32 and 128.
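The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the presentation: the function names, the matching tolerance `tol` (measured masses rarely match exactly, so some tolerance is assumed), and the toy mass values are all hypothetical.

```python
from itertools import combinations


def build_edges(masses, alphabet, tol=0.01):
    """Return index pairs (i, j) whose mass difference matches an alphabet entry.

    `tol` is an assumed matching tolerance; the abstract does not specify one.
    """
    edges = []
    for i, j in combinations(range(len(masses)), 2):
        diff = abs(masses[i] - masses[j])
        if any(abs(diff - letter) <= tol for letter in alphabet):
            edges.append((i, j))
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) if sizes else 0


# Toy values only: four masses linked by two differentials, plus one isolated mass.
masses = [100.0, 118.0, 262.0, 280.0, 500.0]
alphabet = [18.0, 162.0]
edges = build_edges(masses, alphabet)
print(edges, largest_component_size(len(masses), edges))
```

In this toy case the two differentials connect four of the five masses into a single component, which is the sense in which a small alphabet that builds a large graph is considered important.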

The alphabets from which we build the graphs are determined randomly. Each mass differential is picked by choosing two random masses and taking the difference between them. A model then estimates the quality of the graph(s) made by the alphabet. If the set of graphs produced by one alphabet is of higher quality than the set produced by another, we keep the first. After many rounds of proposing random masses, building the graphs, and accepting the best alphabets, the best alphabet begins to converge to a final answer, meaning we cannot find an alphabet that produces better graphs.
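The propose, score, and keep loop described above might look roughly like the following. It is a simplified sketch: the abstract does not describe the quality model, so `quality` here is a placeholder that just counts linked mass pairs, and the names `propose_alphabet` and `search`, the tolerance `tol`, and the fixed iteration count are assumptions made for illustration.

```python
import random
from itertools import combinations


def propose_alphabet(masses, size, rng):
    """Draw `size` differentials, each the difference of two random masses."""
    alphabet = []
    for _ in range(size):
        a, b = rng.sample(masses, 2)
        alphabet.append(abs(a - b))
    return alphabet


def quality(masses, alphabet, tol=0.01):
    """Placeholder score (an assumption): number of mass pairs the alphabet links."""
    return sum(
        1
        for a, b in combinations(masses, 2)
        if any(abs(abs(a - b) - letter) <= tol for letter in alphabet)
    )


def search(masses, size, iterations, seed=0):
    """Keep the best-scoring alphabet seen over repeated random proposals."""
    rng = random.Random(seed)
    best = propose_alphabet(masses, size, rng)
    best_score = quality(masses, best)
    for _ in range(iterations):
        candidate = propose_alphabet(masses, size, rng)
        score = quality(masses, candidate)
        if score > best_score:  # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score


# Toy values only, not real spectra.
masses = [100.0, 118.0, 262.0, 280.0, 500.0]
best, best_score = search(masses, size=2, iterations=200)
print(best, best_score)
```

Running for a fixed number of iterations stands in for the convergence check; a fuller version would stop once no better alphabet has been found for some time.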

As stated above, the most significant application may be in the field of drug testing, but this method can be used any time you are unsure of what a sample may contain. The TSA could use this method to find which potential bomb-making chemicals to look for. This would be done by making a bomb, taking MS data, and then using our method to find which chemicals are prevalent in the sample. Another biological application may be in diagnosis. If we take a urine sample from a patient, there may be a molecule that shows up in our alphabet and can indicate whether the patient is diabetic, experiencing kidney failure, pregnant, etc.

Oliver Serang

Feb 22nd, 9:40 AM to 9:55 AM
