Presenter Information

Jed Syrenne

Presentation Type

Poster

Faculty Mentor’s Full Name

Dr. Mark Grimes

Faculty Mentor’s Department

DBS

Abstract

In statistical clustering, proteins that cluster together are likely to share a functional relationship. By statistically clustering and filtering proteomic data, networks can be built that make the vast complexity of protein-protein interaction data understandable and amenable to meaningful analysis. Here, glioblastoma and glioblastoma multiforme phosphorylation data were obtained from PhosphoSitePlus and analyzed in R. The binary data were loaded into a data frame and collapsed by gene name. Spearman-Euclidean and Euclidean distances were then calculated, and t-distributed stochastic neighbor embedding (t-SNE) was performed separately on each output. The results were then divided into discrete clusters. Excessively large clusters were broken down to a manageable size using a penalized matrix decomposition. The rank of the penalized matrix decomposition was determined by interpolating missing values in the cluster's data with DINEOF (Data Interpolating Empirical Orthogonal Functions), running principal component analysis (PCA) on the populated data frame, plotting the number of principal components against the proportion of variance explained, and choosing the point of diminishing returns that still explained over 90% of the variance. Clusters were transformed into networks and visualized in Cytoscape. The resulting networks are a useful tool for researchers studying protein-protein interactions in glioblastoma. Work is under way to integrate these networks with those obtained from mass spectrometry peak intensities, enabling meaningful analysis of legacy datasets.
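The rank-selection step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the R code used in the project. `dineof_fill` is a hypothetical, simplified DINEOF-style gap filler: missing cells start at their column means and are repeatedly replaced by a truncated-SVD reconstruction with a fixed number of modes `k` (full DINEOF also selects the number of modes by cross-validation, which is omitted here). `choose_rank` is likewise hypothetical and replaces the visual "point of diminishing returns" judgment with a simple, reproducible rule: the smallest number of principal components whose cumulative proportion of variance explained exceeds 90%.

```python
import numpy as np


def dineof_fill(X, k, n_iter=200, tol=1e-8):
    """Simplified DINEOF-style gap filling (hypothetical helper).

    NaN cells are initialised with their column means, then repeatedly
    replaced by the rank-k truncated-SVD reconstruction of the matrix.
    Observed cells are never modified. Full DINEOF also chooses k by
    cross-validation; here k is supplied by the caller.
    """
    filled = np.array(X, dtype=float)  # copy, so the input is untouched
    missing = np.isnan(filled)
    if not missing.any():
        return filled
    # Initial guess: mean of the observed values in each column.
    col_means = np.nanmean(filled, axis=0)
    filled[missing] = np.broadcast_to(col_means, filled.shape)[missing]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
        change = np.max(np.abs(approx[missing] - filled[missing]))
        filled[missing] = approx[missing]  # update the gaps only
        if change < tol:
            break
    return filled


def explained_variance_ratio(X):
    """Proportion of variance explained by each principal component."""
    centered = X - X.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def choose_rank(X, threshold=0.90):
    """Smallest number of components whose cumulative variance exceeds threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    # side="right" finds the first component with cumulative > threshold.
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return min(k, len(cumulative))


if __name__ == "__main__":
    # Toy matrix with orthogonal, zero-mean columns: variances 36 : 4 : 1.
    X = np.array([[3.0, 1.0, 0.5],
                  [-3.0, 1.0, -0.5],
                  [3.0, -1.0, -0.5],
                  [-3.0, -1.0, 0.5]])
    print(np.round(explained_variance_ratio(X), 3))  # [0.878 0.098 0.024]
    print(choose_rank(X))                             # 2
```

Under these assumptions, a cluster's data matrix would first be passed through `dineof_fill`, and `choose_rank` applied to the filled matrix would suggest the rank for the penalized matrix decomposition. The fixed threshold rule is used only because it is easy to reproduce; choosing the elbow by inspecting the variance plot, as the abstract describes, may give a different value.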

Category

Life Sciences

Apr 27th, 3:00 PM to 4:00 PM

Statistical Clustering of Glioblastoma Multiforme for Graph Theory Analysis

UC South Ballroom
