Oral Presentations and Performances: Session I

Project Type

Presentation

Project Funding and Affiliations

NIH

Faculty Mentor’s Full Name

Lucia Williams

Faculty Mentor’s Department

Computer Science

Abstract / Artist's Statement

Analyzing genetic diversity within a host is crucial for understanding disease evolution and population genetics. However, accurately assessing the genetic subgroups within microscopic populations is computationally expensive and complex. Our project addresses this problem by developing a pipeline that uses graph theory and integer linear programming (ILP) to assess viral strain composition, with the hope that this technique can later be applied to organisms with longer genomes. We start by taking raw viral genomes, cleaning them, and building a De Bruijn graph that captures genetic similarities and differences. The edges are then cleaned and compressed to remove redundant information. We add a super source and a super sink so that every input and output of the graph can be reached from these two nodes. Our ILP model then finds paths through this graph while estimating the weight (population size) of each path. These paths are traced and output as potential strains. This work continues research done at MSU by Lucia Williams and Brendan Mumey, applying their graph theory work on flow decomposition to a practical biological problem. The project uses Snakemake and Rust to create a pipeline that automatically manages job dependencies and the flow of information needed to output potential strains. The goal is that, with further development, researchers from different disciplines could use this pipeline to analyze the genetic diversity of more complex organisms more accurately than is currently possible.
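As an illustration of the graph-construction and super source/sink steps described in the abstract, the following is a minimal Rust sketch and not the project's actual code. It assumes reads are plain ASCII strings, that nodes are (k-1)-mers, and that an edge's weight is the number of k-mers supporting it. The names `build_de_bruijn`, `add_super_terminals`, `SOURCE`, and `SINK` are hypothetical, and the edge compression and ILP steps are omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Weighted edges of a De Bruijn graph: (from, to) -> number of supporting k-mers.
/// (Assumed representation, chosen for illustration only.)
type Edges = BTreeMap<(String, String), u64>;

const SOURCE: &str = "SOURCE"; // hypothetical label for the super source
const SINK: &str = "SINK"; // hypothetical label for the super sink

/// Build a De Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers.
/// Assumes ASCII reads (e.g. A/C/G/T), so byte offsets equal character offsets.
fn build_de_bruijn(reads: &[&str], k: usize) -> Edges {
    let mut edges = Edges::new();
    if k < 2 {
        return edges;
    }
    for read in reads {
        if read.len() < k {
            continue; // read too short to contain a k-mer
        }
        for i in 0..=read.len() - k {
            let kmer = &read[i..i + k];
            let from = kmer[..k - 1].to_string(); // prefix (k-1)-mer
            let to = kmer[1..].to_string(); // suffix (k-1)-mer
            *edges.entry((from, to)).or_insert(0) += 1;
        }
    }
    edges
}

/// Connect a super source to every node with no incoming edge and every node
/// with no outgoing edge to a super sink. Each new edge carries the node's
/// total flow, so flow is conserved at that node.
fn add_super_terminals(edges: &mut Edges) {
    let mut in_flow: BTreeMap<String, u64> = BTreeMap::new();
    let mut out_flow: BTreeMap<String, u64> = BTreeMap::new();
    let mut nodes: BTreeSet<String> = BTreeSet::new();
    for ((from, to), w) in edges.iter() {
        *out_flow.entry(from.clone()).or_insert(0) += w;
        *in_flow.entry(to.clone()).or_insert(0) += w;
        nodes.insert(from.clone());
        nodes.insert(to.clone());
    }
    for node in nodes {
        if !in_flow.contains_key(&node) {
            edges.insert((SOURCE.to_string(), node.clone()), out_flow[&node]);
        }
        if !out_flow.contains_key(&node) {
            edges.insert((node.clone(), SINK.to_string()), in_flow[&node]);
        }
    }
}
```

For example, the reads `ACGT` and `ACGA` with k = 3 share the edge `AC -> CG` and then branch to `GT` and `GA`, so the sketch attaches `SOURCE` to `AC` and both branch ends to `SINK`. A graph consisting only of cycles would get no terminals under this simple rule.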

Category

Life Sciences

Apr 17th, 10:15 AM Apr 17th, 10:30 AM

Examining Genetic Diversity with Rust and Snakemake

UC 329
