Year of Award
2022
Document Type
Thesis
Degree Type
Master of Science (MS)
Degree Name
Computer Science
Department or School/College
Computer Science
Committee Chair
Travis Wheeler
Committee Members
Jesse Johnson, Robert Hubley
Keywords
transposable elements, subfamily clustering
Publisher
University of Montana
Subject Categories
Bioinformatics
Abstract
Biological sequence annotation is typically performed by aligning a sequence to a database of known sequence elements. For transposable elements, these known sequences represent subfamily consensus sequences. When many of the subfamily models in the database are highly similar to one another, a sequence belonging to one subfamily can easily be misattributed to another, making subfamily annotation non-reproducible. Because subfamily annotation is expected to offer some insight into a sequence’s evolutionary history, it is important that such annotation be reproducible. Here, we present our software tool, SCULU, which builds on our previously described methods for computing annotation confidence and uses those confidence estimates to find and collapse pairs of subfamilies that have a high risk of annotation collision. The result is a reduced set of subfamilies with higher expected subfamily annotation reliability.
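As a rough illustration of the collapsing step described in the abstract, the following sketch merges subfamilies whose pairwise collision score meets a threshold. It uses union-find so that chains of risky pairs end up in one group. This is a hypothetical sketch, not SCULU's implementation: the function name `collapse_subfamilies`, the precomputed `collision` scores, and the single fixed `threshold` are all assumptions. It also merges in a single pass rather than re-estimating confidence after each merge.

```python
from typing import Dict, List, Tuple


def collapse_subfamilies(
    subfamilies: List[str],
    collision: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group subfamilies whose pairwise collision score is >= threshold.

    Hypothetical illustration: collision[(a, b)] is assumed to be a score
    derived from annotation-confidence estimates. It is not SCULU's data model.
    """
    parent = {s: s for s in subfamilies}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair whose collision risk meets the threshold; merging is
    # transitive, so A~B and B~C places A, B, and C in one group.
    for (a, b), score in collision.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    # Collect each subfamily under its group's representative.
    groups: Dict[str, List[str]] = {}
    for s in subfamilies:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

For example, with made-up scores `{("A", "B"): 0.4, ("B", "C"): 0.3, ("C", "D"): 0.05}` and a threshold of 0.25, subfamilies A, B, and C collapse into one group while D stays separate.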
Recommended Citation
Shingleton, Audrey M., "SUBFAMILY CLUSTERING USING LABEL UNCERTAINTY (FOR TRANSPOSABLE ELEMENT FAMILIES)" (2022). Graduate Student Theses, Dissertations, & Professional Papers. 11913.
https://scholarworks.umt.edu/etd/11913
© Copyright 2022 Audrey M. Shingleton