Year of Award
2022
Document Type
Thesis
Degree Type
Master of Science (MS)
Degree Name
Computer Science
Department or School/College
Computer Science
Committee Chair
Travis Wheeler
Committee Members
Jesse Johnson, Robert Hubley
Keywords
transposable elements, subfamily clustering
Subject Categories
Bioinformatics
Abstract
Biological sequence annotation is typically performed by aligning a sequence to a database of known sequence elements. For transposable elements, these known sequences are subfamily consensus sequences. When many of the subfamily models in the database are highly similar to each other, a sequence belonging to one subfamily can easily be mistaken for a member of another, causing non-reproducible subfamily annotation. Because subfamily annotation is expected to give some insight into a sequence’s evolutionary history, it is important that such annotation be reproducible. Here, we present our software tool, SCULU, which builds upon our previously described methods for computing annotation confidence and uses those confidence estimates to find and collapse pairs of subfamilies that have a high risk of annotation collision. The result is a reduced set of subfamilies with increased expected reliability of subfamily annotation.
Recommended Citation
Shingleton, Audrey M., "SUBFAMILY CLUSTERING USING LABEL UNCERTAINTY (FOR TRANSPOSABLE ELEMENT FAMILIES)" (2022). Graduate Student Theses, Dissertations, & Professional Papers. 11913.
https://scholarworks.umt.edu/etd/11913
© Copyright 2022 Audrey M. Shingleton