Year of Award

2022

Document Type

Thesis

Degree Type

Master of Science (MS)

Degree Name

Computer Science

Department or School/College

Computer Science

Committee Chair

Travis Wheeler

Committee Members

Jesse Johnson, Robert Hubley

Keywords

transposable elements, subfamily clustering

Subject Categories

Bioinformatics

Abstract

Biological sequence annotation is typically performed by aligning a sequence to a database of known sequence elements. For transposable elements, these known sequences represent subfamily consensus sequences. When many of the subfamily models in the database are highly similar to each other, a sequence belonging to one subfamily can easily be mistaken for a member of another, making subfamily annotation non-reproducible. Because subfamily annotation is expected to give some insight into a sequence’s evolutionary history, it is important that such annotation be reproducible. Here, we present our software tool, SCULU, which builds upon our previously described methods for computing annotation confidence and uses those confidence estimates to find and collapse pairs of subfamilies that have a high risk of annotation collision. The result is a reduced set of subfamilies with increased expected subfamily annotation reliability.

© Copyright 2022 Audrey M. Shingleton