Year of Award

2022

Document Type

Thesis

Degree Type

Master of Science (MS)

Degree Name

Computer Science

Department or School/College

Computer Science

Committee Chair

Travis Wheeler

Committee Members

Jesse Johnson, Robert Hubley

Keywords

transposable elements, subfamily clustering

Publisher

University of Montana

Subject Categories

Bioinformatics

Abstract

Biological sequence annotation is typically performed by aligning a sequence to a database of known sequence elements. For transposable elements, these known sequences are subfamily consensus sequences. When many of the subfamily models in the database are highly similar to each other, a sequence belonging to one subfamily can easily be mistaken for a member of another, causing non-reproducible subfamily annotation. Because subfamily annotation is expected to give some insight into a sequence’s evolutionary history, it is important that such annotation be reproducible. Here, we present our software tool, SCULU, which builds upon our previously described methods for computing annotation confidence and uses those confidence estimates to find and collapse pairs of subfamilies that have a high risk of annotation collision. The result is a reduced set of subfamilies with increased expected reliability of subfamily annotation.

© Copyright 2022 Audrey M. Shingleton