Presentation Type

Oral Presentation

Category

STEM (science, technology, engineering, mathematics)

Abstract/Artist Statement

Identifying bumble bees is a time-consuming process that often involves lethal collection of specimens for microscopic examination. With bumble bee populations declining, new methods of identification are in demand. High-resolution cameras have given taxonomists an alternative to lethal capture. High-definition photographs show key identifying features clearly, but still require the evaluation of hundreds of combinations of features to correctly identify specimens.

Machine learning methods based on decision trees can classify large amounts of data in short periods of time. Much like a taxonomist, the program learns to identify a specimen using combinations of characteristics. Two such methods were compared to explore how effectively and accurately machine learning can identify bumble bee species. Random forest modeling (RFM) builds a forest of decision trees and classifies data based on the consensus of the trees in the forest. Recursive partitioning (RP) builds a single tree that classifies data by repeatedly splitting it into groups with similar characteristics until a majority is reached in each group.
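The difference between the two approaches can be sketched in a few lines of plain Python. This is a minimal illustration only, not the models used in the study: the feature names, feature values, and species labels are hypothetical, splits are chosen by Gini impurity (an assumption; the abstract does not state the splitting criterion), and the forest uses bootstrap samples with one randomly chosen candidate feature per split.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, labels, feature):
    """Group (row, label) pairs by the value of one categorical feature."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], ([], []))
        groups[row[feature]][0].append(row)
        groups[row[feature]][1].append(y)
    return groups

def build_tree(rows, labels, features, rng=None, n_try=None):
    """Recursive partitioning: split until a group is pure or features run out."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    candidates = features
    if rng is not None and n_try is not None:
        # Random forests consider only a random subset of features per split.
        candidates = rng.sample(features, min(n_try, len(features)))
    best_feature, best_score = None, None
    for f in candidates:
        groups = partition(rows, labels, f)
        score = sum(len(ys) / len(labels) * gini(ys) for _, ys in groups.values())
        if best_score is None or score < best_score:
            best_feature, best_score = f, score
    remaining = [f for f in features if f != best_feature]
    node = {"feature": best_feature, "default": majority(labels), "children": {}}
    for value, (sub_rows, sub_labels) in partition(rows, labels, best_feature).items():
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining, rng, n_try)
    return node

def predict_tree(node, row):
    """Follow the branches; unseen feature values fall back to the node majority."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["feature"]], node["default"])
    return node

def build_forest(rows, labels, features, n_trees=25, seed=0):
    """Grow each tree on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_try = max(1, int(len(features) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 features, rng, n_try))
    return forest

def predict_forest(forest, row):
    """Consensus of the forest: the label most trees vote for."""
    return majority([predict_tree(tree, row) for tree in forest])

# Hypothetical toy data; real traits and species would come from field records.
features = ["thorax_pattern", "face_length", "t2_color"]
rows = [
    {"thorax_pattern": "banded", "face_length": "long", "t2_color": "yellow"},
    {"thorax_pattern": "banded", "face_length": "long", "t2_color": "black"},
    {"thorax_pattern": "banded", "face_length": "short", "t2_color": "yellow"},
    {"thorax_pattern": "banded", "face_length": "short", "t2_color": "black"},
    {"thorax_pattern": "plain", "face_length": "short", "t2_color": "orange"},
    {"thorax_pattern": "plain", "face_length": "long", "t2_color": "orange"},
]
labels = ["species_A", "species_A", "species_B", "species_B", "species_C", "species_C"]

tree = build_tree(rows, labels, features)       # single tree (RP-style)
forest = build_forest(rows, labels, features)   # many trees (RFM-style)
```

In practice an established library would be used instead of hand-written trees; the sketch only shows that RP commits to one sequence of splits, while RFM averages over many trees grown on resampled data.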

In this experiment, RFM and RP were used to identify species of the bumble bee genus Bombus using characteristics such as thorax pattern, face length, and abdominal segment colors. RFM had an accuracy of 93%, while RP had an accuracy of 88%. Ecological observation data are often not equally distributed among species, and this can affect the accuracy of some models. RFM showed no bias against species with fewer observations in the data, suggesting it may be a better model for data sets with an unequal distribution of observations. RP models showed significant bias towards species with more observations, suggesting RP was less likely to correctly predict species it encountered less frequently. My study indicates that machine learning may streamline identification of bumble bees and reduce or remove the need for lethal collections.
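One way to see why unequal observation counts matter is to compare overall accuracy with per-species recall. The sketch below is illustrative only and does not reproduce the study's evaluation: the labels are hypothetical, and the "model" simply predicts the most common species every time.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of all specimens identified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_by_class(y_true, y_pred):
    """For each species, the fraction of its specimens identified correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical data set: 8 records of a common species, 2 of a rare one.
y_true = ["common"] * 8 + ["rare"] * 2
# A biased model that always predicts the species it saw most often.
y_pred = ["common"] * 10

overall = accuracy(y_true, y_pred)
per_species = recall_by_class(y_true, y_pred)
```

Here the overall accuracy looks respectable even though the rare species is never identified correctly, which is why per-species results are worth checking alongside a single accuracy figure.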

Mentor Name

Diana Six

Personal Statement

Using random forest modeling or recursive partitioning to identify insects has many practical applications. Citizen (or community) science projects collect large amounts of data, ranging from environmental to specimen-specific information. Trained models can allow researchers to analyze these data efficiently. Instead of relying on lethal collections, a properly trained model may be able to identify specimens based on characteristics that citizen scientists, students, and researchers can easily collect. These methods are applicable to other species and can be expanded to include more features. With 24 species of bees in Montana and diverse physical characteristics within species, identification is difficult. Citizen science projects in the Pacific Northwest have successfully monitored pollinator populations using photographs that contain identifying features. By using machine learning, community scientists can input data into apps and receive immediate identifications. This not only expands the knowledge of those involved but also substantially streamlines data analysis. This presentation provides a good introduction to machine learning as well as an application of these models. Communicating science is a passion of mine. Presenting to a non-science audience on a topic outside my main field will challenge my abilities, and I am eager to meet that challenge.

GradCon2.mp4 (68294 kB)
Video Presentation


Using classification trees to identify bumble bees
