Title

COMPUTATIONAL PRESERVATION OF THE BLACKFEET LANGUAGE USING MACHINE LEARNING ALGORITHMS

Presenter Information

Michael Jacobi

Presentation Type

Presentation

Abstract

By investigating audio features and different machine learning algorithms, we aim to develop a computational framework that automatically identifies and extracts desired sounds from audio clips of the Blackfeet language. The data acquired from this framework will be used to compile a database that facilitates the digital preservation of the language. Many machine learning algorithms require training data to learn from and test data on which to apply that knowledge. The first step of this project was to create training data by manually identifying occurrences of a desired sound and associating each occurrence with a set of quantitative sound features. The next step is to identify the set of audio features that best characterizes the desired sound. This is accomplished by understanding and applying related research results, manually analyzing the training data, and trial and error. The quality of a characterization is measured by the percentage of sounds that are correctly classified, given a set of audio features and a learning algorithm. This is the first computational linguistic system applied to the Blackfeet language. If it succeeds, similar systems could be implemented for other indigenous languages. Blackfeet is a Native American language local to Montana that is critically endangered, with only 5,000 speakers in Canada and 100 in the US, most of whom are elderly. It is therefore vitally important to preserve this language.
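The evaluation loop described in the abstract (choose a feature set, train a learner, report the percentage of correctly classified sounds) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the learning algorithm, the audio features, or the data format, so the sketch substitutes a simple nearest-centroid classifier, two made-up numeric features per clip, and hypothetical labels "target" and "other".

```python
from itertools import combinations


def fit_centroids(rows, labels):
    """Return the mean feature vector of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to row (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sq_dist)


def accuracy(train, test, feature_idx):
    """Percentage of test clips classified correctly using only the chosen features."""
    def project(row):
        return [row[i] for i in feature_idx]
    centroids = fit_centroids([project(r) for r, _ in train], [y for _, y in train])
    correct = sum(predict(centroids, project(r)) == y for r, y in test)
    return 100.0 * correct / len(test)


def best_feature_subset(train, test, n_features):
    """Try every non-empty feature subset; keep the first one with the highest accuracy."""
    best = None
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            score = accuracy(train, test, subset)
            if best is None or score > best[0]:
                best = (score, subset)
    return best


# Hypothetical hand-made data: feature 0 separates the classes, feature 1 does not.
train = [
    ([1.0, 5.0], "target"),
    ([1.2, -5.0], "target"),
    ([3.0, 5.0], "other"),
    ([3.2, -5.0], "other"),
]
test = [
    ([0.9, 4.0], "target"),
    ([2.9, -4.0], "other"),
]

if __name__ == "__main__":
    score, subset = best_feature_subset(train, test, n_features=2)
    print(f"best feature subset {subset} with {score:.1f}% correct")
```

Exhaustive search over subsets is only practical for a handful of candidate features; it stands in here for the manual analysis and trial and error that the abstract describes.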

Category

Physical Sciences


Apr 15th, 9:20 AM Apr 15th, 9:40 AM


UC 327
