Year of Award
2021
Document Type
Thesis
Degree Type
Master of Science (MS)
Other Degree Name/Area of Focus
Data Science
Department or School/College
Mathematical Sciences
Committee Chair
Dr. Javier Perez Alvaro
Committee Members
Dr. Johnathan Bardsley, Dr. Simona Stanmach
Keywords
Machine Learning, Random Forest, Multiple Linear Regression, Grid Search, Feature Importance
Subject Categories
Data Science
Abstract
Flight delays are costly for airlines and reduce passenger satisfaction. In this work, we predicted the daily percentage of delayed flights from national weather data using multiple linear regression and random forest models. We extracted passenger flight on-time performance data from the Bureau of Transportation Statistics and weather data from the NOAA National Centers for Environmental Information for the years 2015 to 2019, and restricted the flight data to flights originating at the Seattle airport. We predicted the daily percentage of delayed Seattle-originated flights from features including the weather conditions at the origin and at its top 10 destination airports on the date of the flight, the weather at the origin on the day before the flight, the number of daily flights from Seattle to these destinations, and the year, month, and day of the week. We trained the random forest model and rigorously tuned its hyperparameters. We assessed the fitted models with evaluation metrics including the mean absolute error, root mean squared error, and coefficient of determination. The random forest model, with scores of 2.68, 4.08, and 0.79, respectively, outperformed the multiple linear regression model in predicting the daily percentage of delayed flights.
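The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The function names and the sample values below are hypothetical and chosen only for illustration; they are not the thesis code or its data, and the thesis may have computed these scores with a library instead.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily percentages of delayed flights (not thesis data).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
r2 = r2_score(observed, predicted)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Lower mean absolute error and root mean squared error indicate smaller prediction errors, in the same units as the target (percentage points here), while a coefficient of determination closer to 1 indicates that the model explains more of the day-to-day variance.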
Recommended Citation
Mahmoudi, Parto, "FORECASTING THE DAILY PERCENTAGE OF DELAYED FLIGHTS BASED ON THE NATIONAL WEATHER DATA" (2021). Graduate Student Theses, Dissertations, & Professional Papers. 11710.
https://scholarworks.umt.edu/etd/11710
© Copyright 2021 Parto Mahmoudi