Year of Award
Master of Science (MS)
Other Degree Name/Area of Focus
Department or School/College
Dr. Javier Perez Alvaro
Dr. Johnathan Bardsley, Dr. Simona Stanmach
Machine Learning, Random Forest, Multiple Linear Regression, Grid Search, Feature Importance
University of Montana
Flight delays cost airlines money and reduce passenger satisfaction. In this research work, we predicted the daily percentage of delayed flights from national weather data using multiple linear regression and random forest models. We extracted passenger flight on-time performance data from the Bureau of Transportation Statistics and weather data from the NOAA National Centers for Environmental Information for the years 2015 through 2019. We restricted the flight dataset to flights with the Seattle airport as the origin. We predicted the daily percentage of delayed Seattle-originated flights from the following features: weather conditions at the origin and at its top 10 destination airports on the date of the flight, weather conditions at the origin on the day before the flight, the number of daily flights from Seattle to these destinations, year, month, and day of the week. We trained the random forest model and rigorously tuned its hyperparameters. We assessed the fitted models with three evaluation metrics: mean absolute error, root mean squared error, and the coefficient of determination. The random forest model, with scores of 2.68, 4.08, and 0.79 on these metrics, respectively, outperformed the multiple linear regression model in predicting the daily percentage of delayed flights.
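As a reading aid, the three evaluation metrics named in the abstract can be sketched in plain Python. This is a minimal illustration and not the code used in the thesis: the helper names `mae`, `rmse`, and `r2`, and the four observed and predicted values, are hypothetical and chosen only to keep the arithmetic easy to check.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily percentages of delayed flights (not thesis data).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mae_score = mae(observed, predicted)    # (2 + 2 + 3 + 1) / 4 = 2.0
rmse_score = rmse(observed, predicted)  # sqrt(18 / 4) = sqrt(4.5)
r2_score = r2(observed, predicted)      # 1 - 18 / 500 = 0.964

print(f"MAE={mae_score:.2f}  RMSE={rmse_score:.2f}  R2={r2_score:.3f}")
```

Lower MAE and RMSE and a higher coefficient of determination indicate a better fit, which is the sense in which the abstract reports the random forest model outperforming multiple linear regression.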
Mahmoudi, Parto, "FORECASTING THE DAILY PERCENTAGE OF DELAYED FLIGHTS BASED ON THE NATIONAL WEATHER DATA" (2021). Graduate Student Theses, Dissertations, & Professional Papers. 11710.
© Copyright 2021 Parto Mahmoudi