Master of Science (MS)

Data Science

Mathematical Sciences

Dr. Javier Perez Alvaro

Dr. Johnathan Bardsley, Dr. Simona Stanmach


Machine Learning, Random Forest, Multiple Linear Regression, Grid Search, Feature Importance


University of Montana

Data Science


Flight delays are costly for airlines and reduce passenger satisfaction. In this research work, we predicted the daily percentage of delayed flights from national weather data using multiple linear regression and random forest models. We extracted passenger flight on-time performance data from the Bureau of Transportation Statistics and weather data from the NOAA National Centers for Environmental Information for the years 2015 to 2019. We restricted the flight dataset to flights with Seattle airport as the origin. For these Seattle-originated flights, we predicted the daily percentage of delayed flights from the following features: weather conditions at the origin and at its top 10 destination airports on the date of the flight, weather conditions at the origin on the day before the flight, the number of daily flights from Seattle to these destinations, year, month, and day of the week. We trained the random forest model and rigorously tuned its hyperparameters. We assessed the fitted model with three evaluation metrics: mean absolute error, root mean squared error, and coefficient of determination. The random forest model, with scores of 2.68, 4.08, and 0.79, respectively, outperformed the multiple linear regression model in predicting the daily percentage of delayed flights.
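For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how each is commonly computed. It is not the thesis code: the function names and the observed and predicted values are made up for illustration, and the thesis itself may have used a library implementation.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily percentages of delayed flights (illustrative only,
# not data from the thesis).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
r2 = r2_score(observed, predicted)

print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

Lower mean absolute error and root mean squared error indicate a closer fit, while a coefficient of determination nearer to 1 indicates that the model explains more of the variance in the target. Both error metrics are expressed in the units of the target, here percentage points of delayed flights.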

Data Science Commons



© Copyright 2021 Parto Mahmoudi