Flight Prices Prediction - Regression

Titan Bagus Bramantyo


Summary

For many people, air travel is one of the preferred modes of transportation, largely because of its short travel time. For stakeholders, predicting flight prices is important for keeping service prices stable so that people can continue to enjoy airline services.

Analyst : https://linkedin.com/in/titanbr

Description

Background

The price of flight services has increased significantly for both domestic and international flights, to the point that some people consider it unreasonable. Airlines adjust ticket prices to maintain operational stability and to ensure that regional connectivity is not disrupted. Besides the services delivered, increases in ticket prices are driven by rising aviation fuel prices. As a result, a system that predicts flight ticket prices over time from various parameters is necessary.

 

Problem Statement

  • How do we construct a regression-based flight price prediction system?
  • Which algorithm is the most appropriate for predicting flight prices?

 

Motivation

The results of this research can potentially help stakeholders identify the factors that make flight tickets expensive. By understanding the factors behind flight price inflation, stakeholders can maintain price stability, allowing people to keep enjoying airline services.

 

Analysis Goal

  • Build a flight price prediction application system.
  • Identify a suitable algorithm for flight price prediction.

 

Method

1. Data acquisition

In this study, I used an open-source dataset from the Kaggle platform titled Flight Price Prediction, released by Shubham Bathwal (https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction). The dataset has 12 feature columns and 300 thousand records.

(Figure: dataset)

2. Data pre-processing

In this step, I removed features that are not relevant to the study, such as 'flight' and 'days_left'. I also checked whether the dataset contains any empty values.

The categorical features are then label encoded, because the models can only process data in numerical format.
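The pre-processing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the toy rows stand in for the Kaggle CSV, and the column names other than 'flight' and 'days_left', the sample values, and the use of scikit-learn's LabelEncoder are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows standing in for the Kaggle CSV; column names and values are assumed.
df = pd.DataFrame({
    "airline": ["Vistara", "Air_India", "Vistara", "Indigo"],
    "flight": ["UK-995", "AI-887", "UK-963", "6E-2046"],
    "source_city": ["Delhi", "Delhi", "Mumbai", "Delhi"],
    "class": ["Economy", "Business", "Economy", "Economy"],
    "duration": [2.25, 2.17, 2.33, 2.17],
    "days_left": [1, 1, 2, 3],
    "price": [5955, 25612, 5956, 5953],
})

# Drop the columns treated as irrelevant in this study.
df = df.drop(columns=["flight", "days_left"])

# Count empty values per column before encoding.
missing_per_column = df.isnull().sum()

# Label-encode each text column so the models receive numbers only.
categorical_cols = ["airline", "source_city", "class"]  # assumed list
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
```

Keeping one encoder per column makes it possible to map the integer codes back to the original labels later with `inverse_transform`.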

3. Separating target and variable features

The next step is to split the dataset into two parts: the target data and the feature (variable) data. This prepares the variables that will be passed to each algorithm during testing.
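As a small sketch of this separation, assuming the target column is named "price" and the frame is already label encoded (the values below are illustrative only):

```python
import pandas as pd

# Toy, already-encoded rows; the values are illustrative only.
df = pd.DataFrame({
    "airline": [2, 0, 2, 1],
    "class": [1, 0, 1, 1],
    "duration": [2.25, 2.17, 2.33, 2.17],
    "price": [5955, 25612, 5956, 5953],
})

# Target: the column to predict (assumed to be named "price").
y = df["price"]

# Features: every remaining column.
X = df.drop(columns=["price"])
```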

4. Principal component analysis (PCA)

PCA is used to compress the features into a desired, smaller number of components. In this case, the dataset has 11 features, and PCA reduces them to the 2 components that the regression models in this study take as input.
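A minimal sketch of this reduction with scikit-learn's PCA is shown below. The random matrix is only a stand-in for the 11 encoded features, and whether the original notebook scaled the features first is not stated, so no scaling is applied here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the encoded dataset: 100 rows, 11 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 11))

# Project the 11 features onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Share of the original variance kept by each retained component.
explained = pca.explained_variance_ratio_
```

Checking `explained` is a quick way to see how much information is lost when going from 11 features to 2.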

5. Training and testing model

Before building a regression model, the dataset must be divided into training and testing sets. In this case, I allocated 80% of the data for training and 20% for testing.
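The 80/20 split can be sketched with scikit-learn's train_test_split. The arrays are placeholders, and the random_state value is an assumption added only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows with 2 features each, plus one target per row.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Hold out 20% of the rows for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```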

6. Evaluate model

Here is the model evaluation for each algorithm.
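One way such an evaluation could look is sketched below: each of the four algorithms is fitted on the training set and scored with RMSE on the test set. The synthetic data, the default hyperparameters, and the max_iter and random_state values are assumptions, so the numbers it produces are unrelated to the results reported in this study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic two-feature regression problem (stand-in for the PCA output).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Simple Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(),
    "KNN Regression": KNeighborsRegressor(),
    "Multi-layer Perceptron Regression": MLPRegressor(max_iter=2000, random_state=0),
}

# Fit each model and compute RMSE = sqrt(mean((y_true - y_pred)^2)).
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
```

Taking the square root of the mean squared error by hand keeps the sketch independent of which RMSE helper a given scikit-learn version provides.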

 

Analysis Result

Here are the results for each algorithm.

Algorithm                            RMSE Result    Accuracy
Simple Linear Regression             4.34           100%
Support Vector Regression            0.069          99.8%
KNN Regression                       0.031          99.96%
Multi-layer Perceptron Regression    0.015          99.99%

The smaller the RMSE value, the better. By that measure, MLP Regression is the best of the four algorithms tested.

 

Repository access : https://github.com/katibpasha/flight-prices-regression

Related Course Information
  Category: Data Science / Big Data
  Course: Machine Learning For Beginner