Employee Salary Prediction with Linear Regression

Arief Rachman Hakim


Summary

As employees, we should know the factors that can affect our salary. One of them is years of experience: generally, the more experience we have in a field, the higher the salary we can expect. So in this project, I make salary predictions using a Kaggle dataset that covers the relationship between years of experience and salary.

Description

Resources for this project:

Libraries used:

  • pandas
  • numpy
  • matplotlib
  • sklearn
  • drive from google.colab (for mounting Google Drive storage)

 

Tutorial:

1. Download Dataset

This dataset covers how years of experience affect salary.

 

2. Import Libraries

The first step is to import the Python libraries we need:

  • pandas
  • numpy
  • matplotlib
  • sklearn
  • drive from google.colab (for mounting Google Drive storage)
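In code, this step might look like the sketch below. It assumes the notebook runs in Google Colab, where `google.colab.drive` is available; outside Colab, leave the Drive lines commented out. The specific scikit-learn imports are the ones the later steps use.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Google Colab only: mount Google Drive so a CSV stored there can be read.
# from google.colab import drive
# drive.mount("/content/drive")
```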

 

 

3. Read Dataset

Next, we read the dataset with the pandas library. As we can see, the dataset has two columns: YearExperience and Salary.
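A minimal sketch of this step, assuming the Kaggle CSV was saved as `Salary_Data.csv` (a hypothetical path; point it at your own copy, for example inside the mounted Drive folder) and uses the column names given above. The helper name `load_salary_data` is also illustrative.

```python
import pandas as pd

# Hypothetical path: replace with the location of your downloaded CSV.
DATA_PATH = "Salary_Data.csv"


def load_salary_data(path: str) -> pd.DataFrame:
    """Read the CSV and check that the two expected columns are present."""
    df = pd.read_csv(path)
    missing = {"YearExperience", "Salary"} - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    return df


# Usage (requires the CSV to exist at DATA_PATH):
# df = load_salary_data(DATA_PATH)
# print(df.head())
```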

 

4. EDA

Next, we perform an Exploratory Data Analysis (EDA). First, we check whether any null values are present; then we check the information of the data (column types and non-null counts); then we describe the data, which shows the count, mean, standard deviation, minimum, quartiles, and maximum of each column.
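These three checks map onto standard pandas calls: `isnull().sum()`, `info()`, and `describe()`. The sketch below wraps them in a hypothetical `run_eda` helper that prints the results and returns the per-column null counts so they can also be checked in code.

```python
import pandas as pd


def run_eda(df: pd.DataFrame) -> pd.Series:
    """Print basic EDA output and return the number of nulls per column."""
    null_counts = df.isnull().sum()
    print("Null values per column:\n", null_counts, sep="")
    df.info()             # column dtypes and non-null counts
    print(df.describe())  # count, mean, std, min, quartiles, max
    return null_counts


# Usage, with df loaded in the previous step:
# null_counts = run_eda(df)
```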

As we can see, the data has no null values.

As for data types, YearExperience is float64 and Salary is int64.

Then we visualize YearExperience against Salary using Matplotlib's scatter plot function.
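A sketch of the scatter plot, assuming the column names above; the function name `plot_experience_vs_salary` and the axis labels are illustrative choices. Returning the figure rather than showing it directly lets you save it or display it with `plt.show()`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_experience_vs_salary(df: pd.DataFrame):
    """Scatter plot of years of experience (x-axis) against salary (y-axis)."""
    fig, ax = plt.subplots()
    ax.scatter(df["YearExperience"], df["Salary"])
    ax.set_xlabel("Years of experience")
    ax.set_ylabel("Salary")
    ax.set_title("Years of Experience vs Salary")
    return fig


# Usage:
# plot_experience_vs_salary(df)
# plt.show()
```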

 

5. Prepare Data

To prepare the data, we divide it into the independent and dependent variables: X stores the independent feature (YearExperience) and y stores the dependent feature (Salary).
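One way to express this, assuming the column names above (the helper name `split_features` is hypothetical). scikit-learn expects `X` to be two-dimensional, one row per sample and one column per feature, which is why the sketch selects `[["YearExperience"]]` with double brackets while `y` stays one-dimensional.

```python
import pandas as pd


def split_features(df: pd.DataFrame):
    """Return X (2-D, shape [n_samples, 1]) and y (1-D, shape [n_samples])."""
    X = df[["YearExperience"]].to_numpy()  # independent feature
    y = df["Salary"].to_numpy()            # dependent feature (target)
    return X, y


# Usage:
# X, y = split_features(df)
```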

 

6. Split Data

Then we split the data into training and testing sets using the train_test_split function, which takes parameters such as X, y, random_state, and test_size. X is the independent feature and y is the dependent feature; random_state seeds the random shuffling of the rows so that the same split is produced on every run; and test_size sets the fraction of the data held out for testing (for example, 0.2 keeps 20% of the rows for testing).
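A sketch of the call. The values `test_size=0.2` and `random_state=42` are assumptions for illustration, since the article does not state which ones were used; the wrapper name `split_train_test` is also hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_train_test(X: np.ndarray, y: np.ndarray,
                     test_size: float = 0.2, random_state: int = 42):
    """Hold out `test_size` of the rows for testing. A fixed random_state
    makes the shuffle, and therefore the split, reproducible."""
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Usage:
# X_train, X_test, y_train, y_test = split_train_test(X, y)
```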

 

7. Define the Model

Now we define a LinearRegression model with its default parameters and train it on the training data (X_train and y_train). Then we test the model on the testing data (X_test) and display the predicted and actual values.
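A minimal sketch of training and prediction, assuming `X_train`, `y_train`, and `X_test` come from the split above; the helper name `train_and_predict` is hypothetical. `LinearRegression()` with no arguments fits an ordinary least-squares line with an intercept.

```python
from sklearn.linear_model import LinearRegression


def train_and_predict(X_train, y_train, X_test):
    """Fit ordinary least squares with default parameters and predict X_test."""
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, y_pred


# Usage:
# model, y_pred = train_and_predict(X_train, y_train, X_test)
# print("Predicted:", y_pred)
# print("Actual:   ", y_test)
```

After fitting, `model.coef_[0]` is the estimated salary increase per year of experience and `model.intercept_` is the predicted salary at zero years.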

Next, we calculate the difference between the actual and predicted salary values, put the actual salary, the predicted salary, and their difference into a DataFrame, and display it.
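The comparison table can be built directly with pandas. In this sketch the column names and the helper name `compare_predictions` are illustrative, and the difference is taken as actual minus predicted, so a positive value means the model under-predicted.

```python
import numpy as np
import pandas as pd


def compare_predictions(y_actual, y_pred) -> pd.DataFrame:
    """Tabulate actual vs predicted salary and their difference (actual - predicted)."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return pd.DataFrame({
        "Actual Salary": y_actual,
        "Predicted Salary": y_pred,
        "Difference": y_actual - y_pred,
    })


# Usage:
# print(compare_predictions(y_test, y_pred))
```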

 

8. Visualize Model

Now we visualize the training data: plot all the training points, draw the best-fit line through them, and look at the errors. The vertical distance between a training point and the best-fit line is that point's error, also called its residual.

Then we do the same for the testing data: plot all the testing points, draw the best-fit line, and look at the errors.
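Both plots can share one helper. The sketch below (the name `plot_best_fit` is hypothetical) scatters the given points, draws the fitted line over the sorted x values, and adds dotted vertical segments showing each point's residual; calling it once with the training data and once with the testing data produces the two figures described above.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_best_fit(model, X, y, title: str):
    """Scatter (X, y), overlay the model's best-fit line, and mark residuals."""
    fig, ax = plt.subplots()
    ax.scatter(X[:, 0], y, label="Data points")
    # Draw the line over sorted x values so it renders as a single segment.
    x_line = np.sort(X[:, 0]).reshape(-1, 1)
    ax.plot(x_line[:, 0], model.predict(x_line), color="red", label="Best-fit line")
    # Vertical segment from each point to the line: the residual (error).
    ax.vlines(X[:, 0], y, model.predict(X), linestyles="dotted", label="Error (residual)")
    ax.set_xlabel("Years of experience")
    ax.set_ylabel("Salary")
    ax.set_title(title)
    ax.legend()
    return fig


# Usage:
# plot_best_fit(model, X_train, y_train, "Training set")
# plot_best_fit(model, X_test, y_test, "Testing set")
# plt.show()
```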

 

9. Model Evaluation

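Check how well the model performs on the testing data: it scores close to 98%. For a regression model this figure is best read as an R² (coefficient of determination) of about 0.98 rather than a classification-style accuracy. We also compute the mean squared error and the r2_score from the actual and predicted values.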

 

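I use RMSE and the R² score because they are standard metrics for evaluating regression models: RMSE (the square root of the mean squared error) measures the typical prediction error in the same units as Salary, and R² measures the proportion of the variance in Salary that the model explains.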
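A sketch of the evaluation, assuming `y_test` and `y_pred` from step 7 (the helper name `evaluate_regression` is hypothetical). RMSE is computed as the square root of the MSE rather than through a version-specific keyword argument, so it works across scikit-learn releases.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def evaluate_regression(y_true, y_pred) -> dict:
    """Return MSE, RMSE (same units as Salary), and R^2 for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    rmse = float(np.sqrt(mse))
    r2 = r2_score(y_true, y_pred)
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Usage:
# print(evaluate_regression(y_test, y_pred))
```

For a fitted `LinearRegression`, `model.score(X_test, y_test)` returns the same value as `r2_score(y_test, model.predict(X_test))`.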

 

10. Prediction with custom data

The last step is to test the model on custom data. I gave the model three different values for years of experience (3, 4, and 5) and checked the salary it predicts for each. These are the predictions:

  • 3 years: 54851 thousand
  • 4 years: 63672 thousand
  • 5 years: 72492 thousand
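A sketch of predicting on custom inputs, assuming `model` is the regressor trained in step 7; the helper name `predict_salaries` is hypothetical. The new values must be reshaped into a 2-D array with one column, the same shape as the training `X`.

```python
import numpy as np


def predict_salaries(model, years):
    """Predict salary for each value in `years` (years of experience)."""
    X_new = np.asarray(years, dtype=float).reshape(-1, 1)  # 2-D, as sklearn expects
    return model.predict(X_new)


# Usage, with the model trained in step 7:
# for yrs, salary in zip([3, 4, 5], predict_salaries(model, [3, 4, 5])):
#     print(f"{yrs} years -> {salary:.0f}")
```

Because the model is a straight line, predictions one year apart differ by exactly the fitted slope (`model.coef_[0]`). The numbers above are consistent with that: 63672 - 54851 = 8821 and 72492 - 63672 = 8820, so the slope is roughly 8820 per year, with the 1-unit gap coming from rounding.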

In conclusion, an employee's years of experience can affect how large a salary they get: in this dataset, more experience goes with a higher predicted salary.

 

11. Code Documentation

https://drive.google.com/drive/folders/15_guXA36jMqTVgmdMgpySbUZh0uH0jH9?usp=sharing 

Related Course Information
  Category: Artificial Intelligence
  Course: Teknologi Kecerdasan Artifisial (SIB AI-Hacker)