ATI ZAIDIAH
The purpose of this project is to predict the weight of a fish from its length, width and height measurements.
Data understanding: The dataset was taken from kaggle.com and contains data on 7 common fish species sold in fish markets. With this dataset, fish weight can be predicted using linear regression. The data consists of 159 records and 7 columns. The dependent variable is the weight of the fish, while the independent variables are length1, length2, length3, height and width.
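As an illustration of the linear regression idea mentioned above, the following is a minimal sketch of a least-squares line fit with a single predictor. It is not the Excel procedure used in this project. The function names fit_line and predict are hypothetical, and the sample numbers are made up for the example, not taken from the Kaggle dataset.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted y for a new x using the fitted line."""
    return intercept + slope * x_new


# Made-up illustrative values, NOT rows from the Kaggle dataset
lengths = [10.0, 20.0, 30.0, 40.0]
weights = [50.0, 150.0, 250.0, 350.0]

intercept, slope = fit_line(lengths, weights)
print(predict(intercept, slope, 25.0))  # 200.0
```

With real data the points will not lie exactly on a line, so the fitted line gives an estimate rather than an exact weight.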
Data Structure:
Species, weight, length1, length2, length3, height, width
The dataset used is as shown in Figure 1 below:
Figure 1. Fish species dataset (source: kaggle.com)
DATA VISUALIZATION
Visualization was carried out to compare the weight of the fish with length1, length2, length3, height and width.
The visualization was produced in MS Excel, where the data compared are the weight against each of these five variables.
The scatter graph in Figure 2 illustrates these 5 comparisons.
Figure 2. Graphics visualization
From Figure 2 above, the relationships between fish weight and length1, length2, length3, height and width all appear to be strongly positive: the higher the value of each of these variables, the higher the weight of the fish.
To check whether the data visualization is consistent, the data are normalized by dividing the values of each variable by that variable's maximum value. One of the normalized visualizations is presented here.
Figure 3. Data Visualization after normalization
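From Figure 3 above, the data before and after normalization produce almost the same picture, so the visualization results can be considered consistent. This is expected, because dividing a variable by a constant only rescales its axis and does not change the relative positions of the points.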
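The normalization step described above can be sketched in a few lines. This is only an illustration of dividing each value by the column maximum. The function name normalize_by_max is hypothetical, and the sample values are made up, not taken from the dataset.

```python
def normalize_by_max(values):
    """Scale a column by dividing every value by the column's maximum."""
    peak = max(values)
    if peak <= 0:
        # Assumption: measurements are positive, so the maximum must be too
        raise ValueError("maximum must be positive for this scaling")
    return [v / peak for v in values]


# Made-up illustrative values, NOT rows from the Kaggle dataset
heights = [2.0, 4.0, 8.0]
print(normalize_by_max(heights))  # [0.25, 0.5, 1.0]
```

After this scaling the largest value in each column becomes 1, which puts all variables on a comparable range for plotting.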
After visualizing the data using scatter graphs, the next step is to perform the calculations using the data analysis tools in Excel.
The following are the results of the linear regression data analysis, which gives the correlation between the dependent variable and each independent variable.
Figure 4. Correlation between weight and length1
From the figure above, the correlation analysis between weight and length1 shows a high positive correlation, indicated by a correlation value above 0.9.
Figure 5. Correlation between weight and length2
From the figure above, the correlation analysis between weight and length2 shows a high positive correlation, indicated by a correlation value above 0.9.
Figure 6. Correlation between weight and length3
From the figure above, the correlation analysis between weight and length3 shows a high positive correlation, indicated by a correlation value above 0.9.
Figure 7. Correlation between weight and height
From the figure above, the correlation analysis between weight and height shows a fairly high positive correlation, although the correlation value (0.72) is below the 0.9 level seen for the length variables.
Figure 8. Correlation between weight and width
From the figure above, the correlation analysis between weight and width shows a fairly high positive correlation, indicated by a correlation value approaching 0.9.
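The correlation values discussed for Figures 4 to 8 could also be checked outside Excel. The following is a minimal sketch of the Pearson correlation coefficient, assuming that this is the correlation measure reported in the figures. The function name pearson_r is hypothetical, and the sample values are made up, not taken from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, NOT rows from the Kaggle dataset
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(x, y), 2))  # 0.8
```

A value close to 1 indicates a strong positive linear relationship, a value close to 0 indicates a weak one, and a negative value indicates that one variable tends to fall as the other rises.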
Conclusion
The conclusion from the analysis is that all independent variables have a positive correlation with weight. The highest positive correlation is between weight and length2, at 0.92, while the smallest is between weight and height, at 0.72.