The dataframe has an Unnamed: 0 column. We can simply remove that column with df.drop.
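As a minimal sketch of that step, the snippet below drops the column from a tiny made-up frame. The values and the other column names are placeholders for illustration, not rows from the real data set.

```python
import pandas as pd

# Tiny made-up frame standing in for the wine data. The "Unnamed: 0"
# column mimics a row index that was written to CSV and read back in.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "alcohol": [8.8, 9.5, 10.1],
    "quality": [6, 6, 7],
})

# drop returns a new frame; columns= names the column(s) to remove.
df = df.drop(columns=["Unnamed: 0"])
print(df.columns.tolist())
```

If the extra column comes from an index saved to the CSV file, passing index_col=0 to pd.read_csv when loading is an alternative that avoids creating it in the first place.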
Variable explanation
Fixed Acidity: Most acids involved with wine are fixed or non-volatile (they do not evaporate readily)
Volatile Acidity: The amount of acetic acid in wine, which at high levels can lead to an unpleasant vinegar taste
Citric Acid: Often added to wines to increase acidity, complement a specific flavor or prevent ferric hazes
Residual Sugar: The natural grape sugars left in a wine after alcoholic fermentation finishes
Chlorides: The amount of salt in the wine
Free Sulfur Dioxide: The free form of SO₂, which prevents microbial growth and the oxidation of wine
Total Sulfur Dioxide: The amount of free + bound forms of SO₂
Density: Sweeter wines have a higher density
pH: Describes the level of acidity on a scale of 0–14. Most wines fall between 3 and 4 on the pH scale
Alcohol: The alcohol content of the wine, which in small quantities makes drinkers more sociable
Sulphates: A wine additive that contributes to SO₂ levels and acts as an antimicrobial and antioxidant
Quality: The output variable (target), a score between 0 and 10
This data set has 4,898 rows (samples) and 12 columns: 11 features, plus the label in the 12th column. All 11 features are floats and the label is an integer, so every variable is numerical. The data set has no missing values.
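These properties can be checked directly on the dataframe. The sketch below uses a small made-up frame in place of the real data; the column names are assumptions based on the variable list above.

```python
import pandas as pd

# Made-up stand-in for the wine data (the real frame has 12 columns).
df = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 9.9],
    "sulphates": [0.45, 0.49, 0.44, 0.40],
    "quality": [6, 6, 7, 5],
})

# shape gives (rows, columns).
n_rows, n_cols = df.shape

# dtypes shows which columns are floats and which are integers.
dtypes = df.dtypes.astype(str).to_dict()

# isnull().sum() counts missing values per column.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

print(n_rows, n_cols)
print(dtypes)
print(total_missing)
```

Calling df.info() prints a similar summary (row count, dtypes and non-null counts) in a single call.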
Visualizing the distribution of the data with a bar plot
Visualizing the wine quality
The quality scores of the white wines are approximately normally distributed. Most of the wines are rated between 5 and 7.
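The counts behind such a bar plot can be computed as follows. This is a sketch on a handful of made-up scores, so the numbers are not the real distribution.

```python
import pandas as pd

# Made-up quality scores for illustration only.
quality = pd.Series([5, 6, 6, 7, 6, 5, 8, 6, 7, 4], name="quality")

# Count how many wines received each score, ordered by score.
counts = quality.value_counts().sort_index()
print(counts)

# With matplotlib installed, counts.plot(kind="bar") draws the bar plot.
```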
Next, look at the relationship between the features and the quality class.
From the images above, we can see which features are correlated with the quality of white wine.
Negative Correlation
1) Volatile acidity has a negative relationship with quality
2) Density has a negative relationship with quality
Positive Correlation
1) Alcohol and sulphates have a positive relationship with quality.
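The sign of each relationship can also be read from the Pearson correlation with quality. The sketch below uses made-up values chosen only to show the mechanics; the column names are assumptions.

```python
import pandas as pd

# Made-up values: alcohol rises with quality, density falls.
df = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 11.0, 12.2, 12.8],
    "density": [1.0010, 0.9980, 0.9951, 0.9940, 0.9912, 0.9900],
    "quality": [5, 5, 6, 6, 7, 8],
})

# Pearson correlation of every feature with the target,
# sorted from most negative to most positive.
corr_with_quality = df.corr()["quality"].drop("quality").sort_values()
print(corr_with_quality)
```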
Let's confirm the relationship using reg-plots.
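A reg-plot overlays a least-squares line on a scatter plot, so the sign of that line's slope is what confirms the direction of the relationship. As a numeric sketch of the same idea (using NumPy's polyfit on made-up values rather than a plotting library):

```python
import numpy as np

# Made-up alcohol and quality values for illustration only.
alcohol = np.array([8.8, 9.5, 10.1, 11.0, 12.2, 12.8])
quality = np.array([5, 5, 6, 6, 7, 8])

# Fit a straight line quality = slope * alcohol + intercept.
slope, intercept = np.polyfit(alcohol, quality, deg=1)

# A positive slope corresponds to an upward-sloping regression line.
print(slope > 0)
```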
Modeling
Use only alcohol and sulphates as features, because they are positively correlated with the target.
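Selecting the two features and the target might look like the sketch below. The column names alcohol, sulphates and quality are assumed spellings, and the rows are made up.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 9.9],
    "sulphates": [0.45, 0.49, 0.44, 0.40],
    "density": [1.0010, 0.9940, 0.9951, 0.9956],
    "quality": [6, 6, 7, 5],
})

# Feature matrix with the two chosen columns, and the target vector.
X = df[["alcohol", "sulphates"]]
y = df["quality"]
print(X.shape, y.shape)
```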
Rescaling
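The text does not say which rescaling method is used. The sketch below assumes standardization with scikit-learn's StandardScaler (zero mean and unit variance per feature), applied to made-up values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: columns stand for alcohol and sulphates.
X = np.array([
    [8.8, 0.45],
    [9.5, 0.49],
    [10.1, 0.44],
    [11.0, 0.40],
])

# Assumption: standardization; other scalers would also be plausible.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

In a full workflow it is common to fit the scaler on the training split only and then apply it to the test split, so that no information from the test set leaks into training.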
Split the dataset into training and testing sets, then create a linear regression model using the training set.
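A minimal sketch of this step with scikit-learn follows. The data is synthetic, and the 80/20 split and random_state are assumptions, since the text does not state them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: two features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 6.0 + 0.5 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_, model.intercept_)
```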
Evaluate Model
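The text does not name the evaluation metrics. The sketch below assumes two common choices for regression, mean squared error and R², computed on made-up true and predicted scores.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted quality scores for illustration only.
y_test = np.array([5.0, 6.0, 7.0, 6.0])
y_pred = np.array([5.5, 6.0, 6.5, 6.0])

# Mean squared error and its square root (in units of the score).
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# R^2: share of the target's variance explained by the predictions.
r2 = r2_score(y_test, y_pred)

print(mse, rmse, r2)
```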