EDA AND PREDICTION OF WHITE WINE QUALITY

Wiga Audi Prasetyo

Summary

This portfolio covers exploratory data analysis (EDA) of white wine data and the prediction of white wine quality using a linear regression model.

Description

This analysis uses the White Wine Quality dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/wine+quality).

 

Exploratory Data Analysis

Importing the libraries that will be used
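The import cell itself is not reproduced here. A plausible set of imports covering the steps below is sketched next; using scikit-learn for modeling and plain matplotlib for plotting is an assumption, and the original notebook may have used other libraries (for example seaborn for the plots).

```python
# Core data handling
import numpy as np
import pandas as pd

# Plotting
import matplotlib.pyplot as plt

# Modeling and evaluation (scikit-learn is an assumption)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
```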

 

Load Dataset

The DataFrame has an extra 'Unnamed: 0' column, which we can simply remove with df.drop.
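A minimal sketch of the loading step, assuming the data sits in a CSV that was saved together with its pandas index (which is what typically produces an 'Unnamed: 0' column). The helper name load_wine and the file path are hypothetical. Note that the raw winequality-white.csv file from UCI is semicolon-separated, so reading it directly needs sep=";".

```python
import io

import pandas as pd


def load_wine(source, sep=","):
    """Hypothetical helper: read the wine CSV and drop the 'Unnamed: 0' column."""
    df = pd.read_csv(source, sep=sep)
    # errors="ignore" makes the drop a no-op when the column is absent
    return df.drop(columns=["Unnamed: 0"], errors="ignore")


# Tiny in-memory CSV with made-up values, only to show the behaviour
sample_csv = io.StringIO(
    "Unnamed: 0,alcohol,sulphates,quality\n"
    "0,9.5,0.45,5\n"
    "1,11.2,0.50,6\n"
)
df = load_wine(sample_csv)
print(df.columns.tolist())  # ['alcohol', 'sulphates', 'quality']

# On the real file (path is an assumption):
# df = load_wine("winequality-white.csv", sep=";")
```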

Variable explanation

  1. Fixed Acidity: Most acids involved with wine are fixed or non-volatile (they do not evaporate readily)
  2. Volatile Acidity: The amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste
  3. Citric Acid: Often added to wines to increase acidity, complement a specific flavor or prevent ferric hazes
  4. Residual Sugar: From the natural grape sugars left in a wine after the alcoholic fermentation finishes.
  5. Chlorides: The amount of salt in the wine
  6. Free Sulfur Dioxide: It prevents microbial growth and the oxidation of wine
  7. Total Sulfur Dioxide: The amount of free + bound forms of SO₂
  8. Density: Sweeter wines have a higher density
  9. pH: Describes the level of acidity on a scale of 0–14; most wines fall between 3 and 4 on the pH scale
  10. Alcohol: The alcohol content of the wine; in small quantities, alcohol makes drinkers more sociable
  11. Sulphates: A wine additive that contributes to SO₂ levels and acts as an antimicrobial and antioxidant
  12. Quality: The output (target) variable, a score between 0 and 10

This dataset has 4,898 rows (observations) and 12 columns: 11 features plus the label in the 12th column. All 11 features are numerical (float), while the 12th variable (quality) is an integer. There are no missing values.
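The facts above (row count, feature types, missing values) can be checked programmatically. Interactively, df.info() and df.isnull().sum() show the same information; the sketch below bundles the checks into a hypothetical helper so they can be asserted. The tiny frame is made up for illustration.

```python
import pandas as pd


def summarize_structure(df, label="quality"):
    """Hypothetical helper: counts, dtypes and missing values of a wine DataFrame."""
    features = df.drop(columns=[label])
    return {
        "n_rows": df.shape[0],
        "n_features": features.shape[1],
        # True only if every feature column has a float dtype
        "float_features": all(pd.api.types.is_float_dtype(t) for t in features.dtypes),
        "label_is_integer": pd.api.types.is_integer_dtype(df[label]),
        "n_missing": int(df.isna().sum().sum()),
    }


# Made-up values; on the real data the summary should report 4898 rows,
# 11 float features, an integer label and 0 missing values.
demo = pd.DataFrame({
    "alcohol": [9.5, 11.2, 10.1],
    "density": [0.998, 0.994, 0.996],
    "quality": [5, 6, 6],
})
summary = summarize_structure(demo)
print(summary)
```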

Visualizing the distribution of the data with a bar plot
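For continuous features a histogram per column is a common way to draw these distributions. The sketch below uses pandas' DataFrame.hist (which needs matplotlib); the original plot may have been produced differently, and the random demo data only stands in for the real DataFrame.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_feature_distributions(df, bins=30):
    """Draw one histogram per numeric column of df and return the Figure."""
    axes = df.hist(bins=bins, figsize=(14, 10))  # 2-D array of Axes
    fig = axes.flat[0].get_figure()
    fig.tight_layout()
    return fig


# Random illustrative data; pass the real wine DataFrame instead
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, 200),
    "density": rng.normal(0.994, 0.003, 200),
    "pH": rng.normal(3.2, 0.15, 200),
})
fig = plot_feature_distributions(demo)
# In a notebook the figure renders automatically; in a script call plt.show()
```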

Visualizing the wine quality

The quality scores of the white wines are approximately normally distributed (bell-shaped), and most wines are rated between 5 and 7.
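Because quality is an integer score, counting wines per score and drawing the counts as bars shows this shape directly. A minimal matplotlib sketch; the helper name and the demo scores are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_quality_counts(quality):
    """Count wines per quality score and draw the counts as a bar chart."""
    counts = quality.value_counts().sort_index()  # scores in ascending order
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel("Quality score")
    ax.set_ylabel("Number of wines")
    ax.set_title("White wine quality distribution")
    return counts, ax


# Illustrative scores only; use df["quality"] for the real data
demo_quality = pd.Series([5, 6, 6, 7, 5, 6, 8, 4])
counts, ax = plot_quality_counts(demo_quality)
print(counts.to_dict())  # {4: 1, 5: 2, 6: 3, 7: 1, 8: 1}
```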

Next, we look at the relationship between each feature and the quality label.

From the images above, we can see which features are correlated with the quality of white wine.
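The same relationships can be read off numerically as Pearson correlations with the target, which is also what a correlation heatmap visualizes. A small sketch under that assumption; the demo values are made up so that alcohol rises and density falls with quality.

```python
import pandas as pd


def correlations_with_target(df, target="quality"):
    """Pearson correlation of every feature with the target, sorted ascending."""
    return df.corr()[target].drop(target).sort_values()


# Made-up values, constructed so the signs are easy to predict
demo = pd.DataFrame({
    "alcohol": [9.0, 10.5, 12.0, 9.2, 10.4, 11.8],
    "density": [0.999, 0.996, 0.992, 0.998, 0.995, 0.991],
    "quality": [5, 6, 7, 5, 6, 7],
})
corr = correlations_with_target(demo)
print(corr)  # density strongly negative, alcohol strongly positive
```

Negative values correspond to the negative relationships listed below and positive values to the positive ones.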

Negative Correlation

1) Volatile acidity has a negative relationship with quality

2) Density has a negative relationship with quality

Positive Correlation

1) Alcohol and sulphates have a positive relationship with quality.

Let's confirm these relationships using regression plots (reg-plots).
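"Reg-plots" most likely refers to seaborn's regplot. To avoid that dependency, the sketch below reproduces the idea with matplotlib and numpy.polyfit: a scatter of one feature against quality plus a least-squares line, whose slope sign indicates the direction of the relationship. The helper name and the demo data are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def reg_plot(df, feature, target="quality", ax=None):
    """Scatter a feature against the target and overlay a least-squares line."""
    x = df[feature].to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 fit = straight line
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(x, y, alpha=0.3)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_xlabel(feature)
    ax.set_ylabel(target)
    return slope, ax


# Made-up data lying exactly on a line, so the fitted slope is 0.5
demo = pd.DataFrame({"alcohol": [9.0, 10.0, 11.0, 12.0, 13.0]})
demo["quality"] = 0.5 * demo["alcohol"] + 1.0
slope, ax = reg_plot(demo, "alcohol")
print(round(slope, 3))  # 0.5

# On the real data, for example:
# for name in ["alcohol", "sulphates", "volatile acidity", "density"]:
#     reg_plot(df, name)
```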

Modeling

We take only 'alcohol' and 'sulphates' as features because they are positively correlated with the target (quality).
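Selecting the two features amounts to slicing the DataFrame into a feature matrix X and a target vector y. The column names below follow the UCI file's headers; the helper and the demo rows are illustrative.

```python
import pandas as pd

FEATURES = ["alcohol", "sulphates"]  # column names as in the UCI dataset
TARGET = "quality"


def select_features(df, features=FEATURES, target=TARGET):
    """Split a wine DataFrame into the feature matrix X and target vector y."""
    X = df[features]
    y = df[target]
    return X, y


# Made-up rows; the extra column is ignored by the selection
demo = pd.DataFrame({
    "density": [0.998, 0.994],
    "alcohol": [9.5, 11.2],
    "sulphates": [0.45, 0.50],
    "quality": [5, 6],
})
X, y = select_features(demo)
print(X.columns.tolist(), y.name)  # ['alcohol', 'sulphates'] quality
```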

Rescaling 
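The exact scaler is not stated; the sketch below assumes min-max scaling, which maps each feature to the range [0, 1] via x' = (x - min) / (max - min), using scikit-learn's MinMaxScaler. The demo values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def rescale(X):
    """Rescale each column of X to [0, 1] with min-max scaling (assumed scaler)."""
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X)  # returns a NumPy array
    return pd.DataFrame(X_scaled, columns=X.columns, index=X.index), scaler


demo_X = pd.DataFrame({"alcohol": [8.0, 10.0, 14.0], "sulphates": [0.3, 0.5, 0.9]})
X_scaled, scaler = rescale(demo_X)
print(X_scaled)  # each column now runs from 0.0 to 1.0
```

A common refinement is to fit the scaler on the training split only and call scaler.transform on the test split, so that the test set's minimum and maximum do not leak into training.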

We split the dataset into training and testing sets, then create a linear regression model using the training set.
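A sketch of the split-and-fit step with scikit-learn. The 80/20 split and random_state=42 are assumptions, not values taken from the original notebook. The demo target is a noise-free linear function, so the fitted coefficients should recover it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def train_linear_model(X, y, test_size=0.2, random_state=42):
    """Hold out a test split, then fit ordinary least squares on the training split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model, (X_train, X_test, y_train, y_test)


# Synthetic, noise-free data: y = 3 + 2 * alcohol + 0.5 * sulphates
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.uniform(0, 1, size=(50, 2)), columns=["alcohol", "sulphates"])
y_demo = 3.0 + 2.0 * X_demo["alcohol"] + 0.5 * X_demo["sulphates"]
model, (X_train, X_test, y_train, y_test) = train_linear_model(X_demo, y_demo)
print(model.coef_, model.intercept_)  # close to [2.0, 0.5] and 3.0
```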

Evaluate Model
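Which metrics the original evaluation reports is not shown. The usual choices for a regression model are MAE, MSE, RMSE and R²; a small sketch computing all four with scikit-learn on hand-made predictions follows.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred):
    """Common regression metrics for predicted versus true quality scores."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # same units as the quality score
        "R2": r2_score(y_true, y_pred),
    }


metrics = evaluate([5, 6, 7, 6], [5.5, 6.0, 6.5, 6.0])
print(metrics)  # MAE 0.25, MSE 0.125, RMSE ~0.354, R2 0.75
```

With the variables from the split-and-fit step, the call would be evaluate(y_test, model.predict(X_test)).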

Related Course Information
  Category: Artificial Intelligence
  Course: Persiapan Ujian Sertifikasi Internasional DSBIZ - AIBIZ (Preparation for the DSBIZ - AIBIZ International Certification Exam)