Mushroom Classification using Deep Neural Network

Timothy Christyan


Summary

By looking at a mushroom's characteristics, we can classify it as edible or poisonous. The characteristic most commonly used to judge whether a mushroom is poisonous is the color of its cap: poisonous mushrooms tend to have brighter cap colors. Besides cap color, many other attributes can help determine whether a mushroom is poisonous, such as its root shape, cap shape, and natural habitat.

Using a deep neural network, I will build a model that classifies a mushroom as edible or poisonous from its attributes. The training data comes from the mushroom dataset on Kaggle. The finished model is evaluated using a confusion matrix. The model is built in the Google Colab environment.

Description

The steps for creating a deep neural network model that classifies poisonous and edible mushrooms are as follows:

 

Import All Libraries and Connect Google Colaboratory to Drive

 

Load the Dataset

To load the dataset, download the mushroom dataset from Kaggle and save it to Google Drive. Then mount Google Drive in Colab and load the dataset into a DataFrame using the pandas library.
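The loading step above could be sketched roughly as follows. The file path, the helper names mount_drive and load_mushrooms, and the guard that skips mounting outside Colab are all assumptions made for illustration; the original notebook may have done this differently.

```python
import pandas as pd

# Hypothetical location: it depends on where the CSV was saved in Drive.
DATA_PATH = "/content/drive/MyDrive/mushrooms.csv"


def mount_drive():
    """Mount Google Drive when running inside Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount("/content/drive")
    return True


def load_mushrooms(source=DATA_PATH):
    """Read the mushroom CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)
```

In Colab, calling mount_drive() and then load_mushrooms() would return the full table, and data.head() is a quick way to confirm that the columns were read as expected.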

 

Preprocessing

Before building the model, preprocess the dataset. The preprocessing steps are:

  • Remove missing data, if any. Use data.isna().sum() to check whether the data has missing values. This check shows that the dataset has no missing values, so this step can be skipped.

  • Encode the columns that do not yet have numeric values. First check the unique values in each column. I apply label encoding to columns with only 2 unique values, and binary encoding (BinaryEncoder from the category_encoders library) to columns with more than 2 unique values.
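To make the two encodings concrete, here is a minimal hand-rolled sketch using only pandas. It is a stand-in for illustration, not the category_encoders implementation: the function name encode_categoricals is hypothetical, and the real BinaryEncoder may assign category codes and name its output columns differently.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode columns with at most 2 unique values and
    binary-encode the rest (one 0/1 column per bit of the category code)."""
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        # factorize assigns 0, 1, 2, ... in order of first appearance
        codes, uniques = pd.factorize(df[col])
        n = len(uniques)
        if n <= 2:
            out[col] = codes  # label encoding: a single 0/1 column
        else:
            codes = codes + 1      # start at 1 so no category is all zeros
            width = n.bit_length() # bits needed to write the largest code
            for bit in range(width):
                shift = width - 1 - bit  # most significant bit first
                out[f"{col}_{bit}"] = (codes >> shift) & 1
    return out
```

Binary encoding keeps the feature count low: a column with 9 categories becomes 4 binary columns here, where one-hot encoding would produce 9.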

 

Make Training and Testing Data

Start by setting the ‘class’ column as the target and the other columns as features. Then use the train_test_split() function from the sklearn library to split the features and target into training and testing data.
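A minimal sketch of this split is shown below. The 80/20 ratio, the fixed random seed, and the stratified sampling are assumptions chosen for illustration, since the text does not state which settings were used; split_features_target is a hypothetical helper name.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target="class", test_size=0.2, seed=42):
    """Separate the target column from the features, then split both."""
    X = df.drop(columns=[target])  # every column except the target
    y = df[target]
    # stratify=y keeps the edible/poisonous ratio the same in both parts
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
```

The function returns X_train, X_test, y_train, y_test in that order, matching the return order of train_test_split().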

 

Making Deep Neural Network Classification Model

The model is created using the Sequential() class from the Keras library. Add 1 input layer whose dimension matches the number of feature columns in the training data; this layer receives the training inputs. Then add 2 dense layers: the first has 16 units with ‘relu’ as its activation function, and the second (which is also the output layer) has 2 units with the ‘sigmoid’ activation function. Then compile the model.

Next, train the model on the training data. Training uses 100 epochs (this can be increased or reduced) and shuffles the dataset.
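The architecture and training call described above might look like the sketch below. The layer sizes, activations, epoch count, and shuffling come from the text; the optimizer, the loss function, and the 20% validation split are assumptions, because the text does not name them. The code also assumes integer class labels (0 or 1) and NumPy array inputs.

```python
from tensorflow import keras


def build_model(n_features):
    """Input layer sized to the feature count, then Dense(16, relu)
    and a 2-unit sigmoid output layer."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(2, activation="sigmoid"),
    ])
    # Assumed settings: Adam optimizer and sparse categorical cross-entropy,
    # which expects integer labels rather than one-hot vectors.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train_model(model, X_train, y_train, epochs=100):
    """Train with shuffling; hold out 20% of the training data
    for validation (an assumed split)."""
    return model.fit(
        X_train, y_train,
        epochs=epochs,
        shuffle=True,
        validation_split=0.2,
        verbose=0,
    )
```

The object returned by train_model() exposes a history dictionary holding the per-epoch accuracy and loss for both the training and validation data, which is what the evaluation plots are drawn from.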

 

Evaluate the Finished Model

Usually we want a model that generalizes to all data, meaning it neither underfits nor overfits. Underfitting happens when the model performs poorly even on the training data. Overfitting happens when performance on the training data is good but performance on the validation data stalls or even degrades after a certain epoch.

To check whether the finished model underfits or overfits, plot the accuracy and loss recorded during training.

Underfitting shows up as a plot where accuracy stays low and loss stays high on both the training and validation data, while overfitting shows up as training accuracy that keeps increasing while validation accuracy stalls or even degrades.

Because the training and validation curves stay close to each other in both accuracy and loss, and accuracy is high on both, it can be concluded that the finished model neither underfits nor overfits.
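The visual comparison of the curves can also be approximated numerically. The sketch below reads the final-epoch values from a dictionary shaped like the history attribute of a Keras History object (keys "accuracy" and "val_accuracy"). The function name diagnose_fit and both thresholds are hypothetical, picked only to illustrate the idea; this is a rough heuristic, not a standard test.

```python
def diagnose_fit(history, min_acc=0.9, max_gap=0.05):
    """Rough check based on the final epoch:
    - 'underfit' if training accuracy is still low,
    - 'overfit' if training accuracy is well above validation accuracy,
    - 'ok' otherwise."""
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]
    if train_acc < min_acc:
        return "underfit"
    if train_acc - val_acc > max_gap:
        return "overfit"
    return "ok"
```

A final-epoch check like this ignores the shape of the curves, so it complements the plots rather than replacing them.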

Next, make predictions with the model and compare them to the actual test labels.

To see the model's accuracy, precision, and recall, use the classification_report() function from the sklearn.metrics library. The confusion matrix can be computed with the confusion_matrix() function from sklearn.metrics and visualized with the heatmap() function from the seaborn library.
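A minimal sketch of this evaluation step follows. It assumes the model outputs one probability per class, so the predicted class is the index of the larger value, and that label 0 means edible and label 1 means poisonous; that mapping depends on how the ‘class’ column was encoded. The helper name evaluate_predictions is hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def evaluate_predictions(y_true, probs):
    """Turn per-class probabilities into class labels, then score them."""
    y_pred = np.argmax(probs, axis=1)  # index of the highest probability
    cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
    report = classification_report(
        y_true, y_pred,
        target_names=["edible", "poisonous"],  # assumed label order
    )
    return y_pred, cm, report
```

Here probs would be the output of model.predict() on the test features. The returned matrix can be passed to seaborn's heatmap(), for example with annot=True and fmt="d", so that each cell shows its count.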

 

Conclusion

After evaluating the model, it can be concluded that the deep neural network classifier for poisonous and edible mushrooms gives a perfect result on the test data. The model scores 100% on accuracy, precision, and recall, and it neither underfits nor overfits.

Related Course Information
  Category: Data Science / Big Data
  Course: Riset Kecerdasan Artifisial (SIB AI-RESEARCH)