BISA AI - AI For Everyone
Voice Gender Recognition Classification with KNN

Nantalira Niar Wijaya

Sosial Media


0 orang menyukai ini
Suka

Summary

K Nearest Neighbors is machine learning model that can classifier data based on "neighbors". In this case the model will help to determine gender recognition by voice from many indicators or features.

Description

Read Dataset From Google Drive

First step when we want to read dataset from google drive, we have to connect path that contains the dataset in it with drive mount from google.colab library.

Import pandas library to read top 5 rows from dataset (csv) as dataframe and numpy library to convert dataframe to array in next step. Dataset that we use now is Gender Recognition by Voice to identify a voice as male or female. 

We can see dataset information like column name, column type, etc with info() from pandas.

Filtering Dataset

Filtering Dataset from missing value and duplicate data.

After checking our dataset from missing value and duplicate data, we found that our dataset hasn't missing value but has 2 duplicate data. To get the best performance of dataset we must remove duplicate data from dataset.

The next step of filtering data is dividing the dataset into features and label. In this case the variable X is the feature which contains all the columns of the dataset except the label column and the variable y is the label which contains the label column. Don't forget to change the dataset from dataframe to array to make the next step easier.

Preprocessing Data

In iris dataset we know that y variable is object type, because of that I change from object type to integer type. Preprocessing with label binarizer allow us to change male/female (object type) to 1/0 (integer type).

But in order to train dataset the y variable or dependent variable must array one dimensions. I use flatten function to change y variable that have array two dimensions to one dimensions.

Split Dataset Into Train and Testing Data

To train and test data, sklearn have library train test split that can help split data set into 4 variable (X train, y train, X test, and y test). The dataset will split 40% to test data and 60% to train data.

Classification with K Nearest Neighbors

K Nearest Neighbors is machine learning model that can classifier data based on "neighbors". Because of that KNN have to determine the neighbors that stored in k variable. To find the best k variable i to looping the odd number from 1 to 11 and try to execute it. Then I store the accuracy scores of models into scores array.

Visualize the accuracy scores of different neighbors in a line plot with matplotlib. From this visualization I can determine the best neighbor (k = 3) to my KNN model.

From my KNN model we can evaluation model to watch the model performance. with classification report library I can watch precision, recall, f-score and accuracy from my model.

Scaling Train Data

As you can see the accuracy score is 0.73 that mean 73%. I want to improve accuracy score with scale my train data with standard scaler.

I do the same thing that i do before and watch the accuracy score from my model.

You can see that the best accuracy with 3 neighbor can reach more than 0.974.

From scaling train data my KNN model can improve the performance from 0.73 to 0.97 that very enough for my KNN model to predict Gender Recognition by Voice data.

Informasi Course Terkait
  Kategori: Artificial Intelligence
  Course: Blockchain Kecerdasan Artifisial (SIB AI-BLOCKCHAIN)