Nafisan Nauroti Salsa Bila
Pada proyek kali ini, akan dilakukan klasifikasi teks terhadap dataset Coronavirus Tweets menggunakan LSTM. Dimana tahapan yang dilakukan yaitu exploring data, preprocessing, model building, dan evaluation.
Natural Language Processing or NLP is a branch of Artificial Intelligence which deal with bridging the machines understanding humans in their Natural Language. Natural Language can be in form of text or sound, which are used for humans to communicate each other. NLP can enable humans to communicate to machines in a natural way.
Text Classification is a process involved in Sentiment Analysis. It is classification of peoples opinion or expressions into different sentiments. Sentiments include Positive, Neutral, and Negative, Review Ratings and Happy, Sad. Sentiment Analysis can be done on different consumer centered industries to analyse people's opinion on a particular product or subject.
Exploring the Data
In this project, I used Coronavirus Tweets dataset from kaggle.com.
Preprocessing
Tweet texts often consists of other user mentions, hyperlink texts, emoticons and punctuations. In order to use them for learning using a Language Model. We cannot permit those texts for training a model. So we have to clean the text data using various preprocessing and cleansing methods.
Mapping
Stop words removal
Tokenizing
Model Building
Model Evaluation
Now that we have trained the model, we can evaluate its performance. We will some evaluation metrics and techniques to test the model.
It's a pretty good model I trained here in terms of NLP. 84.1% accuracy is good enough considering the baseline human accuracy also pretty low in these tasks. This model is good for handling most tasks for Sentiment Analysis.