PREPROCESSING FOR SENTIMENT LABELLED SENTENCES

Eka Wulan Yunita

Summary

Text preprocessing is the stage that turns raw text from a dataset into clean, ready-to-process data. The steps differ depending on the dataset you have. The dataset processed here is the Sentiment Labelled Sentences Data Set, which is cleaned so that it is ready for the next stage.

Description

WHAT IS TEXT PREPROCESSING?

Text preprocessing is the stage that turns raw text from a dataset into clean, ready-to-process data. It applies when the dataset consists of text or documents. The step matters because a model trained on uncleaned text is likely to be less accurate and less effective. The dataset processed here is the Sentiment Labelled Sentences Data Set, which is cleaned so that it is ready for the next stage. It contains roughly 1,000 sentences, each labeled with a sentiment score from 1 to 5.

STEPS

Preprocessing steps differ from project to project, because each dataset contains different elements.

In this case, preprocessing consists of the following steps:

1. CASEFOLDING

Before starting, install the required libraries.

Don't forget to install the nltk library and the Sastrawi library (for stopwords). NLTK is a platform for building text analysis programs.

Then load the data.
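The article does not show the file layout, so the following is only a minimal sketch. It assumes each line holds one sentence and its label separated by a tab. The function name `load_labelled_sentences` and the sample lines are made up for illustration.

```python
import csv
import io


def load_labelled_sentences(handle):
    """Read (sentence, label) pairs from a tab-separated text stream."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if len(record) != 2:
            continue  # skip blank or malformed lines
        sentence, label = record
        rows.append((sentence.strip(), int(label)))
    return rows


# Made-up sample standing in for the real file (assumption). With a file on
# disk, pass open(path, encoding="utf-8", newline="") instead.
sample = io.StringIO(
    "Great phone, works well!\t5\n"
    "The battery died in 2 days.\t1\n"
)
data = load_labelled_sentences(sample)
print(data)
```

Keeping the loader independent of any particular path makes it easy to try on a few lines before pointing it at the full dataset.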

Next comes case folding.

Case folding converts all letters to lowercase, here with the str.lower() method.
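As a small illustration of this step, the sketch below applies `str.lower()` to a list of sentences. The helper name `casefold` and the sample sentences are assumptions, not taken from the original notebook.

```python
def casefold(sentences):
    """Return a lowercased copy of every sentence."""
    return [sentence.lower() for sentence in sentences]


# Made-up sample sentences (assumption).
lowered = casefold(["Great Phone, Works WELL!", "The Battery DIED."])
print(lowered)
```

After this step, "Great" and "great" count as the same word in every later stage.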

2. FILTERING

Filtering is the stage of deciding which elements are important and which are not, such as punctuation marks and emoticons.

Here, the elements removed are tags, punctuation, and numbers, so that the resulting data consists of words only.
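One possible way to do this filtering with the standard library is sketched below. It assumes that "tags" means HTML-style markup such as `<br />`, which the article does not specify. The patterns and the name `filter_text` are illustrative.

```python
import re
import string

# Assumption: "tags" are HTML-style markup such as <br />.
TAG_PATTERN = re.compile(r"<[^>]+>")
DIGIT_PATTERN = re.compile(r"\d+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def filter_text(text):
    """Remove tags, numbers, and punctuation, then collapse whitespace."""
    text = TAG_PATTERN.sub(" ", text)
    text = DIGIT_PATTERN.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


print(filter_text("<br />great phone, 10/10 would buy again!"))
```

Tags are removed first because their angle brackets would otherwise be stripped as punctuation, leaving the tag name behind as a stray word. Replacing punctuation with spaces also splits contractions, so "don't" becomes "don t"; whether that is acceptable depends on the later steps.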

3. STOPWORD

Stopword removal is similar to filtering. The difference is that stopword removal operates on words, using a list of words to remove that can be extended or trimmed, whereas filtering removes everything other than words.

Here, the stopword list comes from the nltk library. The 'english' argument should match the language of the data.
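The sketch below shows roughly how stopword removal with nltk can look. `stopwords.words('english')` is the nltk call. The helper names and the small fallback list are assumptions, added only so the example still runs when nltk or its stopword corpus is not installed.

```python
# Small stand-in list (assumption), used only when nltk or its
# stopword corpus is not available on this machine.
FALLBACK_STOPWORDS = {
    "a", "an", "the", "is", "was", "this", "it", "and", "of", "to", "in",
}


def load_stopwords(language="english"):
    """Return nltk's stopword list as a set, or the fallback list."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        return set(FALLBACK_STOPWORDS)


def remove_stopwords(text, stopword_set):
    """Drop every whitespace-separated word found in stopword_set."""
    return " ".join(word for word in text.split() if word not in stopword_set)


STOPWORDS = load_stopwords("english")
print(remove_stopwords("this is a great phone", STOPWORDS))
```

Because the list is a plain set, words can be added with `STOPWORDS.add(...)` or kept in the text by calling `STOPWORDS.discard(...)`. The latter is worth considering for negations such as "not" in sentiment data.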

4. TOKENIZATION

Tokenization, also called lexical analysis, is the process of splitting text into smaller units called tokens.
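The article does not say which tokenizer it used. As a stand-in, the sketch below splits text into runs of letters with a standard-library regular expression. nltk's `word_tokenize` is a common alternative but needs extra tokenizer data to be downloaded. The name `tokenize` is an assumption.

```python
import re

# One token = one run of lowercase ASCII letters (assumption: the text
# has already been casefolded and filtered as in the earlier steps).
TOKEN_PATTERN = re.compile(r"[a-z]+")


def tokenize(text):
    """Split text into a list of lowercase word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("great phone would buy again")
print(tokens)
```

Working with a list of tokens rather than one long string is what lets the stemming step treat each word separately.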

5. STEMMING

Stemming is the process of reducing each word to its base form, or root.
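The article does not name the stemmer it used. The sketch below assumes nltk's `PorterStemmer`, a common choice for English text. The suffix-stripping fallback is a deliberately crude stand-in, included only so the example runs without nltk, and all helper names are hypothetical.

```python
def crude_stem(word):
    """Very rough stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def get_stemmer():
    """Return nltk's Porter stemmer if available, else the crude fallback."""
    try:
        from nltk.stem import PorterStemmer
        return PorterStemmer().stem
    except ImportError:
        return crude_stem


def stem_tokens(tokens, stem=None):
    """Apply a stemming function to every token."""
    stem = stem or get_stemmer()
    return [stem(token) for token in tokens]


stems = stem_tokens(["walking", "walked", "cats"])
print(stems)
```

Stems are not always dictionary words. What matters is that different forms of the same word, such as "walking" and "walked", end up as the same token.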

RESULT

Finally, the clean data is saved so that it can be used in the next step.
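One simple way to save the result is a CSV file, sketched below with the standard library. The output format, the column names, the file name `clean_sentences.csv`, and the sample rows are all assumptions, since the article does not show how the data was stored.

```python
import csv
import os
import tempfile


def save_clean_data(rows, path):
    """Write (text, label) rows to a CSV file with a header line."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "label"])  # hypothetical column names
        writer.writerows(rows)


# Made-up cleaned rows and a temporary output location (assumptions).
clean_rows = [("great phone work well", 5), ("battery die day", 1)]
output_path = os.path.join(tempfile.gettempdir(), "clean_sentences.csv")
save_clean_data(clean_rows, output_path)

# Read the file back to confirm what was written.
with open(output_path, encoding="utf-8", newline="") as handle:
    saved = list(csv.reader(handle))
print(saved)
```

Note that labels come back as strings when the file is read again, so the next stage needs to convert them to integers.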

Related Course Information

Category: Data Science / Big Data
Course: Basic Text Processing
