Bethelsando Gemilang Wahyudi
SUMMARY
Email spam, also known as junk email, consists of unsolicited, unwanted, or irrelevant messages sent via email. These messages are typically sent in large quantities by spammers, who hope to either scam people out of their money or trick them into giving away personal information. Spam emails may contain links to malicious websites or attachments that can harm your computer, so it's important to be careful when dealing with them. Most email providers have spam filters in place to help protect users from this type of unwanted email. We can also use machine learning to detect whether an email is spam or not.
DESCRIPTION:
3. Mount Google Drive in Colab
4. Import the libraries needed to process the dataset and load the data into a dataframe
5. Preprocess the dataframe
Once we know that the dataset is clean, we continue to the next step
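The kind of check behind "the dataset is clean" can be sketched as follows. This is only an illustration, assuming pandas: the small frame and its column names (`word_freq_free`, `char_freq_dollar`, `is_spam`) are hypothetical stand-ins, since the real dataset and the exact checks used are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the spam dataset; the real columns are not given.
df = pd.DataFrame({
    "word_freq_free": [0.0, 1.2, 0.0, 1.2],
    "char_freq_dollar": [0.0, 0.4, 0.1, 0.4],
    "is_spam": [0, 1, 0, 1],
})


def cleanliness_report(frame: pd.DataFrame) -> dict:
    """Count missing values and exact duplicate rows in a dataframe."""
    return {
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


report = cleanliness_report(df)
print(report)

# Dropping exact duplicates is one possible cleaning step (an assumption here).
df_clean = df.drop_duplicates().reset_index(drop=True)
print(len(df_clean), "rows after dropping duplicates")
```

If both counts are zero, there is nothing to repair and the frame can be used as-is.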
6. Get summary statistics from the dataframe and inspect its columns
7. Define the X data (the features) from the dataframe
8. Define the y data (the result, or target) from the dataframe
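Steps 7 and 8 amount to splitting the dataframe into a feature matrix and a label vector. A minimal sketch, assuming pandas and a label column named `is_spam` (a hypothetical name; the real column name is not given in this write-up):

```python
import pandas as pd

# Hypothetical stand-in data; only the X / y split itself is the point.
df = pd.DataFrame({
    "word_freq_free": [0.0, 1.2, 0.0, 0.9],
    "char_freq_dollar": [0.0, 0.4, 0.1, 0.3],
    "is_spam": [0, 1, 0, 1],
})

TARGET = "is_spam"  # assumed name of the label column

X = df.drop(columns=[TARGET])  # every column except the label
y = df[TARGET]                 # the label: 1 = spam, 0 = not spam

print(X.shape, y.shape)
```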
9. Classify the data using eleven algorithms
10. Train each algorithm on the data
until we get this output:
==============================
KNeighborsClassifier
****Results****
Accuracy: 87.0070%
Log Loss: 1.379305610485888
==============================
SVC
****Results****
Accuracy: 71.0750%
Log Loss: 0.4836649541561642
==============================
NuSVC
****Results****
Accuracy: 82.6759%
Log Loss: 0.3320301535134764
==============================
DecisionTreeClassifier
****Results****
Accuracy: 93.1168%
Log Loss: 2.3773790403302795
==============================
RandomForestClassifier
****Results****
Accuracy: 97.8345%
Log Loss: 0.16699441191493325
==============================
XGBClassifier
****Results****
Accuracy: 96.5197%
Log Loss: 0.12179309395210326
==============================
AdaBoostClassifier
****Results****
Accuracy: 96.2877%
Log Loss: 0.5251066199866752
==============================
GradientBoostingClassifier
****Results****
Accuracy: 96.7517%
Log Loss: 0.12360515066970783
==============================
GaussianNB
****Results****
Accuracy: 95.2823%
Log Loss: 1.6259912051490315
==============================
LinearDiscriminantAnalysis
****Results****
Accuracy: 72.3125%
Log Loss: 8.361442220781141
/usr/local/lib/python3.8/dist-packages/sklearn/discriminant_analysis.py:878: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
==============================
QuadraticDiscriminantAnalysis
****Results****
Accuracy: 75.2514%
Log Loss: 8.547879695569543
==============================
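A loop that prints a report in the same shape as the output above might look like the sketch below. It is not the original notebook code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the spam dataset, and runs only four of the eleven classifiers, so its numbers will not match the ones reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the spam features.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A subset of the classifiers listed above; the others follow the same pattern.
classifiers = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(random_state=0),
    GaussianNB(),
]

results = {}
for clf in classifiers:
    name = clf.__class__.__name__
    clf.fit(X_train, y_train)

    # Accuracy uses hard predictions; log loss uses predicted probabilities.
    acc = accuracy_score(y_test, clf.predict(X_test))
    ll = log_loss(y_test, clf.predict_proba(X_test))
    results[name] = {"accuracy": acc, "log_loss": ll}

    print("=" * 30)
    print(name)
    print("****Results****")
    print(f"Accuracy: {acc:.4%}")
    print(f"Log Loss: {ll}")
print("=" * 30)
```

Log loss needs class probabilities, so classifiers such as `SVC` and `NuSVC` would have to be created with `probability=True` before `predict_proba` is available.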
11. Compare the accuracy of each algorithm to find the best machine learning model
12. And we get the best algorithm
13. From this chart we know that RandomForestClassifier, with the highest accuracy (97.8345%), is the best model for this dataset
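The comparison can also be reproduced directly from the reported numbers. The sketch below copies the accuracy and log loss values from the output above (rounded to four decimals) and ranks the models; the variable names are illustrative only.

```python
# (accuracy in %, log loss) per model, copied from the reported output.
scores = {
    "KNeighborsClassifier": (87.0070, 1.3793),
    "SVC": (71.0750, 0.4837),
    "NuSVC": (82.6759, 0.3320),
    "DecisionTreeClassifier": (93.1168, 2.3774),
    "RandomForestClassifier": (97.8345, 0.1670),
    "XGBClassifier": (96.5197, 0.1218),
    "AdaBoostClassifier": (96.2877, 0.5251),
    "GradientBoostingClassifier": (96.7517, 0.1236),
    "GaussianNB": (95.2823, 1.6260),
    "LinearDiscriminantAnalysis": (72.3125, 8.3614),
    "QuadraticDiscriminantAnalysis": (75.2514, 8.5479),
}

# Rank by accuracy, highest first.
ranked = sorted(scores.items(), key=lambda item: item[1][0], reverse=True)
best_by_accuracy = ranked[0][0]

# Lower log loss is better, so take the minimum.
best_by_log_loss = min(scores, key=lambda name: scores[name][1])

for name, (acc, ll) in ranked:
    print(f"{name:<30} {acc:8.4f}%  log loss {ll:.4f}")
print("Best by accuracy:", best_by_accuracy)
print("Best by log loss:", best_by_log_loss)
```

Ranked by accuracy, RandomForestClassifier comes first, matching the conclusion above. Ranked by log loss instead, XGBClassifier is marginally ahead, so the choice of "best" depends on which metric matters more.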