Mathematics behind classifying spam email using Machine Learning

Comparative Study on Email Spam Classifier using Data Mining Techniques

Spam dataset is examined using TANAGRA data mining tool (data analysis software)
Feature construction and selection are used to extract significant features
Study shows the Rnd Tree (random tree) classification algorithm is the most accurate of those compared

Phishing Email Detection Based on Structural Properties

Study proposes a way to identify spam email using distinct structural properties
With the use of one-class Support Vector Machine (classification algorithm), potential phishing emails are classified
Overall, demonstrates an effective approach to prevent wide exposure of suspicious emails with minimal effort necessary
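
A minimal sketch of the one-class SVM idea, not the study's actual method: the three structural features (link count, HTML tag count, attachment count), the synthetic data, and the choice to train on legitimate emails are all assumptions made for illustration. The notes above do not state which features or which training class the study used.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical structural features per email:
# [number of links, number of HTML tags, number of attachments]
rng = np.random.default_rng(0)
normal_emails = rng.normal(loc=[2.0, 10.0, 1.0],
                           scale=[1.0, 3.0, 0.5],
                           size=(200, 3))

# A one-class SVM learns a boundary around a single class of examples.
# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(normal_emails)

# predict() returns +1 for points inside the learned boundary, -1 outside.
typical = np.array([[2.0, 10.0, 1.0]])
unusual = np.array([[50.0, 200.0, 10.0]])
print("typical email:", model.predict(typical)[0])
print("unusual email:", model.predict(unusual)[0])
```

Emails whose structure falls far outside what the model saw during training are flagged with -1 and could then be routed for closer inspection.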

Council Post: The Dangers of Phishing

Rising threat of phishing is presented in this study with statistics
Presents the problem with spam emails and social engineering, need for spam classification filters
Provides brief overview of machine learning being used to automate spam email searching, classification algorithms such as kNN

Email Spam Classification Using Hybrid Approach of RBF Neural Network and Particle Swarm Optimization

Study takes a unique approach to spam emails: combines a Radial Basis Function Neural Network (RBFNN) with the Particle Swarm Optimization (PSO) algorithm
The PSO search process is used to optimize the RBFNN
Experiments conducted on an email spam dataset, SPAMBASE from UCI Machine Learning Repository

# Import libraries (pandas, sklearn, numpy, nltk)
# Load the data
# Dataset: https://www.kaggle.com/balakishan77/spam-or-ham-email-classification
# Exploratory Data Analysis and Preprocessing
# Print the first few rows of data, remove duplicates, view missing data
# Make a function that cleans the text by removing stopwords (words that carry little information) and non-words such as punctuation marks or special characters
# Inside the function, tokenize the data (split sentences into lists of key words)
# Show the result of the tokenization
# Encode text into token counts for machine learning using CountVectorizer
# Split the data into training and testing sets (80% training, 20% testing)
# Create and train the Naive Bayes Classifier
# Print out predicted and actual values for the spam/ham classification
# Check the accuracy (%) of the model on the training data
# Evaluate the model on the testing data set
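
The outline above can be sketched roughly as follows. This is only an illustration under stated assumptions: a small in-memory toy DataFrame stands in for the Kaggle CSV, the column names `text` and `label` are hypothetical, and sklearn's built-in `ENGLISH_STOP_WORDS` list is swapped in for nltk's stopwords so that no corpus download is needed.

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the real dataset; column names are assumptions.
# label: 1 = spam, 0 = ham
df = pd.DataFrame({
    "text": [
        "Win a FREE prize now!!!",
        "Claim your cash reward today",
        "Cheap loans, click here to apply",
        "You have been selected for a free gift",
        "Urgent: verify your account to win money",
        "Meeting moved to 3pm tomorrow",
        "Please review the attached project report",
        "Lunch on Friday with the team?",
        "Here are the notes from class today",
        "Can you send me the updated schedule",
    ],
    "label": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

# Exploratory checks and preprocessing
print(df.head())
print(df.isnull().sum())
df = df.drop_duplicates().dropna()


def clean_text(text):
    """Remove punctuation, lowercase, split into tokens, drop stop words."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    return [w for w in no_punct.lower().split() if w not in ENGLISH_STOP_WORDS]


# Show the tokenization on the first few rows
print(df["text"].head().apply(clean_text))

# Encode text into token counts, using clean_text as the analyzer
vectorizer = CountVectorizer(analyzer=clean_text)
X = vectorizer.fit_transform(df["text"])
y = df["label"]

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Create and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)

# Predicted vs. actual values and accuracy on the training data
train_pred = model.predict(X_train)
print("predicted:", train_pred)
print("actual:   ", y_train.values)
train_acc = accuracy_score(y_train, train_pred)
print("training accuracy (%):", 100 * train_acc)

# Evaluate the model on the testing data
test_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)
print("testing accuracy (%):", 100 * test_acc)
```

The sketch follows the order of the outline and fits the vectorizer before splitting. Fitting it on the training portion only and then transforming the test portion would keep test vocabulary out of training, which may be preferable for a real evaluation.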