Overview:
This second part covers the distinction between Machine Learning (ML) and rule-based systems. A spam filter is used as the example to show what the implementation would look like without ML.
Rule-based systems
Without ML, you have to define rules that distinguish ham (legitimate mail) from spam. You start defining the rules, and for a while everything works fine. At some point, however, you have to adjust the rule set, and you end up on a hamster wheel of constant reconfiguration. The system also gets harder and harder to maintain as the rules pile up.
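A minimal sketch of what such a rule set could look like in Python. The function name and the sender/title/body arguments are hypothetical; the rules themselves reuse the sender and keyword checks that appear as feature examples further down.

```python
def is_spam_rule_based(sender: str, title: str, body: str) -> bool:
    """Hand-written rules: every new spam pattern needs another branch."""
    # Rule 1: a known promotional sender
    if sender == "promotions@online.com":
        return True
    # Rule 2: anything from a suspicious domain
    if sender.endswith("@test.com"):
        return True
    # Rule 3: a suspicious keyword in the body
    if "deposit" in body.lower():
        return True
    # No rule matched: treat the mail as ham
    return False


print(is_spam_rule_based("promotions@online.com", "Sale", "Buy now"))        # True
print(is_spam_rule_based("friend@example.org", "Lunch", "See you at noon"))  # False
```

Each time spammers change their wording or sender addresses, another branch has to be added by hand, which is exactly the maintenance problem described above.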
Machine Learning
The second way to implement this spam filter is to use ML instead of hard-coded rules. That means you collect the data, define and calculate (extract) the features, and then train the model and use it to classify messages as spam or not spam.
Collect the data
The data is collected through the “SPAM” button of your mail system: messages that users flag with it are labeled as spam.
Define & calculate (extract) the features
To create the features, start with the rules you would use in a rule-based system.
Features:
- Length of title > 10? true/false
- Length of body > 10? true/false
- Sender “promotions@online.com”? true/false
- Sender “hpYOSKmL@test.com”? true/false
- Sender domain “test.com”? true/false
- Description contains “deposit”? true/false
All six features here are binary, so you can encode each mail as a binary vector like [1, 1, 0, 0, 1, 1]. In addition, every email has a label / target (spam = 1, not spam = 0), which is the desired output.
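The six features above can be computed with a small function. This is a sketch under assumptions: the function name and the sender/title/body arguments are hypothetical, and “Description” is read here as the mail body.

```python
def extract_features(sender: str, title: str, body: str) -> list:
    """Turn one mail into the six binary features listed above."""
    return [
        int(len(title) > 10),                    # Length of title > 10?
        int(len(body) > 10),                     # Length of body > 10?
        int(sender == "promotions@online.com"),  # Sender "promotions@online.com"?
        int(sender == "hpYOSKmL@test.com"),      # Sender "hpYOSKmL@test.com"?
        int(sender.endswith("@test.com")),       # Sender domain "test.com"?
        int("deposit" in body.lower()),          # Description contains "deposit"?
    ]


# An invented example mail, chosen so that it yields the vector from the text
features = extract_features(
    sender="someone@test.com",
    title="Claim your prize today",
    body="Please make a deposit to receive it",
)
print(features)  # [1, 1, 0, 0, 1, 1]
```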

Training
This data is used to train the model. This process is often called fitting a model.
Training is similar to solving a very complex system of equations with many parameters. The features are weighted and combined in such a way that the correct classification comes out at the end. Correct in this example means 1 for spam and 0 for not spam. More precisely, the model outputs a probability that the mail is spam. The trained model contains exactly the information that best solves the equations, namely the weights with which the individual features must be combined to get the correct result.
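To make the idea of “finding the weights” concrete, here is a minimal sketch that assumes the model is a logistic regression fitted by plain gradient descent. The text does not name a specific model, so this choice is an assumption. The six training rows are invented toy data, and the names fit and predict_proba are hypothetical helpers, not a library API.

```python
import math


def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    # Weighted sum of the features, turned into a spam probability
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def fit(X, y, lr=0.5, epochs=2000):
    # Start with all weights at zero and repeatedly nudge them to reduce the error
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Invented toy data: six mails encoded with the six binary features
X = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]  # labels: spam = 1, not spam = 0

weights, bias = fit(X, y)
# On this small, separable toy set the thresholded predictions should reproduce y
print([int(predict_proba(weights, bias, x) >= 0.5) for x in X])
```

The returned weights and bias are the “information” the trained model contains: one number per feature, plus an offset.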
Apply the model
When the model is applied to unseen data, the result is a probability that indicates how likely it is that the mail is spam. To make the final decision, a threshold is used (e.g. 0.5): everything greater than or equal to 0.5 is declared spam.
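The decision step can be sketched as follows. The weights and bias are hypothetical placeholder values, not the result of a real training run, and the logistic form of the model is an assumption. Only the threshold logic comes from the text.

```python
import math

# Hypothetical weights for the six binary features, plus a bias term
WEIGHTS = [0.1, 0.2, 2.0, 1.5, 1.0, 1.8]
BIAS = -2.0


def spam_probability(features):
    # Weighted sum of the features passed through the logistic function
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    # Everything greater than or equal to the threshold is declared spam (1)
    return 1 if probability >= threshold else 0


print(classify(spam_probability([1, 1, 0, 0, 1, 1])))  # 1 (spam)
print(classify(spam_probability([1, 1, 0, 0, 0, 0])))  # 0 (not spam)
print(classify(0.5))                                    # 1, the boundary counts as spam
```

Raising the threshold makes the filter more cautious about declaring spam; lowering it catches more spam at the cost of more false alarms.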
