ML Zoomcamp 2023 – Introduction to Machine Learning – Part 3

Overview:

  1. What is Supervised Machine Learning?
  2. Types of Supervised Machine Learning

As I mentioned in part 2 of this Introduction to ML, there are several approaches to building a software solution for a problem. On one side there is the classical approach, where everything is hard-coded. In contrast to this, there are AI approaches.

On the one hand, there are knowledge-based systems, which are divided into rule-based systems and case-based reasoning. Rule-based systems were covered earlier in this blog.

On the other hand, there is machine learning as another AI approach. “Machine Learning […] provides systems with the ability to learn from experience without being programmed explicitly. Machine Learning is concerned with the development of […] applications that can access data and learn from it on themselves.” [1]

There are different kinds of problems that ML tries to solve:

  • Regression (predict continuous values, e.g. prices)
  • Classification (predict labels to distinguish between different classes)
  • Clustering (find groups in the data without having any group labels or group characteristics)

Depending on the problem type, you can use different learning strategies to solve it:

  • Supervised Learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement Learning
  • Active Learning

This should give you a brief overview of where supervised learning fits in. For more information on the other approaches, please refer to the literature, e.g. [2].

What is Supervised Machine Learning?

Supervising the model means that we act as a teacher: we show the model different examples together with their target values (e.g. the price of a car). The model learns from these examples by extracting patterns, and it can then generalize to new, unseen examples.

Figure 1.3.1

From figure 1.3.1 you can extract a lot of important information:

  • Rows are the observations or objects for which we want to predict something
  • Columns are the features of each observation/object
  • X is the feature matrix, i.e. the whole set of features for all observations (a two-dimensional array, or array of arrays)
  • y is the target vector, i.e. the value of the target variable for each observation (a one-dimensional array)

From this you can derive the formal definition of supervised machine learning: g(X) ≈ y, where:

  • X : feature matrix
  • y : target variable
  • g : the model, which takes X and produces something that is approximately equal to y
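To make the notation concrete, here is a minimal sketch of X and y as plain Python lists. The car features and prices are invented for illustration (they are not the data from Figure 1.3.1): each inner list is one row (observation) and each position in it is one column (feature).

```python
# Hypothetical toy data: each row is one car (observation),
# each column one feature. Assumed columns: [year, engine_hp, mileage_km]
X = [
    [2015, 150, 80_000],
    [2018, 190, 40_000],
    [2012, 110, 150_000],
]

# Target vector y: one (assumed) price in EUR per row of X
y = [14_000, 23_000, 6_500]

n_rows = len(X)      # number of observations
n_cols = len(X[0])   # number of features per observation

# Every row must have the same number of features,
# and there must be exactly one target value per row.
assert all(len(row) == n_cols for row in X)
assert len(y) == n_rows
print(n_rows, n_cols)  # 3 3
```

In practice X would usually be a NumPy array or a pandas DataFrame, but the shape is the same: rows times features for X, one value per row for y.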

The aim of training is to find the function g. In most cases the model g will not predict the correct target value every time, but we try to get as close to it as possible.
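As an illustration of "training finds g", the sketch below fits a straight line g(x) = w0 + w1·x to a single feature using ordinary least squares, then measures how close g gets to y with the mean squared error. Both the technique (one-feature least squares) and the data are assumptions chosen for brevity; the post itself does not prescribe a specific algorithm.

```python
def fit_line(xs, ys):
    """Fit g(x) = w0 + w1 * x by ordinary least squares (one feature only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w1 = cov / var
    w0 = mean_y - w1 * mean_x
    return lambda x: w0 + w1 * x


def mse(y_true, y_pred):
    """Mean squared error: average squared distance between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: one feature (e.g. engine power) and a target (e.g. price)
xs = [100, 150, 200, 250]
ys = [10_000, 16_000, 19_000, 26_000]

g = fit_line(xs, ys)             # "training": derive g from the examples
preds = [g(x) for x in xs]       # g(X): close to y, but not identical
error = mse(ys, preds)
print(preds)  # [10100.0, 15200.0, 20300.0, 25400.0]
print(error)  # 675000.0
```

The non-zero error shows the point made above: the learned g does not reproduce every target exactly, it only gets as close as the chosen model allows.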

Figure 1.3.2 – Using the model for prediction

Figure 1.3.2 shows sample predictions for a few objects. Here the output of g(X) is a probability. Depending on a chosen threshold, this value is mapped to 0 or 1.
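The thresholding step can be sketched in a few lines. The threshold of 0.5 and the convention that a value exactly at the threshold maps to 1 are assumptions; the probabilities are made-up model outputs, not the values from Figure 1.3.2.

```python
def to_label(probability, threshold=0.5):
    """Map a predicted probability to a class label: 1 if >= threshold, else 0."""
    return 1 if probability >= threshold else 0


# Hypothetical model outputs g(x_i) for four objects
probs = [0.8, 0.3, 0.5, 0.1]
labels = [to_label(p) for p in probs]
print(labels)  # [1, 0, 1, 0]
```

Raising the threshold makes the model more conservative about predicting 1; lowering it does the opposite.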

Types of Supervised Machine Learning

Regression:

  • e.g. price of a car/house/…
  • g predicts a real number, anywhere between -∞ and +∞

Classification:

  • e.g. identify a picture as a car, identify an email as spam
  • g predicts a category
  • Input is a picture (or characteristics of an object/observation) and the output is the label/class

Subtypes of classification:

  • Multiclass classification problem (distinguishes between several classes, e.g. cat, dog, car)
  • Binary classification problem (distinguishes between two classes, e.g. spam vs. not spam)
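One common way to turn model outputs into a class label is sketched below: for multiclass problems, pick the class with the highest score; for binary problems, a single probability plus a threshold is enough. The scores and the 0.5 threshold are hypothetical values, not outputs of a real model.

```python
# Multiclass: hypothetical per-class scores produced by some model g for one picture
scores = {"cat": 0.15, "dog": 0.70, "car": 0.15}
predicted_class = max(scores, key=scores.get)  # class with the highest score
print(predicted_class)  # dog

# Binary: only two classes, so one probability and a threshold suffice
p_spam = 0.92        # hypothetical probability that an email is spam
is_spam = p_spam >= 0.5
print(is_spam)  # True
```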

Ranking:

  • Usually used when you want to rank something (e.g. in a recommender system): each item in an e-commerce shop gets a score, and the items with the highest scores are shown, because the algorithm estimates that these items are the most likely to be bought by this customer
  • Google's search engine works in a similar way
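The "score every item, show the top ones" idea can be sketched as follows. The item names and scores are invented; in a real recommender the scores would come from a trained model g.

```python
# Hypothetical relevance scores a model assigned to items for one customer
scores = {
    "headphones": 0.81,
    "phone case": 0.64,
    "laptop": 0.12,
    "charger": 0.77,
}


def top_k(item_scores, k):
    """Return the k items with the highest scores, best first."""
    return sorted(item_scores, key=item_scores.get, reverse=True)[:k]


print(top_k(scores, 2))  # ['headphones', 'charger']
```

Sorting the whole list is fine for a handful of items; for large catalogs, `heapq.nlargest` avoids a full sort.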

  1. S. Burns (2019): Python Machine Learning – Machine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow, 4th ed., Amazon Kindle Publishing.
  2. M. Deru, A. Ndiaye (2019): Deep Learning mit TensorFlow, Keras und TensorFlow.js, 1st ed., Bonn, Germany: Rheinwerk Computing.
