Regression – knowMLedge.com

ML Zoomcamp 2023 – Machine Learning for Regression – Part 12

Tuning the model

The topic for this article is finding the best regularization parameter for our linear regression model. We saw that the parameter r affects the quality of our model, and now we try to find the best value for r. What you see here is that using r=0 makes the bias term huge…
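A minimal sketch of the search for r, assuming the regularized normal equation $w = (X^T X + rI)^{-1} X^T y$ and synthetic data. The function names `train_linear_regression_reg` and `rmse` and the candidate list of r values are hypothetical, not taken from the article. On this well-conditioned synthetic data r=0 does not blow up; the sketch only illustrates the loop: train once per r, score on the validation set, keep the r with the lowest RMSE.

```python
import numpy as np

def train_linear_regression_reg(X, y, r=0.0):
    """Regularized normal equation: w = (X^T X + r*I)^-1 X^T y, with a bias column."""
    X = np.column_stack([np.ones(X.shape[0]), X])  # prepend a column of ones for w0
    XTX = X.T @ X + r * np.eye(X.shape[1])         # add r to the diagonal
    w_full = np.linalg.inv(XTX) @ X.T @ y
    return w_full[0], w_full[1:]

def rmse(y, y_pred):
    return np.sqrt(np.mean((y - y_pred) ** 2))

# Synthetic data (hypothetical): three features, known weights, small noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

# Train once per candidate r and score each model on the validation set
scores = {}
for r in [0.0, 0.00001, 0.001, 0.1, 1, 10]:
    w0, w = train_linear_regression_reg(X_train, y_train, r=r)
    scores[r] = rmse(y_val, w0 + X_val @ w)
    print(r, w0, scores[r])

best_r = min(scores, key=scores.get)  # r with the lowest validation RMSE
print("best r:", best_r)
```

In practice the chosen r would then be used to retrain the final model.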

ML Zoomcamp 2023 – Machine Learning for Regression – Part 11

Regularization

The topic for this part is regularization as a way to solve the problem of duplicated columns in our data. Remember that the formula for the normal equation is:

$$w = (X^T X)^{-1} X^T y$$

The problem we have is connected with the first part, $(X^T X)^{-1}$: we need to take the inverse of the Gram matrix $X^T X$. Sometimes this…
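A small sketch of why duplicated columns are a problem and how adding r to the diagonal helps. The feature values, the target and r = 0.01 are hypothetical. With two identical columns the Gram matrix $X^T X$ has rank 2 instead of 3, so it has no true inverse (depending on rounding, `np.linalg.inv` may raise an error or return enormous numbers); $X^T X + rI$ is full rank and can be inverted.

```python
import numpy as np

# Hypothetical feature matrix where the third column duplicates the second
X = np.array([
    [4.0, 4.0, 4.0],
    [3.0, 5.0, 5.0],
    [5.0, 1.0, 1.0],
    [5.0, 4.0, 4.0],
    [7.0, 5.0, 5.0],
    [4.0, 5.0, 5.0],
])
y = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])

XTX = X.T @ X                          # Gram matrix
print(np.linalg.matrix_rank(XTX))      # rank 2 < 3: singular

r = 0.01
XTX_reg = XTX + r * np.eye(XTX.shape[0])   # add r to the diagonal
w = np.linalg.inv(XTX_reg) @ X.T @ y       # now the inverse exists
print(w)
```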

ML Zoomcamp 2023 – Machine Learning for Regression – Part 10

Categorical variables

Categorical variables are variables whose values are categories (typically strings). Here they are: make, model, engine_fuel_type, transmission_type, driven_wheels, market_category, vehicle_size and vehicle_style. There is one more variable that looks numerical but isn't: number_of_doors is really categorical. The typical way of encoding such categorical variables is to represent each of them with a bunch of binary…
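A minimal sketch of binary (one-hot) encoding for number_of_doors, using a hypothetical sample of values and plain NumPy instead of a DataFrame. The `num_doors_<value>` column naming is an illustrative assumption.

```python
import numpy as np

# Hypothetical sample of the number_of_doors column
number_of_doors = np.array([2, 4, 4, 3, 2])

# One binary column per value: 1 if the car has that many doors, otherwise 0
values = [2, 3, 4]
column_names = [f"num_doors_{v}" for v in values]
X_doors = np.column_stack([(number_of_doors == v).astype(int) for v in values])

print(column_names)
print(X_doors)
```

Each row has exactly one 1, so the model gets a separate weight for each door count instead of treating "4 doors" as twice "2 doors".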

ML Zoomcamp 2023 – Machine Learning for Regression – Part 9

Simple feature engineering

Suppose we want to create a new feature based on the existing ones in the feature matrix X. Let's say we want to turn the year information into age information, assuming the current year is 2017. We can add this new feature 'age' to our prepare_X function. What is one…
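A sketch of what such a prepare_X could look like. It assumes the data is a dict of column arrays standing in for a DataFrame, a hypothetical `base` list of numerical columns, and that missing values are replaced with 0; only the idea of age = 2017 - year comes from the text.

```python
import numpy as np

REFERENCE_YEAR = 2017                     # from the text: "we have year 2017"
base = ["engine_hp", "engine_cylinders"]  # hypothetical subset of numerical columns

def prepare_X(data):
    """Build the feature matrix from a dict of column arrays, adding an 'age' feature."""
    features = [np.asarray(data[c], dtype=float) for c in base]
    age = REFERENCE_YEAR - np.asarray(data["year"], dtype=float)
    features.append(age)
    X = np.column_stack(features)
    return np.nan_to_num(X, nan=0.0)      # replace missing values with 0

data = {
    "engine_hp": [300.0, 150.0, np.nan],
    "engine_cylinders": [6.0, 4.0, 4.0],
    "year": [2011, 2017, 2015],
}
X = prepare_X(data)
print(X)
```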

ML Zoomcamp 2023 – Machine Learning for Regression – Part 8

This part is about RMSE as an objective way to evaluate model performance. The first part of the article introduces RMSE, and the second part uses it to evaluate our model on unseen data.

Root Mean Squared Error – RMSE

We have the following variables, so we can calculate the…
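For reference, the standard definition over m observations is $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(g(x_i) - y_i\right)^2}$. A minimal NumPy sketch, with hypothetical targets and predictions:

```python
import numpy as np

def rmse(y, y_pred):
    error = y_pred - y   # g(x_i) - y_i for every observation
    se = error ** 2      # squared errors
    mse = se.mean()      # mean squared error
    return np.sqrt(mse)  # root of the mean squared error

y = np.array([1.0, 2.0, 3.0])       # hypothetical actual values
y_pred = np.array([1.5, 2.0, 2.0])  # hypothetical predictions
print(rmse(y, y_pred))
```

Taking the square root brings the error back to the units of the target, which makes the number easier to interpret.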

ML Zoomcamp 2023 – Machine Learning for Regression – Part 7

Car price baseline model

This article is about building a baseline model for predicting the price of a car. Here we'll use the code implemented in the last article to build the model. We start with a simple model that uses only the numerical columns. The next code snippet shows how to extract all numerical…
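A rough sketch of the baseline idea without pandas: keep only the numerical columns, fill missing values with 0 and fit a linear model. The records, column names and prices (in thousands) are hypothetical, and `np.linalg.lstsq` is swapped in here as a compact least-squares solver in place of an explicit normal-equation function.

```python
import numpy as np

# Hypothetical records: a mix of categorical (string) and numerical columns
records = [
    {"make": "bmw", "engine_hp": 335.0, "engine_cylinders": 6.0, "highway_mpg": 26.0, "msrp": 46.1},
    {"make": "audi", "engine_hp": 300.0, "engine_cylinders": 6.0, "highway_mpg": 28.0, "msrp": 40.6},
    {"make": "fiat", "engine_hp": 160.0, "engine_cylinders": np.nan, "highway_mpg": 34.0, "msrp": 22.5},
    {"make": "ford", "engine_hp": 230.0, "engine_cylinders": 4.0, "highway_mpg": 31.0, "msrp": 29.0},
    {"make": "kia", "engine_hp": 138.0, "engine_cylinders": 4.0, "highway_mpg": 36.0, "msrp": 17.9},
    {"make": "gmc", "engine_hp": 355.0, "engine_cylinders": 8.0, "highway_mpg": 22.0, "msrp": 51.0},
]
target = "msrp"

# Keep only columns whose values are floats (checked on the first record), excluding the target
numerical = [c for c, v in records[0].items() if isinstance(v, float) and c != target]

X = np.array([[rec[c] for c in numerical] for rec in records])
X = np.nan_to_num(X, nan=0.0)  # baseline: treat missing values as 0
y = np.array([rec[target] for rec in records])

X_b = np.column_stack([np.ones(len(X)), X])       # bias column for w0
w_full, *_ = np.linalg.lstsq(X_b, y, rcond=None)  # least-squares weights
y_pred = X_b @ w_full
print(numerical, w_full)
```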

ML Zoomcamp 2023 – Machine Learning for Regression – Part 6

Training a linear regression model

From the last article we know that we need to multiply the feature matrix X by the weight vector w to get the price prediction g(X), which should approximate y:

$$g(X) = Xw \approx y$$

Ideally we want Xw to be exactly equal to y, but often that's not possible. To achieve this, we need…
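Because X is usually not square, it can't be inverted directly. Multiplying both sides of $Xw = y$ by $X^T$ gives $X^T X w = X^T y$, and inverting the Gram matrix yields the normal equation $w = (X^T X)^{-1} X^T y$. A minimal sketch with a bias column; the function name and the tiny dataset (generated exactly from y = 1 + 2·x1 + 3·x2) are hypothetical.

```python
import numpy as np

def train_linear_regression(X, y):
    """Solve the normal equation w = (X^T X)^-1 X^T y, with w0 as the bias."""
    X = np.column_stack([np.ones(X.shape[0]), X])  # column of ones so w_full[0] is w0
    XTX = X.T @ X                                  # Gram matrix
    XTX_inv = np.linalg.inv(XTX)
    w_full = XTX_inv @ X.T @ y
    return w_full[0], w_full[1:]

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([9.0, 8.0, 19.0, 18.0])

w0, w = train_linear_regression(X, y)
print(w0, w)
```

Since the data has no noise, the recovered weights match the generating ones (up to floating-point rounding).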

ML Zoomcamp 2023 – Machine Learning for Regression – Part 5

Linear regression vector form

This article generalizes what we did in the last article to vector form. That means going from a single observation $x_i$ (one car) to the whole feature matrix X. Looking at the last part of this formula, we see the dot product (vector-vector multiplication). $g(x_i)$…
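A sketch of the vector form, assuming illustrative (hypothetical) weights and feature values. For one observation the sum over features becomes a dot product; for the whole matrix, prepending a column of ones to X and w0 to the weights turns all predictions into one matrix-vector product $Xw$.

```python
import numpy as np

# Illustrative values (hypothetical): bias w0 and one weight per feature
w0 = 7.17
w = np.array([0.01, 0.04, 0.002])

def g_single(xi):
    """Prediction for one observation: the sum over features becomes a dot product."""
    return w0 + xi.dot(w)

# Hypothetical feature matrix: one row per car, one column per feature
X = np.array([
    [148.0, 24.0, 1385.0],
    [132.0, 25.0, 2031.0],
    [453.0, 11.0, 86.0],
])

# Matrix form: prepend x_i0 = 1 to every row and w0 to the weights,
# then a single matrix-vector product gives all predictions at once
X_full = np.column_stack([np.ones(X.shape[0]), X])
w_full = np.concatenate([[w0], w])
predictions = X_full @ w_full
print(predictions)
```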

ML Zoomcamp 2023 – Machine Learning for Regression – Part 4

Linear regression

Let's delve deeper into the topic of linear regression. Linear regression is a fundamental statistical technique used in machine learning to solve regression problems. In simple terms, regression analysis means predicting a continuous outcome variable based on one or more input features. That means the output of the model is…
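A minimal sketch of linear regression for a single observation, $g(x_i) = w_0 + \sum_{j=1}^{n} w_j x_{ij}$, written as an explicit loop. The features, bias and weights are hypothetical values chosen for illustration; the point is that the output is a single real number, such as a price.

```python
# Hypothetical features of one car (e.g. engine_hp, city_mpg, popularity)
xi = [453, 11, 86]
w0 = 7.17                # bias term (illustrative value)
w = [0.01, 0.04, 0.002]  # one weight per feature (illustrative values)

def g(xi):
    """Predict a continuous value for one observation: w0 + sum_j w_j * x_ij."""
    n = len(xi)
    pred = w0
    for j in range(n):
        pred = pred + w[j] * xi[j]
    return pred

print(g(xi))
```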

ML Zoomcamp 2023 – Machine Learning for Regression – Part 3

Setting up the validation framework

To validate the model, we take the dataset and split it into three parts: train, validation and test (60%/20%/20%). The reason why this is useful was explained in an earlier blog post. This means that we train the model on the training dataset, check whether it works well on the validation dataset,…
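A minimal sketch of a 60/20/20 split over row indices. The function name, the seed and the dataset size of 1000 rows are hypothetical; shuffling first keeps the split from depending on the original row order.

```python
import numpy as np

def split_train_val_test(n, seed=2):
    """Shuffle row indices and split them 60/20/20 into train, validation and test."""
    idx = np.arange(n)
    rng = np.random.default_rng(seed)
    rng.shuffle(idx)               # shuffle so the split isn't tied to row order
    n_val = int(n * 0.2)
    n_test = int(n * 0.2)
    n_train = n - n_val - n_test   # the remainder (about 60%) goes to training
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_train_val_test(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

Computing n_train as the remainder ensures every row lands in exactly one of the three parts, even when n is not divisible by 5.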