ML Zoomcamp 2023 – Machine Learning for Regression – Part 6

Training a linear regression model

From the last article we know that we need to multiply the feature matrix X by the weight vector w to get y (the prediction for price).

g(X) = Xw ~ y

Ideally we want Xw to be exactly equal to y, but that is often not possible, so we need a way to compute a w that comes as close as possible.

Xw = y can be rewritten as X⁻¹Xw = X⁻¹y, that is, Iw = X⁻¹y

The last formula simplifies to w = X⁻¹y, because Iw = w for every vector w and identity matrix I.

But there is a problem: X⁻¹ exists only for square matrices, and X has dimensions m × (n+1), which is not square in almost every case.
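
We can see this directly in NumPy: np.linalg.inv refuses a matrix that is not square. The matrix below is just made-up illustration data, not from the article.

import numpy as np

# A 3 x 2 matrix: more rows (observations) than columns (features)
A = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as e:
    # np.linalg.inv only accepts square matrices
    print('no inverse:', e)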

Since we cannot invert X directly, we look for an approximate solution instead. Multiplying both sides of Xw = y by the transpose Xᵀ gives

XᵀXw = Xᵀy

XᵀX is called the Gram matrix. Its inverse (usually) exists, because the Gram matrix is square, with dimensions (n+1) × (n+1).


(XᵀX)⁻¹XᵀXw = (XᵀX)⁻¹Xᵀy

Since (XᵀX)⁻¹XᵀX = I, the left-hand side reduces to Iw:

Iw = (XᵀX)⁻¹Xᵀy

This gives the closest possible solution, the least-squares solution:

w = (XᵀX)⁻¹Xᵀy
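
In NumPy this formula translates almost literally into a single expression. The snippet below is only a sketch of the formula we just derived; it assumes X already contains the column of ones (added below) and y is the target vector. The article builds this up step by step next.

# normal equation: w = (XᵀX)⁻¹Xᵀy
w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

In practice, np.linalg.solve(X.T.dot(X), X.T.dot(y)) computes the same weights without explicitly forming the inverse and tends to be numerically more stable, but here we follow the formula directly.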

Now we know what we have to do. We need to implement the function train_linear_regression, which takes the feature matrix X and the target variable y and returns the vector w.

def train_linear_regression(X, y):
    pass

To approach this implementation, we first use a simplified example.

X = [
    [148, 24, 1385],
    [132, 25, 2031],
    [453, 11, 86],
    [158, 24, 185],
    [172, 25, 201],
    [413, 11, 83],
    [38, 54, 185],
    [142, 25, 431],
    [453, 31, 86],  
]

X = np.array(X)
X
# Output: 
# array([[ 148,   24, 1385],
#        [ 132,   25, 2031],
#        [ 453,   11,   86],
#        [ 158,   24,  185],
#        [ 172,   25,  201],
#        [ 413,   11,   83],
#        [  38,   54,  185],
#        [ 142,   25,  431],
#        [ 453,   31,   86]])

From the last article we know that we need to add a new column of ones to the feature matrix X. This column is multiplied by the bias weight w0. We remember that we can use np.ones() to get a vector with ones at each position.

ones = np.ones(9)
ones
# Output: array([1., 1., 1., 1., 1., 1., 1., 1., 1.])

# X.shape[0] gives the number of rows, so we get one 1 per row of X
ones = np.ones(X.shape[0])
ones
# Output: array([1., 1., 1., 1., 1., 1., 1., 1., 1.])

Now we need to stack this vector of ones with our feature matrix X. For this we can use the NumPy function np.column_stack() as shown in the next snippet.

np.column_stack([ones, ones])
# Output: 
# array([[1., 1.],
#        [1., 1.],
#        [1., 1.],
#        [1., 1.],
#        [1., 1.],
#        [1., 1.],
#        [1., 1.],
#        [1., 1.],
#        [1., 1.]])
X = np.column_stack([ones, X])
y = [10000, 20000, 15000, 25000, 10000, 20000, 15000, 25000, 12000]

# GRAM MATRIX
XTX = X.T.dot(X)

# Inverse GRAM MATRIX
XTX_inv = np.linalg.inv(XTX)

In the following code snippet we test whether the multiplication of XTX with XTX_inv actually produces the Identity matrix I.

# Without rounding it's not exactly the identity matrix, but the off-diagonal
# values are very close to 0, so we can treat the result as the identity matrix
XTX.dot(XTX_inv)
# Output:
# array([[ 1.00000000e+00,  2.60208521e-18,  4.85722573e-17,  1.08420217e-18],
#        [ 4.54747351e-13,  1.00000000e+00,  1.50990331e-14,  2.22044605e-16],
#        [ 5.68434189e-14,  1.11022302e-16,  1.00000000e+00,  3.46944695e-17],
#        [ 9.09494702e-13,  0.00000000e+00, -2.13162821e-14,  1.00000000e+00]])

# This gives us the I matrix
XTX.dot(XTX_inv).round(1)
# Output: 
# array([[ 1.,  0.,  0.,  0.],
#        [ 0.,  1.,  0.,  0.],
#        [ 0.,  0.,  1.,  0.],
#        [ 0.,  0., -0.,  1.]])

Now we can implement the formula to get the full w vector.

# w_full contains all the weights
w_full = XTX_inv.dot(X.T).dot(y)
w_full
# Output: array([ 3.00092529e+04, -2.27839691e+01, -2.57690874e+02, -2.30322797e+00])

From that vector w_full we can extract w0 and all the other weights.

w0 = w_full[0]
w = w_full[1:]
w0, w
# Output: (30009.25292276666, array([ -22.78396914, -257.69087426,   -2.30322797]))
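
With w0 and w we can already predict prices: the bias w0 plus the dot product of the features with w. The snippet below is just an illustrative sketch (this part of the article stops at extracting the weights), and X_features is a name introduced here for illustration; note that the features passed in must not include the column of ones, because w0 already plays that role.

# Two example rows taken from the feature matrix above (without the column of ones)
X_features = np.array([
    [148, 24, 1385],
    [453, 11, 86],
])

# prediction: bias w0 plus dot product of the features with the weights w
y_pred = w0 + X_features.dot(w)
y_pred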

Now we can implement the function train_linear_regression, which takes the feature matrix X and the target variable y and returns w0 and the vector w.

def train_linear_regression(X, y):
    # add the column of ones for the bias term w0
    ones = np.ones(X.shape[0])
    X = np.column_stack([ones, X])
    
    # normal equation: w = (XᵀX)⁻¹Xᵀy
    XTX = X.T.dot(X)
    XTX_inv = np.linalg.inv(XTX)
    w_full = XTX_inv.dot(X.T).dot(y)
    
    # w0 is the bias, the rest are the feature weights
    return w_full[0], w_full[1:]

Let’s test this newly implemented function with some simple examples:

X = [
    [148, 24, 1385],
    [132, 25, 2031],
    [453, 11, 86],
    [158, 24, 185],
    [172, 25, 201],
    [413, 11, 83],
    [38, 54, 185],
    [142, 25, 431],
    [453, 31, 86],  
]

X = np.array(X)
y = [10000, 20000, 15000, 25000, 10000, 20000, 15000, 25000, 12000]

train_linear_regression(X, y)
# Output: (30009.25292276666, array([ -22.78396914, -257.69087426,   -2.30322797]))
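
As a quick sanity check (not part of the original lesson), we can compare the result with NumPy's built-in least-squares solver np.linalg.lstsq, which solves the same problem; the weights should match up to numerical precision. X_with_ones and w_lstsq are names used here only for this sketch.

ones = np.ones(X.shape[0])
X_with_ones = np.column_stack([ones, X])

# lstsq solves the same least-squares problem; it returns the solution plus extra info
w_lstsq, *_ = np.linalg.lstsq(X_with_ones, y, rcond=None)
w_lstsq
# Expected: approximately the same values as w0 and w above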
