Training a linear regression model
From the last article we know that we need to multiply the feature matrix X by the weights vector w to get y (the prediction for the price).
g(X) = Xw ≈ y
Ideally we want Xw to be exactly equal to y, but in practice that is usually not possible.
To achieve this, we need to find a way to compute w.
Xw = y can be written as X⁻¹Xw = X⁻¹y, that is Iw = X⁻¹y.
The last formula can be simplified to w = X⁻¹y, because we know that Iw = w for every vector w and identity matrix I.
But there is a problem: X⁻¹ exists only for square matrices, and X has dimension m × (n+1), which is not square in almost every case.
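We can see this problem directly in NumPy: asking for the inverse of a non-square matrix raises an error. A small illustration (the 2 × 3 matrix here is just a made-up example):

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])  # 2 x 3, not square

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as e:
    print(e)  # complains that the last two dimensions must be square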
Instead, we look for the best possible approximation. Multiplying both sides of Xw = y by the transpose Xᵀ gives the so-called normal equation:
XᵀXw = Xᵀy
XᵀX is called the Gram matrix. It is a square (n+1) × (n+1) matrix, so (in most practical cases) its inverse exists and we can continue:
(XᵀX)⁻¹XᵀXw = (XᵀX)⁻¹Xᵀy
Iw = (XᵀX)⁻¹Xᵀy
w = (XᵀX)⁻¹Xᵀy
This w is the closest possible solution in the least-squares sense.
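The rest of this article translates this formula into NumPy step by step. As a compact preview, the final computation will look roughly like this (a sketch, assuming numpy is imported as np and X already contains the column of ones we add below):

w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)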
Now we know what we have to do: implement the function train_linear_regression, which takes the feature matrix X and the target variable y and returns the vector w.
def train_linear_regression(X, y):
    pass
To approach this implementation we first use a simplified example.
import numpy as np

X = [
[148, 24, 1385],
[132, 25, 2031],
[453, 11, 86],
[158, 24, 185],
[172, 25, 201],
[413, 11, 83],
[38, 54, 185],
[142, 25, 431],
[453, 31, 86],
]
X = np.array(X)
X
# Output:
# array([[ 148, 24, 1385],
# [ 132, 25, 2031],
# [ 453, 11, 86],
# [ 158, 24, 185],
# [ 172, 25, 201],
# [ 413, 11, 83],
# [ 38, 54, 185],
# [ 142, 25, 431],
# [ 453, 31, 86]])
From the last article we know that we need to add a new column of ones to the feature matrix X. That column is multiplied by the bias weight w0. We remember that we can use np.ones() to get a vector with ones at each position.
ones = np.ones(9)
ones
# Output: array([1., 1., 1., 1., 1., 1., 1., 1., 1.])
# X.shape[0] is the number of rows, so we get a vector of ones with one entry per row
ones = np.ones(X.shape[0])
ones
# Output: array([1., 1., 1., 1., 1., 1., 1., 1., 1.])
Now we need to stack this vector of ones with our feature matrix X. For this we can use the NumPy function np.column_stack() as shown in the next snippet.
np.column_stack([ones, ones])
# Output:
# array([[1., 1.],
# [1., 1.],
# [1., 1.],
# [1., 1.],
# [1., 1.],
# [1., 1.],
# [1., 1.],
# [1., 1.],
# [1., 1.]])
X = np.column_stack([ones, X])
y = [10000, 20000, 15000, 25000, 10000, 20000, 15000, 25000, 12000]
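Before moving on it is worth checking that the stacking worked: X should now have one extra column, and its first column should be all ones.

X.shape
# Output: (9, 4)  -> 9 rows, 3 features plus the column of ones

X[:, 0]
# Output: array([1., 1., 1., 1., 1., 1., 1., 1., 1.])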
# Gram matrix
XTX = X.T.dot(X)
# Inverse of the Gram matrix
XTX_inv = np.linalg.inv(XTX)
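The Gram matrix is square, which is exactly why we can invert it here. A quick check of its shape (4 × 4, because X now has 3 feature columns plus the column of ones):

XTX.shape
# Output: (4, 4)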
In the following code snippet we test whether the multiplication of XTX with XTX_inv actually produces the Identity matrix I.
# Without round(1) the result is not exactly the identity matrix, but the
# off-diagonal values are very close to 0, so we can treat them as 0 and take it as the identity matrix
XTX.dot(XTX_inv)
# Output:
# array([[ 1.00000000e+00,  2.60208521e-18,  4.85722573e-17,  1.08420217e-18],
#        [ 4.54747351e-13,  1.00000000e+00,  1.50990331e-14,  2.22044605e-16],
#        [ 5.68434189e-14,  1.11022302e-16,  1.00000000e+00,  3.46944695e-17],
#        [ 9.09494702e-13,  0.00000000e+00, -2.13162821e-14,  1.00000000e+00]])
# This gives us the I matrix
XTX.dot(XTX_inv).round(1)
# Output:
# array([[ 1., 0., 0., 0.],
# [ 0., 1., 0., 0.],
# [ 0., 0., 1., 0.],
# [ 0., 0., -0., 1.]])
Now we can implement the formula to get the full w vector.
# w_full contains all the weights
w_full = XTX_inv.dot(X.T).dot(y)
w_full
# Output: array([ 3.00092529e+04, -2.27839691e+01, -2.57690874e+02, -2.30322797e+00])
From that vector w_full we can extract w0 and all the other weights.
w0 = w_full[0]
w = w_full[1:]
w0, w
# Output: (30009.25292276666, array([ -22.78396914, -257.69087426, -2.30322797]))
Now we can implement the function train_linear_regression, which takes the feature matrix X and the target variable y and returns w0 and the vector w.
def train_linear_regression(X, y):
    # add a column of ones for the bias weight w0
    ones = np.ones(X.shape[0])
    X = np.column_stack([ones, X])

    # normal equation: w = (X^T X)^(-1) X^T y
    XTX = X.T.dot(X)
    XTX_inv = np.linalg.inv(XTX)
    w_full = XTX_inv.dot(X.T).dot(y)

    # w_full[0] is the bias w0, the rest are the feature weights
    return w_full[0], w_full[1:]
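As a side note, the same weights can be obtained without forming the explicit inverse by letting NumPy solve the linear system XᵀXw = Xᵀy directly, which is generally considered more numerically stable. A sketch of that variant (the function name is made up for illustration; for the rest of the article we stick with the version above):

def train_linear_regression_solve(X, y):
    ones = np.ones(X.shape[0])
    X = np.column_stack([ones, X])
    # solve (X^T X) w = X^T y instead of inverting X^T X explicitly
    w_full = np.linalg.solve(X.T.dot(X), X.T.dot(y))
    return w_full[0], w_full[1:]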
Let’s test this newly implemented function with some simple examples:
X = [
[148, 24, 1385],
[132, 25, 2031],
[453, 11, 86],
[158, 24, 185],
[172, 25, 201],
[413, 11, 83],
[38, 54, 185],
[142, 25, 431],
[453, 31, 86],
]
X = np.array(X)
y = [10000, 20000, 15000, 25000, 10000, 20000, 15000, 25000, 12000]
train_linear_regression(X, y)
# Output: (30009.25292276666, array([ -22.78396914, -257.69087426, -2.30322797]))
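As a final sanity check we can use the returned bias and weights to predict prices for the training rows, following the model g(X) = w0 + Xw (the bias plus the weighted features). A small sketch using the values returned above:

w0, w = train_linear_regression(X, y)
y_pred = w0 + X.dot(w)  # predictions for the 9 training rows
y_pred
# The predictions will not match y exactly; they are the closest
# least-squares approximation the model can produce.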