Part2: Supervised Learning: Extending to higher dimensions and non-linear regression

In Part1, the aim was to predict house prices given the number of bedrooms. What if house prices depend not just on the number of bedrooms but on other features as well, for instance the floor area? Let’s build a mathematical model for this case; it isn’t very different from Part1’s model.


Mathematical model 2:

$$y = w_0 + w_1 \times nB + w_2 \times fA \tag{1}$$

where the $w$'s are weights that need to be determined, and $nB$ and $fA$ (the features $x$) are respectively the number of bedrooms and the floor area in square units.

House prices (y): 150k, 220k, 320k
Number of bedrooms (nB): 1, 2, 3
Floor area (fA): 42, 59, 61

$$y = \begin{bmatrix} 1 & nB & fA \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix} = x^T w \tag{2}$$

Stacking the three houses row-wise, the matrix equation becomes

$$\begin{bmatrix} 150 \\ 220 \\ 320 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 42 \\ 1 & 2 & 59 \\ 1 & 3 & 61 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix}$$

The equation for finding the parameters remains the same as in Part1 (reproduced in eq. 3); the only changes are the extra column in the feature matrix and the added parameter $w_2$.

$$w = (X^T X)^{-1} X^T Y \tag{3}$$

Python code to solve for the parameters:


import numpy as np       # library for mathematics (linear algebra)

nB = np.array([1.0, 2.0, 3.0])        # Number of bedrooms
fA = np.array([42.0, 59.0, 61.0])     # Floor area
Y = np.array([150.0, 220.0, 320.0])   # House prices (in thousands)

s = np.ones((len(nB), 1))             # Column of ones for the intercept w0
X = np.hstack((s, nB.reshape(-1, 1), fA.reshape(-1, 1)))  # Feature matrix
a = X.transpose() @ X
b = X.transpose() @ Y
w = np.linalg.solve(a, b)  # Solve (X^T X) w = X^T Y for the parameters
print(w)

######
[130.0, 104.0, -2.0]
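With three data points and three parameters, the system is exactly determined, so the fit passes through every training point; for example, the first house gives $130 + 104 \times 1 - 2 \times 42 = 150$, exactly the observed price. The negative weight on floor area is an artifact of this tiny toy dataset rather than a real trend.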


Non-linear regression

The above case can be extended to non-linear functions of the features as long as the model remains linear in the weights $w$:

$$y = \phi(x)^T w \tag{4}$$

where $\phi(x)$ can be a polynomial, a radial basis function, or any other set of basis functions. The non-linear case for our house pricing example can be written explicitly as

$$y = w_0 + w_1 \times nB^2 + w_2 \times fA^2 = \begin{bmatrix} 1 & nB^2 & fA^2 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix} = \phi(x)^T w \tag{5}$$

X = np.hstack((s, nB.reshape(-1, 1)**2, fA.reshape(-1, 1)**2))  # Squared features as basis functions
a = X.transpose() @ X
b = X.transpose() @ Y
w = np.linalg.solve(a, b)  # To find parameters
print(w)

######
[119.1, 19.69, 0.00635]

The $\phi(x)$ in eq. 5 is just one of the many forms $\phi(x)$ can take. Another possible one is

$$y = w_0 + w_1 \times nB^2 + w_2 \times fA^2 + w_3 \times nB \times fA = \begin{bmatrix} 1 & nB^2 & fA^2 & nB \times fA \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{bmatrix} = \phi(x)^T w$$
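A side sketch (not part of the original code): with this feature map there are four parameters but only three houses, so $X^T X$ in eq. 3 becomes singular and np.linalg.solve can no longer be used directly. One option is np.linalg.lstsq, which returns a minimum-norm least-squares solution computed via the SVD:

X = np.hstack((s, nB.reshape(-1, 1)**2, fA.reshape(-1, 1)**2,
               (nB * fA).reshape(-1, 1)))   # Cross-term feature map
w, residuals, rank, sv = np.linalg.lstsq(X, Y, rcond=None)  # X.T @ X is singular here
print(w, rank)   # rank is 3, i.e. fewer than the 4 parameters

Ridge regression, mentioned in the notes below, is another way of handling such singular cases.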


A few points to note

I mentioned some small techniques used in practice to get better estimates of the parameters.

  • One of the most important is feature scaling (a minimal sketch follows this list). There are many other resources on this topic, and a full discussion is beyond the scope of this post.
  • What happens if the feature matrix is singular? I will briefly touch upon this in Ridge regression.
  • What happens if the data is non-linear? A short intro to locally weighted regression can be found here.
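As a minimal illustration of feature scaling (an aside, not from the original post), one common choice is standardization: subtract each feature's mean and divide by its standard deviation before building the feature matrix, so that features with very different ranges (like nB and fA here) become comparable:

nB_s = (nB - nB.mean()) / nB.std()   # Standardized bedrooms: zero mean, unit std
fA_s = (fA - fA.mean()) / fA.std()   # Standardized floor area
X = np.hstack((s, nB_s.reshape(-1, 1), fA_s.reshape(-1, 1)))  # Intercept column unchanged
a = X.transpose() @ X
b = X.transpose() @ Y
w = np.linalg.solve(a, b)   # Weights now correspond to the scaled features
print(w)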


