
Day 9: Feature Engineering and Polynomial Regression

Feature Engineering
Feature engineering means using intuition about the problem to design new features by transforming or combining the original ones.

The choice of features can have a huge impact on your learning algorithm's performance.

In this section, we will take a look at feature engineering by revisiting the example of predicting the price of a house.

Let's say you have received measurements of x1 (frontage) and x2 (depth) for each house.

At this moment you only have x1 and x2, but you can also compute the area by multiplying frontage (x1) by depth (x2). With this knowledge, you can create a new feature x3:

f_w,b(X) = w1*x1 + w2*x2 + b            # x1 = frontage, x2 = depth
x3 = x1*x2                              # x3 = new feature called area (frontage * depth)
f_w,b(X) = w1*x1 + w2*x2 + w3*x3 + b    # model that also uses the new feature

What we just did, creating a new feature, is an example of feature engineering.
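To make this concrete, here is a minimal sketch in Python with NumPy. The frontage, depth, and price numbers are made up for illustration, and `fit_linear` is a hypothetical helper that uses ordinary least squares rather than any particular training method from this series. It fits the model once with x1 and x2 only, and once with the engineered feature x3 added.

```python
import numpy as np

# Made-up measurements, for illustration only.
frontage = np.array([20.0, 30.0, 25.0, 40.0, 35.0])  # x1
depth = np.array([50.0, 40.0, 60.0, 30.0, 45.0])     # x2

# Engineered feature: x3 = x1 * x2 (area).
area = frontage * depth

# Made-up prices that depend on the area, so the example has a target.
price = 0.1 * area + 50.0


def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b. Returns (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])  # extra column of ones for b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


X_original = np.column_stack([frontage, depth])          # x1, x2
X_engineered = np.column_stack([frontage, depth, area])  # x1, x2, x3

w_orig, b_orig = fit_linear(X_original, price)
w_eng, b_eng = fit_linear(X_engineered, price)

# Mean squared error of each model on the same data.
mse_original = np.mean((X_original @ w_orig + b_orig - price) ** 2)
mse_engineered = np.mean((X_engineered @ w_eng + b_eng - price) ** 2)

print("MSE with x1, x2:    ", mse_original)
print("MSE with x1, x2, x3:", mse_engineered)
```

Because the made-up prices were generated from the area, the model with x3 can match them almost exactly, while the model with only x1 and x2 cannot. Real data will not be this clean, but the idea is the same.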

Rather than just taking the features you happen to have started with, you can use your insight into the application to define new features, which can sometimes give you a better model.

Polynomial Regression

Let's combine the ideas of multiple linear regression and feature engineering to come up with a new algorithm called polynomial regression, which lets you fit curves (non-linear functions) to your data.

Picture a plot of the data, with the actual values as red crosses and the prediction line in blue. The straight line does not seem to fit the data set very well.

We may want to fit a curve in this scenario, for example a quadratic function, which adds x² (x raised to the power of 2) as a feature:

f_w,b(x) = w1*x + w2*x² + b

When we use the quadratic function, the prediction curve fits the data better.

We could also choose a cubic function, which looks like this:

f_w,b(x) = w1*x + w2*x² + w3*x³ + b

When we use the cubic function, the curve seems to be slightly off.

These are both examples of polynomial regression, because we took the original feature x and raised it to the power of 2, 3, or any other power.
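The idea can be sketched in a few lines of Python with NumPy. The data here is made up to follow a quadratic trend, and `polynomial_features`, `fit_linear`, and `mse` are hypothetical helper names chosen for this example. The point is that polynomial regression is still linear regression, just applied to the engineered features x, x², and so on.

```python
import numpy as np

# Made-up data that follows a quadratic trend, for illustration only.
x = np.arange(0.0, 10.0)  # the single original feature
y = 1.0 + x ** 2          # target values


def polynomial_features(x, degree):
    """Stack x, x^2, ..., x^degree as the columns of a feature matrix."""
    return np.column_stack([x ** p for p in range(1, degree + 1)])


def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b. Returns (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])  # extra column of ones for b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def mse(X, y, w, b):
    """Mean squared error of the predictions X @ w + b."""
    return np.mean((X @ w + b - y) ** 2)


X_line = polynomial_features(x, 1)  # just x
X_quad = polynomial_features(x, 2)  # x and x^2

w_line, b_line = fit_linear(X_line, y)
w_quad, b_quad = fit_linear(X_quad, y)

mse_line = mse(X_line, y, w_line, b_line)
mse_quad = mse(X_quad, y, w_quad, b_quad)

print("straight line MSE:", mse_line)
print("quadratic MSE:    ", mse_quad)
```

On this made-up data the straight line leaves a clear error, while the quadratic model recovers the curve. Passing `degree=3` to `polynomial_features` would give the cubic version in the same way.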

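Powers of a feature can take on very different ranges of values: if x ranges from 1 to 1,000, then x² ranges from 1 to 1,000,000. Feature scaling therefore becomes more important with polynomial features, and it is usually a good idea to bring your features into comparable ranges of values.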
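As a rough illustration, the sketch below builds x, x², and x³ from a made-up feature and rescales each column. Z-score normalization is used here as one common choice of scaling, which is an assumption of this example and not the only option.

```python
import numpy as np

# Made-up feature with values 1, 2, ..., 100, for illustration only.
x = np.arange(1.0, 101.0)
X = np.column_stack([x, x ** 2, x ** 3])  # columns: x, x^2, x^3

# Range (max - min) of each column before scaling: 99, 9999, 999999.
raw_ranges = X.max(axis=0) - X.min(axis=0)

# Z-score normalization: subtract each column's mean, divide by its
# standard deviation, so every column has mean 0 and standard deviation 1.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_scaled = (X - mu) / sigma

# Range of each column after scaling: now all of similar size.
scaled_ranges = X_scaled.max(axis=0) - X_scaled.min(axis=0)

print("ranges before scaling:", raw_ranges)
print("ranges after scaling: ", scaled_ranges)
```

If you scale the training features this way, keep `mu` and `sigma` so that new inputs can be scaled with the same values before making predictions.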

Aside from squaring or cubing a feature, you may also use its square root.
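A square-root feature works the same way as the squared and cubed ones: it is just another column. The sketch below uses made-up data and an ordinary least-squares fit from NumPy, both of which are assumptions for the sake of illustration.

```python
import numpy as np

# Made-up data where the target grows like the square root of x.
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
y = 3.0 * np.sqrt(x) + 2.0

# Features: the original x and the engineered sqrt(x).
X = np.column_stack([x, np.sqrt(x)])

# Least-squares fit of y ~ X @ w + b, with a column of ones for b.
A = np.column_stack([X, np.ones(len(x))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

predictions = X @ w + b
print("weights for [x, sqrt(x)]:", w)
print("bias:", b)
```

On this made-up data the fit puts almost all of the weight on the sqrt(x) column, which suggests that feature describes the trend better than x alone.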
