
Day 6: Multiple Linear Regression



Previously, we looked at a version of linear regression with only one feature. Today, we're going to explore linear regression with multiple features. Take the housing data below as an example:


Size in sqft (X1) | Number of Bedrooms (X2) | Number of floors (X3) | Age of home in years (X4) | Price ($) in $1000s
------------------|-------------------------|-----------------------|---------------------------|--------------------
2104              | 5                       | 1                     | 45                        | 460
1416              | 3                       | 2                     | 40                        | 232
1534              | 3                       | 2                     | 30                        | 315
852               | 2                       | 1                     | 36                        | 178

A few notations:

  • X1, X2, X3, X4 denote the 4 input features

  • Xj denotes the j-th feature (j = 1...n)

  • n denotes the number of features (here n = 4); we'll use m to denote the number of training examples

  • X_i = the features of the i-th training example, e.g. X_2 = [1416, 3, 2, 40] # a row vector

  • Xj_i = the value of feature j in the i-th training example, e.g. X3_2 = 2
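This notation maps directly onto a NumPy array, with one row per training example and one column per feature. A quick sketch (keep in mind NumPy indices start at 0, so X_2 in the notation above is `X[1]` in code):

```python
import numpy as np

# training data from the table: one row per example, one column per feature
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])

m, n = X.shape   # m = number of training examples, n = number of features
print(f"m = {m}, n = {n}")

print(X[1])      # X_2, the 2nd training example: [1416  3  2 40]
print(X[1, 2])   # X3_2, feature 3 of the 2nd example: 2
```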


Now that we have multiple features, we're going to define our model function differently:

f(X) = w1*X1 + w2*X2 + w3*X3 + w4*X4 + b

We can define W as a list of numbers that holds the parameters:

W = [w1, w2, w3, w4]

In math, this is called a vector, and to designate that something is a vector (that is, a list of numbers), we sometimes add an arrow on top of it.

So, we can rewrite our model function as follows. Note that b doesn't have an arrow on top of it, since the bias is a single number, not a vector:

f(X) = W · X + b

Here · is the dot product: multiply the two vectors element by element and sum the results, which is exactly the expanded calculation shown in the first equation above.
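As a small sketch, we can check that the expanded form and the dot-product form give the same prediction for the first training example. The weight and bias values below are made up for illustration; they are not trained parameters:

```python
import numpy as np

# features of the first training example: size, bedrooms, floors, age
x = np.array([2104, 5, 1, 45])

# illustrative (untrained) parameters -- made-up values for the demo
w = np.array([0.1, 4.0, 10.0, -2.0])
b = 80.0

# expanded form: w1*x1 + w2*x2 + w3*x3 + w4*x4 + b
f_expanded = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + w[3]*x[3] + b

# vector form: dot product plus bias
f_vector = np.dot(w, x) + b

print(f_expanded, f_vector)  # both give the same prediction: 230.4
```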


Vectorization


When implementing a learning algorithm, using vectorization will both make our code shorter and make it run much more efficiently.

With vectorization, we can easily handle models with many input features using NumPy's dot function:

f = np.dot(w, x) + b

The NumPy dot function is a vectorized implementation of the dot product operation between 2 vectors. The reason the vectorized implementation is much faster is that np.dot can use the parallel hardware in your computer.
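A rough sketch of that comparison in code (the vector size is made up for the demo, and exact timings will vary by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                      # number of features (made-up size for the demo)
w = rng.random(n)
x = rng.random(n)
b = 4.0

# without vectorization: one multiply-add per feature, run sequentially in Python
start = time.perf_counter()
f_loop = 0.0
for j in range(n):
    f_loop += w[j] * x[j]
f_loop += b
loop_time = time.perf_counter() - start

# with vectorization: np.dot runs in optimized native code
start = time.perf_counter()
f_vec = np.dot(w, x) + b
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.5f}s  vectorized: {vec_time:.5f}s")
print(np.isclose(f_loop, f_vec))   # same answer either way
```

On most machines the vectorized version is faster by a wide margin, and the gap grows with the number of features.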




Matrices

Matrices are 2-dimensional arrays. The elements of a matrix are all of the same type.

NumPy's basic data structure is an indexable, n-dimensional array containing elements of the same type (dtype). Matrices have a 2-dimensional index [m, n].


Matrix creation

The same functions that created 1-D vectors will create 2-D arrays.

Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further that NumPy, when printing, will print one row per line.


a = np.zeros((1, 5))
print(f"a shape = {a.shape}, a = {a}")

# result:
a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]

a = np.zeros((2, 1))
print(f" a shape = {a.shape}, a = {a}")

# result:
a shape = (2, 1), a = [[0.]
                       [0.]]

a = np.random.random_sample((1, 1))
print(f"a shape = {a.shape}, a = {a}")

# result:
a shape = (1, 1), a = [[0.44236513]]

Indexing

Matrices include a second index. The two indexes describe [row, column]

Access can either return an element or a row or column. See below:


# vector indexing operations on matrices
a = np.arange(6).reshape(-1, 2)
print(f"a shape: {a.shape}, \na = {a}")

# result:
a shape: (3, 2),
a = [[0 1]
     [2 3]
     [4 5]]

# access an element
print(f"\na[2,0].shape: {a[2,0].shape}, a[2,0] = {a[2,0]}, "
      f"type(a[2,0]) = {type(a[2,0])} \naccessing an element returns a scalar")

# result:
a[2,0].shape: {}, a[2,0] = 4, type(a[2,0]) = <class 'numpy.int64'>
accessing an element returns a scalar

# access a row
print(f"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}")

#result:
a[2].shape: (2,), a[2] = [4 5], type(a[2]) = <class 'numpy.ndarray'>
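The post shows element and row access; for completeness, here is a sketch of column access, which works the same way by slicing over the row index:

```python
import numpy as np

a = np.arange(6).reshape(-1, 2)

# access a column: take all rows (:), fix the column index
print(f"a[:,0].shape: {a[:,0].shape}, a[:,0] = {a[:,0]}")
# a[:,0].shape: (3,), a[:,0] = [0 2 4]
```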

Slicing

Slicing creates an array of indices using a set of 3 values (start:stop:step). A subset of values is also valid. Its use is best explained by an example:



# vector 2-D slicing operations
a = np.arange(20).reshape(-1, 10)
print(f"a = \n {a}")

# result
a =
 [[ 0  1  2  3  4  5  6  7  8  9]
  [10 11 12 13 14 15 16 17 18 19]]

# access 5 consecutive elements (start:stop:step)
print("a[0, 2:7:1] = ", a[0, 2:7:1], ", a[0, 2:7:1].shape =", a[0, 2:7:1].shape, "a 1-D array")

# result:
a[0, 2:7:1] = [2 3 4 5 6], a[0, 2:7:1].shape = (5,) a 1-D array

# access 5 consecutive elements in 2 rows (start:stop:step)
print("a[:, 2:7:1] = ", a[:, 2:7:1], ", a[:, 2:7:1].shape =", a[:, 2:7:1].shape, "a 2-D array")

# result:
a[:, 2:7:1] = [[ 2  3  4  5  6]
               [12 13 14 15 16]], a[:, 2:7:1].shape = (2, 5) a 2-D array

# access all elements
print("a[:, :] = \n", a[:, :], ", a[:,:].shape =", a[:, :].shape)

# result:
a[:,:] =
 [[ 0  1  2  3  4  5  6  7  8  9]
  [10 11 12 13 14 15 16 17 18 19]], a[:,:].shape = (2, 10)

# access all elements in one row
print("a[1,:] = ", a[1,:], ", a[1,:].shape =", a[1,:].shape, "a 1-D array")
# note: a[1,:] is the same as a[1]

# result:
a[1,:] = [10 11 12 13 14 15 16 17 18 19], a[1,:].shape = (10,) a 1-D array
