*Today, we will put together what we've previously learned to implement gradient descent for multiple linear regression using vectorization.*

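__To recap__

**Parameters:** $w_1, \dots, w_n$ and $b$

**Model:**

$$f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$$

**Cost function:**

$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2$$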

##### Normal Equation

Before moving on, we will make a quick note on an alternative way to find **w** and **b** for linear regression. This method is called the normal equation:

- It is only used for linear regression.
- It solves for **w** and **b** without iteration.

Disadvantages:

- It doesn't generalize to other learning algorithms.
- It is slow when the number of features is large (> 10,000).

What you need to know:

- The normal equation method may be used in machine learning libraries that implement linear regression.
- Gradient descent is the recommended method for finding the parameters **w**, **b**.
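To make the "without iteration" point concrete, here is a minimal sketch of the normal equation in NumPy. It uses the standard closed form $\theta = (X^T X)^{-1} X^T y$, which is not spelled out in this post, and the names `normal_equation`, `X_demo`, and `y_demo` are my own, with made-up data.

```python
import numpy as np

def normal_equation(X, y):
    """Solve for w and b in closed form (hypothetical helper, no iterations)."""
    m = X.shape[0]
    # Append a column of ones so the bias b is learned as the last coefficient.
    X_aug = np.hstack([X, np.ones((m, 1))])
    # Solve (X_aug^T X_aug) theta = X_aug^T y instead of forming an explicit inverse.
    theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return theta[:-1], theta[-1]

# Made-up data generated from y = 2*x0 - 3*x1 + 5 (no noise).
X_demo = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]])
y_demo = X_demo @ np.array([2.0, -3.0]) + 5.0

w_ne, b_ne = normal_equation(X_demo, y_demo)
print(w_ne, b_ne)  # should be close to [2, -3] and 5
```

Solving the linear system costs roughly the cube of the number of features, which is one way to see why this approach slows down when there are many features.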

###### Vector-vector dot product

The dot product is a mainstay of linear algebra and NumPy. For two vectors **a** and **b** with n elements each, it is defined as:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$$

The dot product multiplies the values in two vectors element-wise and then sums the result. It requires the two vectors to have the same dimensions.

Let's implement our own version of the dot product using a for loop. The function below returns the dot product of two vectors (assume `a` and `b` have the same shape):

```
def my_dot(a, b):
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]  # accumulate the element-wise products
    return x
```

```
import numpy as np

# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
my_dot(a, b)
```

`# result of my_dot(a, b) = 24`

*Note, the dot product is expected to return a scalar value.*

Let's try the same operations using np.dot:

```
# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
c = np.dot(b, a)
```

`# result of both c would be 24`
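As a quick sanity check, the sketch below confirms that a loop version and `np.dot` agree, and shows what happens when the two vectors differ in length. The variable names `agree` and `mismatch_raises` are only for illustration.

```python
import numpy as np

def my_dot(a, b):
    total = 0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

# The loop version and the vectorized version give the same scalar.
agree = my_dot(a, b) == np.dot(a, b)

# np.dot refuses 1-D vectors of different lengths.
try:
    np.dot(a, np.array([1, 2, 3]))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(agree, mismatch_raises)
```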

__Compute Cost with Multiple Variables__

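The equation for the cost function with multiple variables is:

$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2$$

where:

$$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$$

In contrast to previous functions, **w** and **x_i** are vectors rather than scalars, supporting multiple features. Below is an implementation of the above equations: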

```
def compute_cost(X, y, w, b):
    m = X.shape[0]
    cost = 0.0  # running sum of squared errors
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b  # prediction for example i
        cost = cost + (f_wb_i - y[i]) ** 2
    cost = cost / (2 * m)
    return cost
```
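Since this post is about vectorization, here is one possible fully vectorized version of the same cost, as a sketch. The function name `compute_cost_vectorized` and the small demo arrays are my own and are not from the course.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    m = X.shape[0]
    err = X @ w + b - y           # prediction errors for all m examples at once
    return (err @ err) / (2 * m)  # err @ err is the sum of squared errors

# Made-up data, only to exercise the function.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -0.5])
b_demo = 1.0

cost_demo = compute_cost_vectorized(X_demo, y_demo, w_demo, b_demo)
print(cost_demo)  # errors are [-0.5, -1.5, -2.5], so the cost is 8.75 / 6
```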

__Gradient descent with Multiple Variables__

*Please note that I use the terms partial derivative, derivative term, and derivative interchangeably in my posts, as was done in the course. Strictly speaking, a partial derivative applies to functions of more than one variable, while a derivative applies to functions of a single variable.*

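Gradient descent for multiple variables:

$$\begin{aligned}
&\text{repeat until convergence: } \{ \\
&\quad w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 1, \dots, n \\
&\quad b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\}
\end{aligned}$$

where n is the number of features, $\alpha$ is the learning rate, the parameters **w_j** and **b** are updated simultaneously, and:

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$$

Let's implement the equations above *(there are many ways to implement them, and this is one version)*: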

```
# let's first compute the partial derivative terms
def compute_gradient(X, y, w, b):
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
```
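The two nested loops can also be collapsed into matrix operations. The sketch below is one way to do that; `compute_gradient_vectorized` and the demo arrays are hypothetical names and made-up values, not part of the course code.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    m = X.shape[0]
    err = X @ w + b - y      # shape (m,): error for every example
    dj_dw = X.T @ err / m    # shape (n,): one partial derivative per feature
    dj_db = err.sum() / m    # scalar: partial derivative with respect to b
    return dj_db, dj_dw

# Made-up data, only to exercise the function.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -0.5])
b_demo = 1.0

dj_db_demo, dj_dw_demo = compute_gradient_vectorized(X_demo, y_demo, w_demo, b_demo)
print(dj_db_demo, dj_dw_demo)  # -1.5 and [-17.5/3, -22/3]
```

The return order (`dj_db` first, then `dj_dw`) is kept the same as in the loop version so the two are interchangeable.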

With the derivative terms in hand, let's implement gradient descent:

```
import copy
import math

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's array
    b = b_in
    for i in range(num_iters):
        dj_db, dj_dw = gradient_function(X, y, w, b)
        # update all parameters simultaneously
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        if i < 100000:  # cap the history to limit memory use
            J_history.append(cost_function(X, y, w, b))
        # print the cost ten times over the course of the run
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}")
    return w, b, J_history
```
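An update rule is easier to trust once it has recovered parameters we already know. The sketch below runs the same updates on synthetic data; everything here (`X_toy`, `true_w`, `true_b`, the learning rate, and the iteration count) is made up for illustration and is not the course's dataset.

```python
import numpy as np

# Synthetic data: 50 examples, 3 features in [0, 1], generated from known parameters.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0.0, 1.0, size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
true_b = 4.0
y_toy = X_toy @ true_w + true_b

def cost(X, y, w, b):
    err = X @ w + b - y
    return (err @ err) / (2 * X.shape[0])

w_fit = np.zeros(3)
b_fit = 0.0
alpha = 0.1  # assumed learning rate, chosen for this toy data only
costs = []
for _ in range(5000):
    err = X_toy @ w_fit + b_fit - y_toy
    # compute both gradients before updating, so the update is simultaneous
    grad_w = X_toy.T @ err / X_toy.shape[0]
    grad_b = err.mean()
    w_fit = w_fit - alpha * grad_w
    b_fit = b_fit - alpha * grad_b
    costs.append(cost(X_toy, y_toy, w_fit, b_fit))

print(w_fit, b_fit)  # should end up close to true_w and true_b
```

Because the features here all sit in a similar range, a fairly large learning rate works; that will not hold for every dataset.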

To test the implementation:

*Note that this is the code to run gradient descent, but no actual data is included; it is for reference purposes only.*

```
# X_train, y_train, and w_init are assumed to be defined elsewhere (not shown in this post)
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# set gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b, compute_cost, compute_gradient, alpha, iterations)
print(f"b, w found by gradient descent: {b_final:0.2f}, {w_final}")
m, _ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")
```

```
# expected result:
b, w found by gradient descent: -0.00, [ 0.2   0.   -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178
```

Our example result shows that our predictions are not very accurate compared with the target values. We'll explore how to improve on this in tomorrow's post.
