
Day 14: Gradient Descent for Logistic Regression

We have previously looked at gradient descent for linear regression. Today, we will look at gradient descent for logistic regression.

Recall that the purpose of running gradient descent is to find the values of the parameters w and b, which we do by minimizing the cost function J.


The model function we use in logistic regression:
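Written out in standard notation, and consistent with the sigmoid call used in compute_gradient_logistic further down, the model can be expressed as follows. The symbol g for the sigmoid is a naming assumption.

```latex
f_{\mathbf{w},b}(\mathbf{x}) = g(\mathbf{w} \cdot \mathbf{x} + b),
\qquad
g(z) = \frac{1}{1 + e^{-z}}
```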

The algorithm to minimize the cost function:
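In its general form this is the same update rule used for linear regression. A sketch in standard notation, with 0-based indices assumed so that they match the Python code later in this post:

```latex
\begin{aligned}
&\text{repeat until convergence } \{ \\
&\quad w_j = w_j - \alpha \, \frac{\partial J(\mathbf{w},b)}{\partial w_j}
  \quad \text{for } j = 0, \dots, n-1 \\
&\quad b = b - \alpha \, \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\}
\end{aligned}
```

Here alpha is the learning rate, and all parameters are updated simultaneously: every partial derivative is computed from the current w and b before any of them is changed.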

So the gradient descent algorithm for logistic regression becomes:
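Substituting the partial derivatives of the logistic cost into the update rule gives the form below. This is a reconstruction in standard notation, with the derivative terms in square brackets and 0-based indices assumed to match the code:

```latex
\begin{aligned}
&\text{repeat until convergence } \{ \\
&\quad w_j = w_j - \alpha \left[ \frac{1}{m} \sum_{i=0}^{m-1}
  \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} \right]
  \quad \text{for } j = 0, \dots, n-1 \\
&\quad b = b - \alpha \left[ \frac{1}{m} \sum_{i=0}^{m-1}
  \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) \right] \\
&\}
\end{aligned}
```

The expressions look identical to those for linear regression; the difference is that f is now the sigmoid of the linear score rather than the linear score itself.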

Gradient descent implementation in Python

The gradient descent implementation has two components:

  • The loop that implements the gradient descent algorithm (gradient_descent).

  • The calculation of the partial derivatives, that is, the terms inside the square brackets of the gradient descent algorithm (compute_gradient_logistic).

In the code, the partial derivative of J with respect to w[j] is denoted dj_dw[j], and the partial derivative with respect to b is denoted dj_db.

To compute the partial derivatives with respect to w and b:

  • Initialize variables to accumulate dj_dw and dj_db.

  • For each example i:

      • Calculate the error for that example, f_wb(x[i]) - y[i].

      • For each input value xj_i in this example, multiply the error by xj_i and add the result to the corresponding element of dj_dw.

      • Add the error to dj_db.

  • Divide dj_dw and dj_db by the total number of examples (m).

  • Note that in NumPy, x[i] is X[i,:] (or simply X[i]) and xj_i is X[i,j].

import numpy as np

def compute_gradient_logistic(X, y, w, b):
    """
    Args:
      X (ndarray (m,n)) : Data, m examples with n features
      y (ndarray (m,))  : target values
      w (ndarray (n,))  : model parameters
      b (scalar)        : model parameter
    Returns:
      dj_db (scalar)       : the gradient of the cost w.r.t. the parameter b
      dj_dw (ndarray (n,)) : the gradient of the cost w.r.t. the parameters w
    """
    m, n = X.shape
    dj_dw = np.zeros((n,))    # accumulates the gradient for each w[j]
    dj_db = 0.

    for i in range(m):
        # sigmoid(z) = 1 / (1 + exp(-z)) is assumed to be defined elsewhere
        f_wb_i = sigmoid(np.dot(X[i], w) + b)
        err_i = f_wb_i - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m
    dj_db = dj_db/m

    return dj_db, dj_dw
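The two nested loops can also be collapsed into matrix operations. The following is a minimal sketch of that idea, not part of the original implementation: the name compute_gradient_vectorized is hypothetical, and the sigmoid is written inline so the block runs on its own.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Hypothetical vectorized counterpart of compute_gradient_logistic."""
    m = X.shape[0]
    # Sigmoid of the linear score for all m examples at once, shape (m,)
    f_wb = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    err = f_wb - y              # f_wb(x[i]) - y[i] for every example
    dj_dw = X.T @ err / m       # sums err[i] * X[i,j] over i, for each j
    dj_db = np.sum(err) / m     # mean error
    return dj_db, dj_dw
```

It returns the values in the same order as the loop version (dj_db first, then dj_dw), so it could be swapped in without changing the caller.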

import copy
import math

def gradient_descent(X, y, w_in, b_in, alpha, num_iters):
    # A list to store the cost J at each iteration
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w_in
    b = b_in

    for i in range(num_iters):
        # Calculate the gradient at the current w and b
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)

        # Update the parameters using alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Save the cost J at each iteration, capped so the list cannot grow without bound
        # (compute_cost_logistic is assumed to be defined elsewhere)
        if i < 100000:
            J_history.append(compute_cost_logistic(X, y, w, b))

        # Print the cost 10 times over the run, or at every iteration if num_iters < 10
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}")

    return w, b, J_history
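The code above calls sigmoid and compute_cost_logistic without showing them. A minimal sketch of both follows, assuming the usual logistic loss (binary cross-entropy) as the cost J. These definitions are illustrative and may differ from the ones used elsewhere in this series.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic(X, y, w, b):
    """Mean logistic loss over all m examples (assumed form of the cost J)."""
    f_wb = sigmoid(X @ w + b)   # predicted probabilities, shape (m,)
    # Per-example loss; no clipping is applied, so predictions of exactly
    # 0 or 1 would produce log(0)
    loss = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)
    return float(np.mean(loss))
```

With these helpers in scope, the loop could be started with something like gradient_descent(X, y, np.zeros(X.shape[1]), 0., 0.1, 1000). The learning rate and iteration count there are placeholder values, not ones taken from this post.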
