
Day 13: Cost Function for logistic regression

The cost function gives you a way to measure how well a specific set of parameters fits the training data, and thereby a way to choose better parameters.

The squared-error cost function that we used in linear regression is not suitable for logistic regression, because it results in a non-convex cost function, which is "wiggly" with many local minima.
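To see this numerically, here is a small sketch (my own illustration, not from the lesson) that evaluates the squared-error cost of a one-feature logistic model on a single example and checks the sign of its second finite differences; a convex function would have them all non-negative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Squared-error cost for a logistic model on one example with x = 1, y = 0
def squared_error_cost(w):
    return (sigmoid(w * 1.0) - 0.0) ** 2

ws = np.arange(-3.0, 3.5, 0.5)
costs = np.array([squared_error_cost(w) for w in ws])

# Second finite differences: mixed signs mean the curve bends both ways (non-convex)
second_diffs = costs[2:] - 2 * costs[1:-1] + costs[:-2]
print(np.any(second_diffs > 0), np.any(second_diffs < 0))  # True True
```

Because both positive and negative second differences appear, gradient descent on this cost surface is not guaranteed to reach the global minimum.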

In logistic regression, instead of the squared-error function, we use the following loss function:

L(f_wb(x^(i)), y^(i)) = -log(f_wb(x^(i)))        if y^(i) = 1
L(f_wb(x^(i)), y^(i)) = -log(1 - f_wb(x^(i)))    if y^(i) = 0

which can be simplified as follows:

L(f_wb(x^(i)), y^(i)) = -y^(i) * log(f_wb(x^(i))) - (1 - y^(i)) * log(1 - f_wb(x^(i)))
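As a quick sanity check (my own illustration, not part of the lesson): because y is always 0 or 1, the simplified expression selects exactly one of the two branches of the piecewise form:

```python
import numpy as np

def loss_piecewise(f, y):
    # -log(f) when y = 1, -log(1 - f) when y = 0
    return -np.log(f) if y == 1 else -np.log(1 - f)

def loss_simplified(f, y):
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

# The two forms agree for both labels and any prediction f in (0, 1)
for f in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert np.isclose(loss_piecewise(f, y), loss_simplified(f, y))
```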

Loss Function Intuition

The loss function measures how well you're doing on one training example, and by summing up the losses on all of the training examples, you will then get the cost function, which measures how well you're doing on the entire training set.

The higher the loss, the further the prediction is from the ground-truth label y.

If the algorithm predicts a probability close to 1 and the true label y is 1, then the loss is very small:

if prediction = 1 and y = 1, then loss = 0

if prediction = 1 and y = 0, then the loss is very high, approaching infinity.

Our goal is to get the loss as close to 0 as possible.
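To put numbers on this intuition (the example values are my own), the loss shrinks as the predicted probability approaches the true label and blows up as it approaches the wrong one:

```python
import numpy as np

def loss(f, y):
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

# True label y = 1: confident, correct predictions give a tiny loss
print(loss(0.99, 1))   # ~0.01
print(loss(0.5, 1))    # ~0.69
print(loss(0.01, 1))   # ~4.6, confidently wrong is heavily penalized
```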

Thus, the cost function for logistic regression:

J(w, b) = (1/m) * Σ_{i=1}^{m} L(f_wb(x^(i)), y^(i))
With this choice of loss function, the overall cost function is convex, so you can reliably use gradient descent to take you to the global minimum.


We can simplify our cost function and re-write it as follows:

J(w, b) = -(1/m) * Σ_{i=1}^{m} [ y^(i) * log(f_wb(x^(i))) + (1 - y^(i)) * log(1 - f_wb(x^(i))) ]
This particular cost function is derived using a statistical principle called maximum likelihood estimation, an idea from statistics for efficiently finding parameters for different models.


Cost Function for logistic regression

To calculate the cost function in logistic regression, we can use the following code:

import numpy as np

def compute_cost_logistic(X, y, w, b):
	m = X.shape[0]
	cost = 0.0
	for i in range(m):
		z_i = np.dot(X[i], w) + b  # linear combination for example i
		f_wb_i = sigmoid(z_i)      # model prediction, a probability in (0, 1)
		cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)  # loss for example i

	cost = cost / m  # average the loss over all m examples
	return cost

* We defined the sigmoid() function in Day 11: Introduction to Classification.


notation:

X (ndarray (m,n)) : Data, m examples with n features

y (ndarray (m,)) : target values

w (ndarray (n,)) : model parameters

b (scalar) : model parameter
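As a usage sketch (with a stand-in sigmoid(), since the Day 11 definition isn't shown here; the dataset values are my own), calling compute_cost_logistic on a tiny two-example dataset:

```python
import numpy as np

def sigmoid(z):
    # Stand-in for the sigmoid() defined in Day 11
    return 1 / (1 + np.exp(-z))

def compute_cost_logistic(X, y, w, b):
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)
        cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)
    return cost / m

X = np.array([[0.5], [1.5]])  # m = 2 examples, n = 1 feature
y = np.array([0, 1])          # target labels
w = np.array([1.0])           # model weights
b = -1.0                      # model bias
print(compute_cost_logistic(X, y, w, b))  # ~0.474
```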

