
Day 24: Neural Network implementation in Python

Forward Prop in a single layer

In this section, we will explore the implementation of forward propagation from scratch. We're going to take another look at our coffee roasting model example, this time using 1-D vectors instead of the 2-D matrices we used before.

x = np.array([200, 17])    # 1D vector

w1_1 = np.array([1, 2])
b1_1 = np.array([-1])
z1_1 =, x) + b1_1
a1_1 = sigmoid(z1_1)

w1_2 = np.array([-3, 4])
b1_2 = np.array([1])
z1_2 =, x) + b1_2
a1_2 = sigmoid(z1_2)

w1_3 = np.array([5, -6])
b1_3 = np.array([2])
z1_3 =, x) + b1_3
a1_3 = sigmoid(z1_3)

Then, we combine all those to create a1:

a1 = np.array([a1_1, a1_2, a1_3])

Finally, we want to calculate a2:

w2_1 = np.array([-7, 8, 9])
b2_1 = np.array([3])
z2_1 =, a1) + b2_1
a2_1 = sigmoid(z2_1)

and that's how we implement forward prop with NumPy.

General Implementation of forward propagation

In the previous section, we implemented forward prop by hard coding lines of code for every single neuron. Let's now take a look at a more general implementation of forward prop in Python.

Let's define the dense function:

  • It takes as input the activations from the previous layer, as well as the parameters, weights (w) and biases (b), for the neurons in a given layer

  • then, it outputs the activations from the current layer

def dense(a_in, W, b):
    units = W.shape[1]             # 3 units
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                # this pulls out the jth column in W
        z =, a_in) + b[j]
        a_out[j] = g(z)            # g() defined outside of here
    return a_out

Given the dense function, let's string together a few dense layers sequentially, in order to implement forward prop in the neural network:

def sequential(x):
    a1 = dense(x, W1, b1)
    a2 = dense(a1, W2, b2)
    a3 = dense(a2, W3, b3)
    a4 = dense(a3, W4, b4)
    f_x = a4
    return f_x
