
Day 25: Neural Network Training

TensorFlow Implementation

In the previous post, we learned how to carry out the forward pass in a neural network. This week, we're going to go over training a neural network.

Let's continue with our running example of handwritten digit recognition: classifying an image as a 0 or a 1.



Given a training set of pairs (x, y), how do we build and train this model?


Steps to take to train a neural network in TensorFlow:

  1. Specify the model, which tells TensorFlow how to compute the inference. Essentially, we ask TensorFlow to sequentially string together the three layers of our neural network.

  2. Compile the model using a specific loss function.

  3. Train the model by calling the fit function, which tells TensorFlow to fit the model we specified in step 1, using the loss and cost function we specified in step 2, to the dataset.


# Step 1
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Sequentially string together the three layers of the network
model = Sequential([
    Dense(units=25, activation='sigmoid'),
    Dense(units=15, activation='sigmoid'),
    Dense(units=1, activation='sigmoid')
])
# Step 2
from tensorflow.keras.losses import BinaryCrossentropy

model.compile(loss=BinaryCrossentropy())
# Step 3
model.fit(X, Y, epochs=100)

epochs: how many steps of the learning algorithm (such as gradient descent) to run
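
Putting the three steps together, here's a minimal runnable sketch. The X and Y arrays are synthetic placeholders I made up for illustration; in the actual digit example, each row of X would hold the pixel values of one image.

import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# Placeholder data: 1000 examples, 64 features each (e.g., 8x8 pixel images)
X = np.random.rand(1000, 64).astype(np.float32)
Y = np.random.randint(0, 2, size=(1000, 1)).astype(np.float32)

# Step 1: specify the model architecture
model = Sequential([
    Dense(units=25, activation='sigmoid'),
    Dense(units=15, activation='sigmoid'),
    Dense(units=1, activation='sigmoid')
])

# Step 2: compile the model with the binary cross entropy loss
model.compile(loss=BinaryCrossentropy())

# Step 3: fit the model to the data
model.fit(X, Y, epochs=100)

# Once trained, the model can make predictions on new inputs
predictions = model.predict(X[:5])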


Training Details

In this section, we will look at the details of what the TensorFlow code for training a neural network is actually doing. Let's go over the 3 steps to train the neural network again.

In step 1, when we specify the model in TensorFlow, the code defines the entire architecture of the neural network and therefore tells TensorFlow everything it needs in order to compute the activations.

In step 2, we specify the loss function, which in turn defines the cost function we use to train the neural network.

For the handwritten digit classification problem, where each image is either a zero or a one, the most common loss function to use is the binary cross entropy loss, which is the same one we use in logistic regression:

L(f(x), y) = -y * log(f(x)) - (1 - y) * log(1 - f(x))
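To make the loss concrete, here is a small sketch with made-up values, comparing a by-hand computation against TensorFlow's built-in class (TensorFlow clips predictions by a tiny epsilon for numerical stability, so the two agree to several decimal places):

import numpy as np
from tensorflow.keras.losses import BinaryCrossentropy

y_true = np.array([1.0, 0.0, 1.0])  # labels y
f_x    = np.array([0.9, 0.2, 0.6])  # model outputs f(x)

# Average of -y*log(f(x)) - (1-y)*log(1-f(x)) over the examples
manual = np.mean(-y_true * np.log(f_x) - (1 - y_true) * np.log(1 - f_x))

builtin = BinaryCrossentropy()(y_true, f_x).numpy()

print(manual, builtin)  # both come out to about 0.2798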
The syntax model.compile(loss=BinaryCrossentropy()) asks TensorFlow to compile the neural network using this loss function. You can choose a different loss function depending on your problem; for example, in a regression model you may want to use the squared error cost function:


from tensorflow.keras.losses import MeanSquaredError

model.compile(loss=MeanSquaredError())
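
For instance, a minimal regression sketch (my own illustrative setup, not part of the digit example) pairs a linear output layer with this loss:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import MeanSquaredError

# Regression predicts a continuous value, so the output layer is linear
regression_model = Sequential([
    Dense(units=25, activation='sigmoid'),
    Dense(units=1, activation='linear')
])
regression_model.compile(loss=MeanSquaredError())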

Lastly, you ask TensorFlow to minimize the cost function (similar to the gradient descent we implemented before). TensorFlow uses an algorithm called backpropagation to compute the partial derivative terms. Essentially, it implements backpropagation entirely within the fit function; in our example, it runs for 100 iterations (epochs).
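
As a rough illustration of what fit automates, here is a conceptual sketch of a single gradient descent step written out with tf.GradientTape, reusing the model, X, and Y from the example above. You wouldn't write this yourself here; fit handles all of it:

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    predictions = model(X)          # forward pass
    loss = loss_fn(Y, predictions)  # compute the loss

# Backpropagation: compute the partial derivatives of the loss
gradients = tape.gradient(loss, model.trainable_variables)

# Take one step: update parameters in the direction that reduces the loss
optimizer.apply_gradients(zip(gradients, model.trainable_variables))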


Note: the neural network shown in this post may also be called a Multilayer Perceptron (MLP).

An MLP is a fully connected class of feedforward artificial neural network (ANN).
