Day 29: Additional Neural Network Concepts

Advanced Optimization

Gradient descent is an optimization algorithm that is widely used in machine learning. It was the foundation of many algorithms, such as linear regression and logistic regression, and of early implementations of neural networks. However, there are other optimization algorithms for minimizing the cost function as well.


The Adam algorithm can adapt the learning rate automatically, depending on how gradient descent is proceeding. If a parameter w_j or b keeps moving in roughly the same direction, Adam increases the learning rate for that parameter. Conversely, if a parameter keeps oscillating back and forth, Adam reduces the learning rate for that parameter.


Adam stands for Adaptive Moment Estimation.


The Adam algorithm doesn't use a single global learning rate alpha; instead, it uses a different learning rate for every parameter of your model. In short: if a parameter (w_j or b) keeps moving in the same direction, its learning rate is increased, and if it keeps oscillating, its learning rate is reduced.
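To make this intuition concrete, here is a minimal sketch of the standard Adam update rule for a single scalar parameter, written in plain Python. The function names (`adam_step`, `net_displacement`) and the toy gradient sequences are made up for illustration, and the defaults for `beta1`, `beta2`, and `eps` are the commonly used ones rather than values taken from this post. Note that in this formulation `alpha` itself stays fixed; what changes is the effective step size per parameter.

```python
import math

def adam_step(w, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter w (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def net_displacement(grads):
    """Apply Adam to a sequence of gradients and return how far w ended up from 0."""
    w, m, v = 0.0, 0.0, 0.0
    for t, g in enumerate(grads, start=1):
        w, m, v = adam_step(w, g, m, v, t)
    return abs(w)

# Gradients that always point the same way vs. gradients that flip sign each step.
steady = net_displacement([1.0] * 20)
oscillating = net_displacement([1.0, -1.0] * 10)
print(steady, oscillating)
```

With the steady gradients the parameter takes close to a full step of size `alpha` each time, while with the oscillating gradients the averaged gradient mostly cancels out and the parameter barely moves.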


Additional Layer Types

All the neural network layers we have seen so far have been of the dense layer type, in which every neuron in the layer gets as its inputs all the activations from the previous layer. To recap, in the dense layers we have been using, the activation of a neuron in, say, the second hidden layer is a function of every single activation value from the previous layer.
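As a rough illustration of that recap, the sketch below computes the activations of one dense layer in plain Python. The helper names (`sigmoid`, `dense`), the choice of sigmoid as the activation function, and the example numbers are all assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(a_prev, W, b, g=sigmoid):
    """Forward pass for one dense layer (illustrative sketch).

    a_prev: list of activations from the previous layer
    W:      one weight vector per neuron, each as long as a_prev
    b:      one bias per neuron
    """
    activations = []
    for w, b_i in zip(W, b):
        # Every neuron uses ALL activations from the previous layer.
        z = sum(w_j * a_j for w_j, a_j in zip(w, a_prev)) + b_i
        activations.append(g(z))
    return activations

a1 = [0.5, -1.0, 2.0]            # activations of the first hidden layer (made up)
W2 = [[0.1, 0.2, 0.3],           # weights of neuron 1 in the second hidden layer
      [0.0, 0.0, 0.0]]           # weights of neuron 2
b2 = [0.0, 0.0]
a2 = dense(a1, W2, b2)
print(a2)
```

Each entry of `a2` depends on every entry of `a1`, which is exactly what makes the layer "dense".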


Other layer types that you may see include the convolutional layer, which is usually used to process images. There are also recurrent neural networks, which are used for time series prediction. We will explore them further in Deep Learning.


Adam Algorithm

The Adam optimization algorithm still needs a default initial learning rate. Let's take a look at how we can implement it in Python with TensorFlow:


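import tensorflow as tf

# `model` is assumed to be a Keras model that has already been defined.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # initial learning rate
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)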
