#### Advanced Optimization

Gradient descent is an optimization algorithm that is widely used in machine learning. It was the foundation of many algorithms, such as linear regression, logistic regression, and early implementations of neural networks, but there are other optimization algorithms for minimizing the cost function as well.
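As a baseline for comparison, here is a minimal plain-Python sketch of gradient descent. It applies the update w_j := w_j - alpha * dJ/dw_j with one global learning rate `alpha` shared by every parameter. The function names `gradient_descent` and `grad_J` and the quadratic cost are hypothetical, chosen only for illustration.

```python
def gradient_descent(grad, w_init, alpha=0.1, steps=100):
    """Plain gradient descent: every parameter uses the same learning rate alpha."""
    w = list(w_init)
    for _ in range(steps):
        g = grad(w)
        # w_j := w_j - alpha * dJ/dw_j, with the same alpha for every j
        w = [w_i - alpha * g_i for w_i, g_i in zip(w, g)]
    return w

# Hypothetical cost J(w) = (w[0] - 3)^2 + (w[1] + 1)^2, minimized at w = (3, -1)
def grad_J(w):
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

w_final = gradient_descent(grad_J, [0.0, 0.0])
print(w_final)
```

With `alpha=0.1`, each step multiplies the distance to the minimum by 0.8, so after 100 steps `w_final` is essentially (3, -1). The single `alpha` is exactly what an adaptive method like Adam replaces.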

The Adam algorithm can adapt the learning rate automatically, depending on how gradient descent is proceeding. If a parameter **wj** or **b** keeps moving in roughly the same direction, Adam increases the learning rate for that parameter; conversely, if a parameter keeps oscillating back and forth, Adam reduces the learning rate for that parameter.

Adam stands for Adaptive Moment Estimation.

Rather than using a single global learning rate alpha, the Adam algorithm uses a different learning rate for every parameter of the model (every **wj** and **b**), adjusting each one as described above: larger when the parameter keeps moving in the same direction, smaller when it keeps oscillating.
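To make the per-parameter behavior concrete, here is a minimal plain-Python sketch of the standard Adam update. Each parameter keeps a running average of its gradients (`m`) and of its squared gradients (`v`), with bias correction, and steps by `alpha * m_hat / (sqrt(v_hat) + eps)`. The "increase/reduce the learning rate" description above is an intuition for this rule rather than a literal change to alpha. The helper name `adam_update`, the hyperparameter values (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`, commonly used choices) and the synthetic gradients are assumptions for illustration.

```python
import math

def adam_update(params, grads, m, v, t, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (t starts at 1). m and v are updated in place, one entry
    per parameter, so each parameter effectively gets its own step size."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # running mean of gradients
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m[i] / (1 - beta1 ** t)              # bias-corrected estimates
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - alpha * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

# Hypothetical gradients: parameter 0 is always pushed in the same direction,
# parameter 1 receives gradients that flip sign every step (oscillation).
params = [0.0, 0.0]
m, v = [0.0, 0.0], [0.0, 0.0]
for t in range(1, 101):
    grads = [1.0, 1.0 if t % 2 == 1 else -1.0]
    params = adam_update(params, grads, m, v, t)

print(params)
```

After 100 steps, parameter 0 has moved by about `alpha` per step (ending near -1.0), while the oscillating parameter 1 ends up only around -0.02, because its averaged gradient `m_hat` nearly cancels out.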

#### Additional Layer Types

All the neural network layers we have seen so far have been of the dense layer type, in which every neuron in the layer gets as its input all the activations from the previous layer. To recap, in a dense layer the activation of a neuron in, say, the second hidden layer is a function of every single activation value from the previous layer.
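A dense layer can be sketched in plain Python as a_j = g(w_j · a_in + b_j) for each neuron j, where every neuron reads all of `a_in`. The function name `dense`, the row-per-neuron layout of `W`, and the choice of sigmoid for g are assumptions for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(a_in, W, b, g=sigmoid):
    """Dense layer: W[j] holds neuron j's weights and b[j] its bias.
    Each neuron's activation uses every value in a_in."""
    return [g(sum(w * a for w, a in zip(W_j, a_in)) + b_j) for W_j, b_j in zip(W, b)]

# Hypothetical layer with 2 inputs and 3 neurons
a_in = [1.0, 2.0]
W = [[0.5, -0.5], [1.0, 1.0], [0.0, 0.0]]
b = [0.0, -3.0, 0.0]
out = dense(a_in, W, b)
print(out)
```

Here neuron 1 computes g(1.0 + 2.0 - 3.0) = g(0) = 0.5; every weight in each row touches every input activation.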

Other layer types you may see include the convolutional layer, usually used to process images, and the recurrent layers of recurrent neural networks, used for time-series prediction. We will explore them further in Deep Learning.
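For contrast with the dense layer, here is a minimal one-dimensional sketch of a convolutional layer: each output neuron looks only at a small window of consecutive inputs instead of all of them, and the same kernel weights are reused for every window (stride 1, no padding, ReLU activation). As in most deep learning libraries, the operation computed is technically a cross-correlation. The name `conv1d_layer` and the chosen kernel are hypothetical.

```python
def conv1d_layer(x, kernel, bias=0.0):
    """Each output j depends only on x[j : j + len(kernel)], not on all of x,
    and every output reuses the same kernel weights. ReLU activation."""
    k = len(kernel)
    return [max(0.0, sum(kernel[i] * x[j + i] for i in range(k)) + bias)
            for j in range(len(x) - k + 1)]

out = conv1d_layer([1.0, 2.0, 3.0, 4.0, 5.0], [-1.0, 0.0, 1.0])
print(out)
```

With 5 inputs and a window of 3, the layer produces 3 outputs, each computed as x[j + 2] - x[j].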

#### Adam Algorithm

The Adam optimization algorithm needs some default initial learning rate (here 10^-3). Here is how to use it in Python with TensorFlow:

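```python
import tensorflow as tf

# Assumes `model` is a tf.keras model (e.g. Sequential) defined earlier
# whose last layer outputs raw logits, hence from_logits=True below.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # initial learning rate alpha = 0.001
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```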
