We are going to discuss Normalization & Optimizers, which are used across AI; we will use them in LLMs, Agentic AI frameworks, etc. As we discussed in our previous blogs (links to the previous blogs are provided at the end of this blog), while calculating gradients, if the updated weights are almost identical to the previous weights (the gradients shrink towards zero), we run into the Vanishing Gradient problem. Similarly, if the new weights are far larger than the previous weights (the gradients blow up), we run into the Exploding Gradient problem. Using Normalization, we can prevent both the Vanishing and Exploding Gradient problems (a small sketch follows this preview).

This blog's agenda:
- Normalization
  - Batch Normalization (useful in Neural Networks)
  - Layer Normalization (useful in LLMs)
- Weights Initialization
  - Xavier
  - He
- EWMA (Exponentially Weighted Moving Average), which we will use in Optimizers like Momentum, NAG, Adagrad, RMSProp, Adam

Normalization
Normalization in Neural Networks and De...
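To make the normalization idea concrete, here is a minimal NumPy sketch of the two variants named in the agenda: batch normalization (normalize each feature across the batch) and layer normalization (normalize each sample across its own features, the variant used in LLMs/Transformers). The function names, toy data, and epsilon value are my own for illustration, not from the blog.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature across the batch dimension (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)      # per-feature mean over the batch
    var = x.var(axis=0, keepdims=True)        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # roughly zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each sample across its features (axis 1)."""
    mean = x.mean(axis=1, keepdims=True)      # per-sample mean over the features
    var = x.var(axis=1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy batch: 4 samples, 3 features with very different scales
x = np.array([[1.0, 200.0, -50.0],
              [2.0, 180.0, -48.0],
              [0.5, 210.0, -52.0],
              [1.5, 190.0, -49.0]])
print(batch_norm(x).std(axis=0))   # ~1 per feature after batch norm
print(layer_norm(x).std(axis=1))   # ~1 per sample after layer norm
```

Keeping activations at roughly zero mean and unit variance is what stops gradients from shrinking towards zero or blowing up as they flow backwards through many layers.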
(AI Blog#3) Deep Learning Foundations - Activation & Loss Functions, Gradient Descent algorithms & Optimization techniques
It is extremely important to have deep knowledge while designing a machine learning model; otherwise we will end up creating ML models that are of no use. We need a clear understanding of certain techniques to confidently build an ML model, train it on training data, finalize it, and deploy it to production. So far, in blogs #1 and #2, we have covered the fundamentals of Deep Learning and Neural Networks, the architecture of a Neural Network, its internal layers and components, etc. The links to Blogs #1 and #2 are provided below for quick reference.
Deep Learning & Neural Networks : https://arunsdatasphere.blogspot.com/2026/01/deep-learning-and-neural-networks.html
Building a real world neural network: A practical usecase explained : https://arunsdatasphere.blogspot.com/2026/01/building-real-world-neural-network.html
Now let's dive into the concepts below to gain confidence in building your ML model: Activation Functions (Forward Propaga...