So, what is the gradient descent algorithm?

- Given a feed-forward network, we apply gradient descent as the fundamental training operation (a minimal code sketch follows at the end of this section):

  1. Randomly initialize: b, w1, w2, ..., wm
  2. Repeat until convergence (steps 3-6):
  3. Predict y(i) for each data point in the training set
  4. Calculate the loss J(b, w)
  5. Calculate the gradient of J(b, w)
  6. Update b, w1, w2, ..., wm simultaneously, where α is the learning rate:

     b(new)  = b(old)  - α · ∂J/∂b
     w1(new) = w1(old) - α · ∂J/∂w1
     w2(new) = w2(old) - α · ∂J/∂w2
     ...
     wm(new) = wm(old) - α · ∂J/∂wm

- When Augustin-Louis Cauchy needed a method to find the local minimum of a function, he used the idea of slope: iteratively move in the direction guided by the slope until a local minimum is reached.

- Using the same idea in feed-forward networks drives the training error toward a minimum.

Why the gradient anyway?

If the gradient is set to 1, or in other words if we don't use gradient descent, we reach a point dir...
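To make steps 1-6 concrete, here is a minimal sketch of batch gradient descent in NumPy. It assumes the simplest possible feed-forward model, a single sigmoid unit y_hat = sigmoid(X·w + b) trained with mean squared error; the names (gradient_descent, alpha, n_iters, tol) and the convergence test are illustrative choices, not from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000, tol=1e-6):
    n_samples, n_features = X.shape
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=n_features)  # step 1: randomly initialize w
    b = 0.0                                      # ... and b
    prev_loss = np.inf
    for _ in range(n_iters):                     # step 2: repeat until convergence
        y_hat = sigmoid(X @ w + b)               # step 3: predict y(i) for every point
        loss = np.mean((y_hat - y) ** 2)         # step 4: loss J(b, w)
        # step 5: gradient of J(b, w); for MSE through a sigmoid,
        # the chain rule gives delta_i = 2*(y_hat_i - y_i)*y_hat_i*(1 - y_hat_i)/n
        delta = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat) / n_samples
        grad_w = X.T @ delta                     # ∂J/∂w
        grad_b = delta.sum()                     # ∂J/∂b
        w = w - alpha * grad_w                   # step 6: simultaneous update
        b = b - alpha * grad_b                   # (both gradients computed before either update)
        if abs(prev_loss - loss) < tol:          # crude convergence check
            break
        prev_loss = loss
    return w, b

# Hypothetical usage on tiny synthetic data:
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = gradient_descent(X, y)
```

Note that both gradients are computed from the old (b, w) before either parameter is changed, which is exactly the "simultaneous" update the algorithm calls for in step 6.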