
Showing posts from September 8, 2019

What happens if gradient is set at 1 in "Gradient descent algorithm"

So, what is the gradient descent algorithm?

Given a feed-forward network, we apply gradient descent as the fundamental training operation:

1. Randomly initialize b, w1, w2, ..., wm
2. Repeat until convergence:
3. Predict y(i) for each data point in the training set
4. Calculate the loss J(b, w)
5. Calculate the gradient of J(b, w) and form the updates:
   b(new)  = b(old)  - a * dJ/db
   w1(new) = w1(old) - a * dJ/dw1
   w2(new) = w2(old) - a * dJ/dw2
   ...
   wm(new) = wm(old) - a * dJ/dwm
6. Update b, w1, w2, ..., wm simultaneously.

When Augustin-Louis Cauchy needed a way to find a local minimum of a function, he used the idea of the slope: iteratively move in the direction guided by the slope until a local minimum is reached. Using the same idea in feed-forward networks makes the error converge toward a minimum.

Why the gradient anyway? If the gradient is set to 1 — in other words, if we don't actually use gradient descent — every update subtracts the same constant step a, so the parameters slide in a straight line regardless of the shape of the loss. We would march straight past the minimum and land at a point directly on the x-axis, which is not the required local minimum.
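The contrast above can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional loss J(w) = (w - 3)^2, not the post's network code; the function names, learning rate, and step count are assumptions chosen for the example.

```python
def gradient_descent(w_init, lr=0.1, steps=100):
    """Repeat until (approximate) convergence: w <- w - lr * dJ/dw."""
    w = w_init
    for _ in range(steps):
        grad = 2 * (w - 3)     # true gradient dJ/dw for J(w) = (w - 3)^2
        w = w - lr * grad      # step shrinks as the slope flattens near the minimum
    return w

def fixed_step(w_init, lr=0.1, steps=100):
    """Same loop, but with the gradient 'set to 1': a constant-size step."""
    w = w_init
    for _ in range(steps):
        w = w - lr * 1         # ignores the slope, so it never settles
    return w

print(gradient_descent(0.0))   # converges to 3, the minimum of J
print(fixed_step(0.0))         # just drifts linearly, ending near -10
```

With the real gradient, the step size shrinks automatically as the slope flattens near the minimum, so the iterates settle at w = 3. With the gradient fixed at 1, each step is the same size and in the same direction, so the iterates never stop at the minimum; they keep sliding, exactly the failure the post describes.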