So, what is the gradient descent algorithm?
- Given a feed-forward network, we apply gradient descent as the fundamental training operation (a minimal sketch in Python follows this list):
1. Randomly initialize b, w1, w2, ..., wm
2. Repeat until convergence:
3. Predict y(i) for each data point in the training set
4. Calculate the loss J(b, w)
5. Calculate the gradient of J(b, w) and form the new parameters, where a is the learning rate:
   b(new)  = b(old)  - a · ∂J/∂b
   w1(new) = w1(old) - a · ∂J/∂w1
   w2(new) = w2(old) - a · ∂J/∂w2
   ...
   wm(new) = wm(old) - a · ∂J/∂wm
6. Update b, w1, w2, ..., wm simultaneously
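Below is a minimal sketch of that loop, assuming the simplest possible case of a single linear unit (y_hat = w·x + b) trained with a mean-squared-error loss; the variable names (X, y, alpha, epochs) and the synthetic data are illustrative choices, not something specified in the post.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, epochs=1000):
    n, m = X.shape
    rng = np.random.default_rng(0)
    w = rng.normal(size=m)              # step 1: randomly initialize w1..wm
    b = rng.normal()                    #         ...and b

    for _ in range(epochs):             # step 2: repeat (fixed epochs stand in for "until convergence")
        y_hat = X @ w + b               # step 3: predict y(i) for every training point
        error = y_hat - y
        J = np.mean(error ** 2)         # step 4: loss J(b, w)

        grad_w = 2 * X.T @ error / n    # step 5: gradient of J with respect to each w
        grad_b = 2 * np.mean(error)     #         ...and with respect to b

        w = w - alpha * grad_w          # step 6: update all parameters simultaneously
        b = b - alpha * grad_b
    return b, w, J

# Tiny usage example on synthetic data generated from y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
b, w, J = gradient_descent(X, y, alpha=0.05, epochs=2000)
print(b, w, J)                          # b ≈ 1, w ≈ [2], J ≈ 0
```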
- When "Louis Augustin Cauchy" needed a function to find local minima he used idea of slope to iteratively move in direction guided by slope to reach local minima.
- Using the same idea in feed forward networks leads to convergence of minimum error.
Why the gradient anyway?
If the gradient were simply set to 1, or in other words if we did not use gradient descent at all, every update would subtract the same fixed amount regardless of how steep or flat the curve is at that point. We would march straight down to the x-axis and land at whatever point that happens to be, which is generally not the required local minimum. The gradient is what scales the step: large where the slope is steep and shrinking to zero as we approach the minimum, so the updates settle where the error is lowest.
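The difference is easy to see on a toy example (my own illustration, not from the post): minimizing J(w) = w², whose minimum is at w = 0.

```python
# Toy illustration: minimize J(w) = w**2.
# With the true gradient, the step shrinks as the slope flattens and w settles at 0.
# With the "gradient" forced to 1, every step has the same size and w walks right past 0.

def grad(w):                # dJ/dw for J(w) = w**2
    return 2 * w

alpha = 0.1                 # learning rate

w = 3.0                     # proper gradient descent
for _ in range(50):
    w = w - alpha * grad(w)
print("with gradient:", round(w, 4))        # ~0.0, the minimum

w = 3.0                     # pretend the gradient is always 1
for _ in range(50):
    w = w - alpha * 1.0
print("gradient forced to 1:", round(w, 4)) # -2.0, overshot the minimum
```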