Neural networks are not too hard to understand theoretically - part 2
In the previous article, I described the background behind neural network training. Here I will look more closely at a few special types of neural networks.
Neural networks with one hidden layer
Here we have a neural network with one hidden layer containing P nodes:
\[y_p = \sigma \left(\sum_{i=1}^N w^{(1)}_{i,p} x_i\right)\]where \(\sigma\) is the activation function.
Now the m-th output node is:
\[f_m(x) = Z\left(\sum_{i=1}^P w^{(2)}_{i,m} y_i\right)\]where \(Z\) is called the output function.
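To make the forward pass concrete, here is a minimal NumPy sketch, assuming a logistic sigmoid for \(\sigma\) and the identity for \(Z\) by default (the names forward, W1 and W2 are illustrative, not from the article):

```python
import numpy as np

def sigma(t):
    # Assumed activation: logistic sigmoid (any differentiable choice works).
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, W1, W2, Z=lambda t: t):
    """Forward pass of a one-hidden-layer network.

    x  : input vector of length N
    W1 : (N, P) matrix of first-layer weights  w^(1)_{i,p}
    W2 : (P, M) matrix of second-layer weights w^(2)_{i,m}
    Z  : output function (identity by default)
    """
    y = sigma(x @ W1)   # hidden nodes  y_p = sigma(sum_i w^(1)_{i,p} x_i)
    f = Z(y @ W2)       # output nodes  f_m = Z(sum_i w^(2)_{i,m} y_i)
    return y, f
```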
Now we can apply our gradient method to update the weights:
\[\Delta w = - \alpha \sum_{n=0}^K \sum_{m=1}^M \left(f_m(x_n, w) - y_{m,n}\right) \nabla f_m\]where the gradients can be calculated as follows:
\[\frac{\partial f_m}{\partial w^{(1)}_{i,j}} = Z'\left(\sum_{k=1}^P w_{k,m}^{(2)} y_k\right) w_{j,m}^{(2)} \, \sigma'\left(\sum_{l=1}^N w^{(1)}_{l,j} x_l\right) x_i\] \[\frac{\partial f_m}{\partial w^{(2)}_{i,m}} = Z'\left(\sum_{k=1}^P w^{(2)}_{k,m} y_k\right) \sigma\left(\sum_{l=1}^N w_{l,i}^{(1)} x_l\right)\]Note that in the first derivative only the hidden node \(y_j\) depends on \(w^{(1)}_{i,j}\), so a single term survives the chain rule. And that's it!
Now we have the full method: determine the gradients of all the \(f_m\), then the gradient of the error, and finally update the weights accordingly.
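Putting the pieces together, here is a minimal sketch of one batch gradient step, assuming a squared-error loss, a logistic sigmoid for \(\sigma\) and the identity for \(Z\) (so \(Z' = 1\)); the function name grad_step and the parameter names are illustrative only:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))           # assumed activation

def sigma_prime(t):
    s = sigma(t)
    return s * (1.0 - s)                      # derivative of the logistic sigmoid

def grad_step(X, T, W1, W2, alpha=0.01):
    """One gradient-descent update; X is (K, N) inputs, T is (K, M) targets."""
    gW1 = np.zeros_like(W1)
    gW2 = np.zeros_like(W2)
    for x, t in zip(X, T):
        a = x @ W1                            # hidden pre-activations
        y = sigma(a)                          # y_p = sigma(sum_i w^(1)_{i,p} x_i)
        f = y @ W2                            # f_m with Z = identity
        d_out = f - t                         # (f_m - y_{m,n}) * Z'(...), with Z' = 1
        gW2 += np.outer(y, d_out)             # uses df_m/dw^(2)_{i,m} = Z'(..) y_i
        d_hid = (W2 @ d_out) * sigma_prime(a)
        gW1 += np.outer(x, d_hid)             # uses df_m/dw^(1)_{i,j} = Z'(..) w^(2)_{j,m} sigma'(a_j) x_i
    return W1 - alpha * gW1, W2 - alpha * gW2
```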
\(Z\) and \(\sigma\) are the functions whose derivatives we have to calculate.
Special case - Z and sigma are identity functions
In this case \(Z(t) = t\) and \(\sigma(t) = t\), so \(Z'(t) = \sigma'(t) = 1\) and the derivatives simplify to:
\[\frac{\partial f_m}{\partial w^{(1)}_{i,j}} = w_{j,m}^{(2)} x_i\] \[\frac{\partial f_m}{\partial w^{(2)}_{i,m}} = \sum_{l=1}^N w_{l,i}^{(1)} x_l\]
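As a quick sanity check, when both functions are the identity the network collapses to a single linear map, \(f(x) = x\,W^{(1)} W^{(2)}\), and the simplified gradients can be verified against a finite difference. The sizes and indices below are purely illustrative:

```python
import numpy as np

# With Z and sigma both the identity, the network is just a linear map:
# f(x) = (x @ W1) @ W2 = x @ (W1 @ W2).
N, P, M = 4, 3, 2                              # illustrative sizes
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(N, P)), rng.normal(size=(P, M))
x = rng.normal(size=N)
assert np.allclose((x @ W1) @ W2, x @ (W1 @ W2))

# The simplified gradients from the formulas above:
i, j, m = 1, 2, 0
df_dW1_ij = W2[j, m] * x[i]                    # d f_m / d w^(1)_{i,j}
df_dW2_im = x @ W1[:, i]                       # d f_m / d w^(2)_{i,m} = sum_l w^(1)_{l,i} x_l

# Finite-difference check of the first gradient (illustrative only).
eps = 1e-6
W1p = W1.copy(); W1p[i, j] += eps
fd = ((x @ W1p @ W2 - x @ W1 @ W2) / eps)[m]
assert np.isclose(df_dW1_ij, fd)
```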