In the previous article, I described the background behind neural network training. Here I will further investigate special types of neural networks.

Neural networks with one hidden layer

Here we have a neural network with one hidden layer containing P nodes, where the p-th hidden node computes:

\[y_p = \sigma (\sum_{i=1}^N w^{(1)}_{i,p} x_i)\]

where \(\sigma\) is the activation function.
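As a minimal sketch of this hidden layer (the names, the numpy dependency, and the choice of a sigmoid as \(\sigma\) are my own assumptions, not fixed by the article):

```python
import numpy as np

def sigmoid(t):
    # One possible choice for the activation function sigma
    return 1.0 / (1.0 + np.exp(-t))

def hidden_activations(x, W1):
    # x:  input vector of length N
    # W1: first-layer weights of shape (N, P), W1[i, p] = w^(1)_{i,p}
    # Returns y of length P, where y[p] = sigma(sum_i w^(1)_{i,p} * x[i])
    return sigmoid(x @ W1)
```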

Now our m-th output node is:

\[f_m(x) = Z(\sum_{i=1}^P w^{(2)}_{i,m} y_i)\]

where \(Z\) is called the output function.
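Continuing the sketch above, the output layer can be computed the same way (using the identity as Z here is just an assumption for illustration):

```python
def outputs(x, W1, W2, Z=lambda t: t):
    # W2: second-layer weights of shape (P, M), W2[i, m] = w^(2)_{i,m}
    # Returns f of length M, where f[m] = Z(sum_i w^(2)_{i,m} * y[i])
    y = hidden_activations(x, W1)
    return Z(y @ W2)
```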

Now we can apply our gradient method to update the weights:

\[\Delta w = - \alpha (\sum_{n=0}^K \sum_{m=1}^M (f_m(x_n, w) - y_{m,n}) \nabla f_m)\]
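As a sketch, one update step over a batch of training pairs \((x_n, y_n)\) could look like this; gradients_for_example is a placeholder for the per-example gradient computation that the formulas below describe (one possible version of it is sketched after them):

```python
def gradient_step(X, Y, W1, W2, alpha, gradients_for_example):
    # X: inputs of shape (K+1, N); Y: targets of shape (K+1, M)
    # gradients_for_example(x, W1, W2) returns (dF1, dF2), where
    #   dF1[m] has shape (N, P) and holds df_m/dw^(1)_{i,j},
    #   dF2[m] has shape (P, M) and holds df_m/dw^(2)_{i,m}.
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x_n, y_n in zip(X, Y):
        f = outputs(x_n, W1, W2)
        dF1, dF2 = gradients_for_example(x_n, W1, W2)
        for m in range(W2.shape[1]):
            err = f[m] - y_n[m]          # (f_m(x_n, w) - y_{m,n})
            dW1 += err * dF1[m]
            dW2 += err * dF2[m]
    # Delta w = -alpha * accumulated gradient
    return W1 - alpha * dW1, W2 - alpha * dW2
```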

where the gradients can be calculated as follows:

\[\frac{\partial f_m}{\partial w^{(1)}_{i,j}} = Z'(\sum_{k=1}^P w_{k,m}^{(2)} y_k) \, w^{(2)}_{j,m} \, \sigma'(\sum_{l=1}^N w^{(1)}_{l,j} x_l) \, x_i\]

(only the j-th hidden node survives the differentiation, because \(w^{(1)}_{i,j}\) affects only \(y_j\))

\[\frac{\partial f_m}{\partial w^{(2)}_{i,m}} = Z'(\sum_{k=1}^P w^{(2)}_{k,m} y_k) \, \sigma(\sum_{l=1}^N w_{l,i}^{(1)} x_l) = Z'(\sum_{k=1}^P w^{(2)}_{k,m} y_k) \, y_i\]
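One possible implementation of these two formulas for a single example, continuing the sketch above (again assuming a sigmoid \(\sigma\); Z_prime defaults to the derivative of the identity):

```python
def sigmoid_prime(t):
    s = sigmoid(t)
    return s * (1.0 - s)

def gradients_for_example(x, W1, W2, Z_prime=lambda t: np.ones_like(t)):
    u = x @ W1      # u[j] = sum_l w^(1)_{l,j} x_l
    y = sigmoid(u)  # hidden activations y[j] = sigma(u[j])
    s = y @ W2      # s[m] = sum_k w^(2)_{k,m} y_k
    N, P = W1.shape
    M = W2.shape[1]
    dF1 = np.zeros((M, N, P))
    dF2 = np.zeros((M, P, M))
    for m in range(M):
        # df_m/dw^(1)_{i,j} = Z'(s_m) * w^(2)_{j,m} * sigma'(u_j) * x_i
        dF1[m] = Z_prime(s)[m] * np.outer(x, W2[:, m] * sigmoid_prime(u))
        # df_m/dw^(2)_{i,m} = Z'(s_m) * y_i; other columns of W2 do not affect f_m
        dF2[m][:, m] = Z_prime(s)[m] * y
    return dF1, dF2
```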

And that’s it!

Now we have a full method for determining the gradients of all the f's, then for determining the gradient of the error, and finally for updating the weights accordingly.

\(Z\) and \(\sigma\) are functions whose derivatives we have to calculate.
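For example, with a sigmoid hidden activation and an identity output function (a common choice, not the only one), the needed derivatives are:

\[\sigma(t) = \frac{1}{1 + e^{-t}}, \quad \sigma'(t) = \sigma(t)\,(1 - \sigma(t)), \qquad Z(t) = t, \quad Z'(t) = 1\]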

Special case - Z and sigma are identity functions

In this case: \(Z(t) = t\) and \(\sigma(t) = t\),

so our derivatives simplify to:

\[\frac{\partial f_m}{\partial w^{(1)}_{i,j}} = w_{j,m}^{(2)} x_i\]

\[\frac{\partial f_m}{\partial w^{(2)}_{i,m}} = \sum_{l=1}^N w_{l,i}^{(1)} x_l\]
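These simple forms reflect the fact that with identity activations the whole network collapses into a single linear map:

\[f_m(x) = \sum_{k=1}^P w^{(2)}_{k,m} \sum_{l=1}^N w^{(1)}_{l,k} x_l = \sum_{l=1}^N \left(\sum_{k=1}^P w^{(2)}_{k,m} w^{(1)}_{l,k}\right) x_l\]

so differentiating with respect to \(w^{(1)}_{i,j}\) leaves only the term \(w^{(2)}_{j,m} x_i\), and differentiating with respect to \(w^{(2)}_{i,m}\) leaves \(\sum_{l=1}^N w^{(1)}_{l,i} x_l\), which is exactly the hidden activation \(y_i\).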