Neural networks are not too hard to understand theoretically - part 2

In the previous article, I described the background behind neural network training. Here I will investigate some special types of neural networks in more detail.

Neural networks with one hidden layer

Here we have a neural network with one hidden layer containing P nodes, where the p-th hidden node computes:

$$ y_p = \sigma (\sum_{i=1}^N w^{(1)}_{i,p} x_i) $$

where sigma is the activation function.

Now the m-th output node is:

$$ f_m(x) = Z(\sum_{i=1}^P w^{(2)}_{i,m} y_i) $$

where Z is called the output function.
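
To make the notation concrete, here is a minimal NumPy sketch of this forward pass. The sizes N, P and M, the weight matrices W1 and W2 (standing for w^(1) and w^(2)), the sigmoid activation and the identity output function are just example choices for illustration, not anything fixed by the formulas above.

```python
import numpy as np

# Example sizes: N inputs, P hidden nodes, M outputs (arbitrary choices).
N, P, M = 4, 3, 2
rng = np.random.default_rng(0)
W1 = rng.normal(size=(N, P))   # w^(1)_{i,p}: input -> hidden weights
W2 = rng.normal(size=(P, M))   # w^(2)_{i,m}: hidden -> output weights

def sigma(t):
    # activation function (sigmoid chosen here as an example)
    return 1.0 / (1.0 + np.exp(-t))

def Z(t):
    # output function (identity chosen here as an example)
    return t

def forward(x):
    y = sigma(x @ W1)   # y_p = sigma(sum_i w^(1)_{i,p} x_i)
    f = Z(y @ W2)       # f_m = Z(sum_i w^(2)_{i,m} y_i)
    return y, f

x = rng.normal(size=N)
y, f = forward(x)
```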

Now we can apply our gradient method to update the weights:

$$ \Delta w = - \alpha (\sum_{n=0}^K \sum_{m=1}^M (f_m(x_n, w) - y_{m,n}) \nabla f_m) $$

where y_{m,n} is the n-th training target for output m (not to be confused with the hidden node outputs y_p), and the gradients can be calculated as follows:

$$ \frac{\partial f_m}{\partial w^{(1)}_{i,j}} = Z'(\sum_{k=1}^P w_{k,m}^{(2)} y_k) \, w_{j,m}^{(2)} \, \sigma'(\sum_{l=1}^N w^{(1)}_{l,j} x_l) \, x_i $$

Note that only the j-th hidden node depends on this weight, so only one term of the sum over the hidden nodes survives the chain rule.

$$ \frac{\partial f_m}{\partial w^{(2)}_{i,m}} = Z'(\sum_{k=1}^P w^{(2)}_{k,m} y_k) \, \sigma(\sum_{l=1}^N w^{(1)}_{l,i} x_l) $$
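
As a sanity check on these two formulas, here is a hedged NumPy sketch that computes both partial derivatives for a single training example and applies the weight update from above. It reuses sigma, Z, W1 and W2 from the forward pass sketch; sigma_prime and Z_prime are the derivatives of the example sigmoid and identity choices made there, and alpha is the learning rate.

```python
def sigma_prime(t):
    s = sigma(t)
    return s * (1.0 - s)        # derivative of the example sigmoid

def Z_prime(t):
    return np.ones_like(t)      # derivative of the example identity output

def sgd_step(x, target, W1, W2, alpha=0.1):
    # forward pass, keeping the pre-activation sums
    a1 = x @ W1                 # sum_l w^(1)_{l,k} x_l for each hidden node k
    y = sigma(a1)               # hidden outputs y_k
    a2 = y @ W2                 # sum_k w^(2)_{k,m} y_k for each output m
    f = Z(a2)                   # outputs f_m

    err = (f - target) * Z_prime(a2)           # (f_m - y_{m,n}) Z'(...)

    # error gradient w.r.t. w^(2)_{i,m}: (f_m - y_{m,n}) Z'(...) y_i
    dW2 = np.outer(y, err)

    # error gradient w.r.t. w^(1)_{i,j}:
    # sum_m (f_m - y_{m,n}) Z'(...) w^(2)_{j,m} sigma'(a1_j) x_i
    dW1 = np.outer(x, (W2 @ err) * sigma_prime(a1))

    # Delta w = -alpha * gradient (single-example version of the update rule)
    return W1 - alpha * dW1, W2 - alpha * dW2
```

Iterating sgd_step over the training examples (or summing the per-example gradients first, as in the batch update rule above) gives the full gradient method.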

And that's it!

Now we have a complete method: determine the gradients of all the f's, then the gradient of the error, and finally update the weights accordingly.

Z and sigma are functions whose derivatives we have to be able to calculate.

Special case - Z and sigma are identity functions

In this case: $$ Z(t) = t $$ and $$ \sigma(t) = t $$

so our derivatives then look like this:

$$ \frac{\partial f_m}{\partial w^{(1)}_{i,j}} = w_{j,m}^{(2)} \, x_i $$

$$ \frac{\partial f_m}{\partial w^{(2)}_{i,m}} = \sum_{l=1}^N w_{l,i}^{(1)} x_l $$
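
With identity Z and sigma the whole network collapses into a single linear map, so these gradients are easy to verify numerically. Below is a small standalone sketch (the sizes and the chosen indices i, j, m are arbitrary examples) that compares both formulas against a finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, M = 4, 3, 2
W1 = rng.normal(size=(N, P))
W2 = rng.normal(size=(P, M))
x = rng.normal(size=N)

def f(W1, W2, x):
    # identity sigma and Z: the network is just two matrix products
    return (x @ W1) @ W2

i, j, m = 1, 2, 0                  # arbitrary weight/output indices
df_dw1 = W2[j, m] * x[i]           # d f_m / d w^(1)_{i,j}
df_dw2 = x @ W1[:, i]              # d f_m / d w^(2)_{i,m} = sum_l w^(1)_{l,i} x_l

eps = 1e-6                         # finite-difference check of both formulas
W1p = W1.copy(); W1p[i, j] += eps
W2p = W2.copy(); W2p[i, m] += eps
print(df_dw1, (f(W1p, W2, x)[m] - f(W1, W2, x)[m]) / eps)
print(df_dw2, (f(W1, W2p, x)[m] - f(W1, W2, x)[m]) / eps)
```

Both printed pairs should agree up to the finite-difference error.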