Neural networks are not too hard to understand theoretically - part 2
In the previous article, I described the background behind neural network training. Here I will investigate some special types of neural networks in more detail.
Neural networks with one hidden layer
Here we have a neural network with one hidden layer containing P nodes:
$$ y_p = \sigma (\sum_{i=1}^N w^{(1)}_{i,p} x_i) $$
where sigma is the activation function, and the m-th output node is:
$$ f_m(x) = Z(\sum_{i=1}^P w^{(2)}_{i,m} y_i) $$
where Z is called the output function.
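As a concrete illustration, here is a minimal sketch of this forward pass in Python/NumPy. The array names W1 and W2, and the choice of a logistic sigmoid for sigma and the identity for Z, are my own assumptions for the example, not something fixed by the formulas above.

```python
import numpy as np

def sigmoid(t):
    # one possible choice for the activation function sigma
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, W1, W2, Z=lambda t: t):
    """Forward pass of a one-hidden-layer network.

    x  : input vector of length N
    W1 : (N, P) first-layer weights, W1[i, p] = w^(1)_{i,p}
    W2 : (P, M) second-layer weights, W2[i, m] = w^(2)_{i,m}
    Z  : output function (identity here, just as an assumed default)
    """
    y = sigmoid(W1.T @ x)   # hidden outputs y_p = sigma(sum_i w^(1)_{i,p} x_i)
    f = Z(W2.T @ y)         # outputs f_m = Z(sum_i w^(2)_{i,m} y_i)
    return y, f

# example with N = 3 inputs, P = 4 hidden nodes, M = 2 outputs
x = np.array([0.5, -1.0, 2.0])
W1 = 0.1 * np.random.randn(3, 4)
W2 = 0.1 * np.random.randn(4, 2)
y, f = forward(x, W1, W2)
```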
Now we can apply our gradient method to update the weights:
$$ \Delta w = - \alpha (\sum_{n=0}^K \sum_{m=1}^M (f_m(x_n, w) - y_{m,n}) \nabla f_m) $$
where y_{m,n} is the target value of output m for training sample n (not to be confused with the hidden outputs y_p), and the gradients can be calculated as follows:
$$ \frac{\partial f_m}{\partial w^{(1)}_{i,j}} = Z'(\sum_{k=1}^P w^{(2)}_{k,m} y_k) \, w^{(2)}_{j,m} \, \sigma'(\sum_{l=1}^N w^{(1)}_{l,j} x_l) \, x_i $$
$$ \frac{\partial f_m}{\partial w^{(2)}_{i,m}} = Z'(\sum_{k=1}^P w^{(2)}_{k,m} y_k) \, \sigma(\sum_{l=1}^N w^{(1)}_{l,i} x_l) $$
And that's it!
Now we have the full method: determine the gradient of each f_m, then the gradient of the error, and finally update the weights accordingly.
Z and sigma are functions whose derivatives we have to calculate.
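To make the whole procedure concrete, here is a minimal sketch of one such gradient step in Python/NumPy. It assumes sigma is the logistic sigmoid (so sigma'(t) = sigma(t)(1 - sigma(t))) and Z is the identity (so Z'(t) = 1); the array T holds the targets y_{m,n}, and all names are mine, chosen just for this example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gradient_step(X, T, W1, W2, alpha=0.1):
    """One weight update Delta w = -alpha * sum_n sum_m (f_m(x_n) - y_{m,n}) grad f_m.

    X  : (K, N) training inputs, one sample per row
    T  : (K, M) target values y_{m,n}
    W1 : (N, P) first-layer weights, W2 : (P, M) second-layer weights
    Assumes sigma = logistic sigmoid and Z = identity.
    """
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x, t in zip(X, T):
        a = W1.T @ x                 # hidden pre-activations a_j = sum_l w^(1)_{l,j} x_l
        y = sigmoid(a)               # hidden outputs y_j
        f = W2.T @ y                 # network outputs f_m (Z is the identity)
        err = f - t                  # (f_m - y_{m,n}) for each output m
        # df_m/dw^(2)_{i,m} = Z'(.) y_i = y_i, accumulated with the error factor
        dW2 += np.outer(y, err)
        # df_m/dw^(1)_{i,j} = Z'(.) w^(2)_{j,m} sigma'(a_j) x_i, summed over m
        delta = (W2 @ err) * y * (1.0 - y)
        dW1 += np.outer(x, delta)
    return W1 - alpha * dW1, W2 - alpha * dW2
```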
Special case - Z and sigma are identity functions
In this case: $$ Z(t) = t $$ and $$ \sigma(t) = t $$
so that $$ Z'(t) = 1 $$ and $$ \sigma'(t) = 1 $$ and the derivatives simplify to:
$$ \frac{\partial f_m}{\partial w^{(1)}_{i,j}} = w^{(2)}_{j,m} x_i $$
$$ \frac{\partial f_m}{\partial w^{(2)}_{i,m}} = \sum_{l=1}^N w^{(1)}_{l,i} x_l $$
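As a quick sanity check of these simplified formulas (my own addition, not from the article), here is a finite-difference comparison in Python/NumPy; with identity Z and sigma the network output is linear in each weight, so the match is essentially exact:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, M = 3, 4, 2
x = rng.normal(size=N)
W1 = rng.normal(size=(N, P))
W2 = rng.normal(size=(P, M))

def f(W1, W2):
    # with Z(t) = t and sigma(t) = t the network is just two linear maps
    return W2.T @ (W1.T @ x)

i, j, m = 1, 2, 0
eps = 1e-6

# df_m/dw^(1)_{i,j} should equal w^(2)_{j,m} * x_i
W1p = W1.copy(); W1p[i, j] += eps
print((f(W1p, W2)[m] - f(W1, W2)[m]) / eps, W2[j, m] * x[i])

# df_m/dw^(2)_{i,m} should equal sum_l w^(1)_{l,i} x_l
W2p = W2.copy(); W2p[i, m] += eps
print((f(W1, W2p)[m] - f(W1, W2)[m]) / eps, W1[:, i] @ x)
```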