These days I have started learning neural networks again, writing some Matlab code from scratch. I try to understand everything I do while writing the code, so here I derive the equations used in back propagation while trying to keep them clear and easy to understand.
Neural Network Functions
A multi-layer neural network can be defined as:

$$a^0 = x \tag{1}$$

$$z^i = a^{i-1} W^i + b^i \tag{2}$$

$$a^i = \sigma(z^i) \tag{3}$$

Assume $n_i$ means the number of neurons in layer $i$; then the variables in the equations can be explained as below:

- $x$ is the input, which is a row vector of $n_0$ elements.
- $W^i$ means the weights in layer $i$, which is a matrix of $n_{i-1}$ rows and $n_i$ columns.
- $b^i$ means the biases in layer $i$, which is a row vector of $n_i$ elements.
- $\sigma$ means the activation function in the $i$th layer, whose output is a row vector of $n_i$ elements.
- $l$ means the last layer. The output of $a^l$ is the output of the neural network.
And $\sigma$ may be different in different use cases. The sigmoid function is one example:

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{4}$$
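To make the definitions concrete, here is a minimal Matlab sketch of the forward pass, assuming equations (1)-(3) with the sigmoid activation (4). The names `forward`, `W`, `b`, `a`, and `z` are illustrative, not part of any fixed interface; note that Matlab's 1-based indexing shifts the layer index by one:

```matlab
% Forward propagation through an l-layer network.
% x is a 1 x n_0 row vector; W{i} is an n_{i-1} x n_i matrix;
% b{i} is a 1 x n_i row vector.
function [a, z] = forward(x, W, b)
    l = numel(W);
    a = cell(1, l + 1);
    z = cell(1, l);
    a{1} = x;                       % a^0 = x, equation (1)
    for i = 1:l
        z{i} = a{i} * W{i} + b{i};  % z^i = a^{i-1} W^i + b^i, equation (2)
        a{i+1} = sigmoid(z{i});     % a^i = sigma(z^i), equation (3)
    end
end

function y = sigmoid(z)
    y = 1 ./ (1 + exp(-z));         % element-wise sigmoid, equation (4)
end
```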
Gradient Descent Algorithm
We need a cost function to measure how well we are doing so far, and then the training of the network becomes an optimization problem. The method we use for this problem is gradient descent. Let me try to explain it.
Assume we have a cost function $C(W, b)$ that is always non-negative. For example, this quadratic function is a good one:

$$C = \frac{1}{2} \lVert y - a^l \rVert^2 \tag{5}$$

where $y$ is the expected output.
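As a tiny Matlab sketch of equation (5), assuming `y` and `al` are row vectors holding the expected and actual outputs for one sample:

```matlab
% Quadratic cost for one sample, equation (5).
C = 0.5 * sum((y - al).^2);
```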
Then the goal is to make the output of the cost function smaller. Since $y$ is fixed, the change in the cost function when we change $b$ a little can be shown as:

$$\Delta C \approx \frac{\partial C}{\partial b} \Delta b$$

In order to make the cost function smaller, we need to make $\Delta C$ negative. We can make $\Delta b = -\eta \frac{\partial C}{\partial b}$ with $\eta > 0$, so that

$$\Delta C \approx -\eta \left( \frac{\partial C}{\partial b} \right)^2$$

which is always negative. The same argument applies to the weights $W$.
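A minimal Matlab sketch of one gradient descent step, assuming the gradients `dW` and `db` have already been computed (for example by the back propagation derived below) and `eta` is the learning rate:

```matlab
% One gradient descent step over all layers.
% dW{i} and db{i} hold the partial derivatives of C.
eta = 0.1;                      % learning rate (illustrative value)
for i = 1:numel(W)
    W{i} = W{i} - eta * dW{i};  % Delta W = -eta * dC/dW
    b{i} = b{i} - eta * db{i};  % Delta b = -eta * dC/db
end
```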
Computing the Partial Derivatives
So the goal is to compute the partial derivatives $\frac{\partial C}{\partial b^i}$ and $\frac{\partial C}{\partial W^i}$. Using equations (1), (2), (3), (5) and the chain rule, we can get this:

$$\frac{\partial C}{\partial z^l} = (a^l - y) \odot \sigma'(z^l)$$

$$\frac{\partial C}{\partial z^i} = \left( \frac{\partial C}{\partial z^{i+1}} (W^{i+1})^T \right) \odot \sigma'(z^i)$$

$$\frac{\partial C}{\partial b^i} = \frac{\partial C}{\partial z^i}$$

$$\frac{\partial C}{\partial W^i} = (a^{i-1})^T \frac{\partial C}{\partial z^i}$$

Note that we use the element-wise product $\odot$ above because the function $\sigma$ is element-wise.
We can see that these equations share the same part: $\frac{\partial C}{\partial z^i}$. So we can define $\delta^i = \frac{\partial C}{\partial z^i}$ and rewrite the equations like this, to avoid computing that part many times:

$$\delta^l = (a^l - y) \odot \sigma'(z^l)$$

$$\delta^i = \left( \delta^{i+1} (W^{i+1})^T \right) \odot \sigma'(z^i)$$

$$\frac{\partial C}{\partial b^i} = \delta^i$$

$$\frac{\partial C}{\partial W^i} = (a^{i-1})^T \delta^i$$
With these equations, we can write the back propagation algorithm easily.
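For example, here is a minimal Matlab sketch of back propagation for the quadratic cost (5) and the sigmoid activation (4), reusing the `forward` helper sketched above. The names are illustrative assumptions, not a fixed API:

```matlab
% Back propagation: returns the gradients dW{i} and db{i}.
% x is a 1 x n_0 input, y a 1 x n_l expected output.
function [dW, db] = backprop(x, y, W, b)
    [a, z] = forward(x, W, b);       % forward pass, equations (1)-(3)
    l = numel(W);
    dW = cell(1, l);
    db = cell(1, l);
    delta = (a{l+1} - y) .* sigmoid_prime(z{l});  % delta^l
    for i = l:-1:1
        db{i} = delta;               % dC/db^i = delta^i
        dW{i} = a{i}' * delta;       % dC/dW^i = (a^{i-1})^T delta^i
        if i > 1                     % delta^{i-1} = (delta^i (W^i)^T) .* sigma'(z^{i-1})
            delta = (delta * W{i}') .* sigmoid_prime(z{i-1});
        end
    end
end

function y = sigmoid_prime(z)
    s = 1 ./ (1 + exp(-z));
    y = s .* (1 - s);                % sigma'(z) = sigma(z)(1 - sigma(z))
end
```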