# Backpropagation Algorithm

Posted on 17 Jan 2015, tagged `algorithm`, `deep learning`

These days I have started to learn neural networks again, writing some Matlab code from scratch. I try to understand everything I do while I write the code, so I derive the equations of backpropagation here, while trying to keep them clear and easy to understand.

## Neural Network Functions

A multi-layer neural network can be defined like this:

$$a^0 = x \tag{1}$$

$$z^i = a^{i-1} W^i + b^i \tag{2}$$

$$a^i = \sigma^i(z^i) \tag{3}$$

Assume $n_i$ means the number of neurons in layer $i$; then the variables in the equations can be explained as below:

- $x$ is the input, which is a row vector of $n_0$ elements.
- $W^i$ means the weights in layer $i$, which is a matrix of $n_{i-1}$ rows and $n_i$ columns.
- $b^i$ means the biases in layer $i$, which is a row vector of $n_i$ elements.
- $\sigma^i$ means the activation function in the $i$th layer, whose output is a row vector of $n_i$ elements.
- $l$ means the last layer. The output of $\sigma^l$ is the output of the neural network.

And $\sigma^i$ may be different in different use cases. The sigmoid function is one example:

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{4}$$
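As a sketch of these definitions, the forward pass can be written in a few lines of NumPy. The sigmoid activation, the layer sizes, and the random initialization here are my own illustrative choices, not from the post:

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid activation.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: a^0 = x, z^i = a^{i-1} W^i + b^i, a^i = sigma(z^i)."""
    a = x
    for W, b in zip(weights, biases):
        z = a @ W + b  # row vector times weight matrix, plus bias row vector
        a = sigmoid(z)
    return a

# Example: a 3-4-2 network with an illustrative random initialization.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((1, n)) for n in sizes[1:]]

x = np.array([[0.5, -0.2, 0.1]])       # input row vector with n_0 = 3 elements
print(forward(x, weights, biases).shape)  # (1, 2)
```

Note how each $W^i$ has $n_{i-1}$ rows and $n_i$ columns, so a row vector stays a row vector through every layer.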

## Gradient Descent Algorithm

We need a cost function to measure how well we are doing so far. The training of the network then becomes an optimization problem, and the method we use for it is gradient descent. Let me try to explain it.

Assume we have a cost function $C$, and it is always non-negative. For example, the quadratic cost is a good one, where $y$ is the expected output:

$$C = \frac{1}{2} \sum_k \left( y_k - a^l_k \right)^2 \tag{5}$$

Then the goal is to make the output of the cost function smaller. Since $x$ and $y$ are fixed, the change of the cost function while changing $w$ and $b$ a little can be shown as this:

$$\Delta C \approx \frac{\partial C}{\partial w} \Delta w + \frac{\partial C}{\partial b} \Delta b \tag{6}$$

In order to make the cost function smaller, we need to make $\Delta C$ negative. We can pick a small learning rate $\eta > 0$ and make $\Delta w = -\eta \frac{\partial C}{\partial w}$ and $\Delta b = -\eta \frac{\partial C}{\partial b}$. So that

$$\Delta C \approx -\eta \left( \left( \frac{\partial C}{\partial w} \right)^2 + \left( \frac{\partial C}{\partial b} \right)^2 \right)$$

which is never positive.
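As a tiny illustration of this update rule, here is gradient descent on a one-dimensional cost rather than a network; the cost $C(w) = (w - 3)^2$ and the learning rate are my own example:

```python
# Gradient descent on C(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def dC(w):
    return 2.0 * (w - 3.0)

w = 0.0
eta = 0.1  # learning rate
for _ in range(100):
    w = w - eta * dC(w)  # delta w = -eta * dC/dw

print(round(w, 4))  # converges toward the minimum at w = 3
```

Each step moves $w$ against the gradient, so the cost can only shrink (for a small enough $\eta$); after 100 steps $w$ is essentially at the minimum.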

## Compute Partial Derivative

So the goal is to compute the partial derivatives $\frac{\partial C}{\partial W^i}$ and $\frac{\partial C}{\partial b^i}$. Using equations (1), (2), (3), (5) and the chain rule, we can get this:

$$\frac{\partial C}{\partial b^i} = \left( \cdots \left( \left( (a^l - y) \odot \sigma'^l(z^l) \right) (W^l)^T \right) \odot \sigma'^{l-1}(z^{l-1}) \cdots (W^{i+1})^T \right) \odot \sigma'^i(z^i) \tag{7}$$

$$\frac{\partial C}{\partial W^i} = (a^{i-1})^T \, \frac{\partial C}{\partial b^i} \tag{8}$$

Note that we use $\odot$ (the element-wise product) above because the function $\sigma^i$ is element-wise.

We can find there are many same parts in these equations: the nested factor $\left( (a^l - y) \odot \sigma'^l(z^l) \right) (W^l)^T \cdots$, which is just $\frac{\partial C}{\partial z^i}$. So we can define $\delta^i = \frac{\partial C}{\partial z^i}$, and rewrite these equations like this to avoid computing those parts many times:

$$\delta^l = (a^l - y) \odot \sigma'^l(z^l) \tag{9}$$

$$\delta^i = \left( \delta^{i+1} (W^{i+1})^T \right) \odot \sigma'^i(z^i) \tag{10}$$

$$\frac{\partial C}{\partial b^i} = \delta^i \tag{11}$$

$$\frac{\partial C}{\partial W^i} = (a^{i-1})^T \delta^i \tag{12}$$

With these equations, we can write the backpropagation algorithm easily.
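Putting the equations together, a minimal NumPy sketch of backpropagation might look like this. The sigmoid activation and the single-sample, row-vector setup are my own assumptions (the post itself works in Matlab):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Gradients dC/dW^i and dC/db^i for the cost C = 0.5 * ||a^l - y||^2."""
    # Forward pass, storing every z^i and a^i for use in the backward pass.
    a, activations, zs = x, [x], []
    for W, b in zip(weights, biases):
        z = a @ W + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # delta^l = (a^l - y) ⊙ sigma'(z^l)            -- equation for the last layer
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_W = [None] * len(weights)
    grad_b = [None] * len(biases)
    grad_W[-1] = activations[-2].T @ delta
    grad_b[-1] = delta
    # delta^i = (delta^{i+1} (W^{i+1})^T) ⊙ sigma'(z^i)  -- propagate backwards
    for i in range(len(weights) - 2, -1, -1):
        delta = (delta @ weights[i + 1].T) * sigmoid_prime(zs[i])
        grad_W[i] = activations[i].T @ delta  # dC/dW^i = (a^{i-1})^T delta^i
        grad_b[i] = delta                     # dC/db^i = delta^i
    return grad_W, grad_b
```

A quick sanity check for a derivation like this is to compare one entry of the returned gradient against a finite difference of the cost; the two should agree to several decimal places.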