## Cost Function

First, let's define a few variables that we will need:

- $L$ = total number of layers in the network
- $s_l$ = number of units (not counting the bias unit) in layer $l$
- $K$ = number of output units/classes

We denote $(h_\Theta(x))_k$ as a hypothesis that results in the $k^{\text{th}}$ output.
Our cost function for neural networks is going to be a generalization of the one we used for logistic regression.

Recall that the cost function for regularized logistic regression was:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

For neural networks, it is going to be slightly more complicated:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left(\left(h_\Theta(x^{(i)})\right)_k\right) + (1 - y_k^{(i)}) \log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta_{j,i}^{(l)}\right)^2$$

The extra nested summations account for our multiple output nodes: the inner sum over $k$ adds up the logistic-regression costs for each of the $K$ output units, and the triple sum in the regularization term squares every individual $\Theta$ in the network, excluding the bias columns.
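To make the pieces of this formula concrete, here is a minimal NumPy sketch of the regularized cost. The names `sigmoid` and `nn_cost`, and the representation of $\Theta$ as a Python list of weight matrices, are illustrative assumptions, not code from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(thetas, X, Y, lam):
    """Regularized neural-network cost J(Theta).

    thetas : list of weight matrices Theta^(l), shape (s_{l+1}, s_l + 1)
    X      : (m, n) inputs, one example per row
    Y      : (m, K) one-hot labels
    lam    : regularization strength lambda
    """
    m = X.shape[0]
    # Forward propagation through every layer.
    A = X
    for Theta in thetas:
        A = np.hstack([np.ones((m, 1)), A])  # prepend the bias unit
        A = sigmoid(A @ Theta.T)
    H = A  # (m, K) hypothesis outputs (h_Theta(x))_k
    # Cross-entropy term, summed over examples i and output units k.
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization: skip the bias column (j = 0) of each Theta.
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in thetas)
    return cost + lam * reg / (2 * m)
```

Note how the regularization term slices off column 0 of each `Theta`, matching the convention that bias weights are not regularized.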

## Backpropagation Algorithm

Given training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$

Set $\Delta^{(l)}_{i,j} := 0$ for all $(l, i, j)$, so that each $\Delta^{(l)}$ starts as a matrix of zeros.

For training example $t = 1$ to $m$:

- Set $a^{(1)} := x^{(t)}$
- Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
- Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
- Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$ using $\delta^{(l)} = \left((\Theta^{(l)})^T \delta^{(l+1)}\right) \;.\!*\; a^{(l)} \;.\!*\; (1 - a^{(l)})$, where $.\!*$ denotes element-wise multiplication
- $\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a^{(l)}_j \delta^{(l+1)}_i$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$

When the loop over the training examples is complete, we divide the accumulated gradients by $m$ and add regularization; the bias column ($j = 0$) is not regularized:

- $D^{(l)}_{i,j} := \frac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda \Theta^{(l)}_{i,j}\right)$, if $j \neq 0$
- $D^{(l)}_{i,j} := \frac{1}{m}\Delta^{(l)}_{i,j}$, if $j = 0$

The capital-delta matrices give us the partial derivatives: $\frac{\partial}{\partial \Theta^{(l)}_{i,j}} J(\Theta) = D^{(l)}_{i,j}$.
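Here is a per-example NumPy sketch of the loop above, under the same assumptions as the cost sketch (the name `backprop_gradients` and the list-of-matrices layout are illustrative; a production version would vectorize over all $m$ examples at once):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(thetas, X, Y, lam):
    """Accumulate Delta^(l) over all examples, then return each D^(l)."""
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in thetas]
    for t in range(m):
        # Forward propagation, storing each a^(l) with its bias unit.
        a = [np.concatenate(([1.0], X[t]))]
        for Theta in thetas:
            a.append(np.concatenate(([1.0], sigmoid(Theta @ a[-1]))))
        a[-1] = a[-1][1:]          # the output layer has no bias unit
        delta = a[-1] - Y[t]       # delta^(L) = a^(L) - y^(t)
        Deltas[-1] += np.outer(delta, a[-2])
        # Propagate the error backwards for the hidden layers.
        for l in range(len(thetas) - 2, -1, -1):
            delta = (thetas[l + 1].T @ delta) * a[l + 1] * (1 - a[l + 1])
            delta = delta[1:]      # drop the bias component
            Deltas[l] += np.outer(delta, a[l])
    # D^(l) = (1/m) * Delta^(l), plus lambda * Theta^(l) on non-bias columns.
    D = []
    for Delta, Theta in zip(Deltas, thetas):
        reg = np.hstack([np.zeros((Theta.shape[0], 1)), lam * Theta[:, 1:]])
        D.append((Delta + reg) / m)
    return D
```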

## Gradient Checking

We can approximate the derivative of our cost function with:

$$\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$$

With multiple theta matrices, we can approximate the derivative with respect to $\Theta_j$ as follows:

$$\frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}$$

A small value for $\epsilon$, such as $\epsilon = 10^{-4}$, guarantees that the math works out properly; if $\epsilon$ is made too small, we can end up with numerical problems.
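A minimal sketch of this check, assuming the $\Theta$ matrices have been unrolled into a single flat parameter vector (the helper name `numerical_gradient` is hypothetical):

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Two-sided finite-difference approximation of dJ/dtheta.

    cost  : function mapping a flat parameter vector to the scalar J(Theta)
    theta : flat parameter vector (all Theta matrices unrolled)
    """
    grad = np.zeros_like(theta)
    perturb = np.zeros_like(theta)
    for j in range(theta.size):
        perturb[j] = eps
        # (J(theta + eps * e_j) - J(theta - eps * e_j)) / (2 * eps)
        grad[j] = (cost(theta + perturb) - cost(theta - perturb)) / (2 * eps)
        perturb[j] = 0.0
    return grad
```

Once this numerical estimate agrees with the gradients computed by backpropagation, gradient checking should be turned off: the two-sided difference requires two full cost evaluations per parameter, which is far too slow to run on every iteration of learning.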