a) $L$ = total number of layers in the network
b) $s_l$ = number of units (not counting bias unit) in layer $l$
c) $K$ = number of output units/classes
We denote $h_\Theta(x)_k$ as being a hypothesis that results in the $k^{\text{th}}$ output.
Our cost function for neural networks is going to be a generalization of the one we used for logistic regression.
Recall that the cost function for regularized logistic regression was:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \,\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

For neural networks, it is going to be slightly more complicated:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[\, y_k^{(i)} \log\left((h_\Theta(x^{(i)}))_k\right) + (1 - y_k^{(i)}) \log\left(1 - (h_\Theta(x^{(i)}))_k\right) \,\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$
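To make this concrete, here is a minimal NumPy sketch of the cost computation for a single-hidden-layer network. The function name `nn_cost`, the weight matrices `Theta1` and `Theta2`, and the one-hot label matrix `Y` are illustrative assumptions, not notation from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    """Regularized cost J(Theta) for a hypothetical 3-layer network.
    X: (m, n) inputs; Y: (m, K) one-hot labels.
    Theta1: (s2, n+1), Theta2: (K, s2+1) -- first column multiplies the bias unit."""
    m = X.shape[0]

    # Forward propagation (prepend the bias unit at each layer).
    a1 = np.hstack([np.ones((m, 1)), X])      # (m, n+1)
    a2 = sigmoid(a1 @ Theta1.T)               # (m, s2)
    a2 = np.hstack([np.ones((m, 1)), a2])     # (m, s2+1)
    h = sigmoid(a2 @ Theta2.T)                # (m, K) = h_Theta(x)

    # Unregularized cost: the double sum over examples i and classes k.
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m

    # Regularization: sum of squared weights, skipping the bias columns.
    reg = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    return J + reg
```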
Given training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, set $\Delta^{(l)}_{i,j} := 0$ for all $(l, i, j)$.

For training example t = 1 to m:
- Set $a^{(1)} := x^{(t)}$
- Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
- Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$
- Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$ using $\delta^{(l)} = \left((\Theta^{(l)})^T \delta^{(l+1)}\right) \,.\!*\; a^{(l)} \,.\!*\; (1 - a^{(l)})$
- $\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a^{(l)}_j \delta^{(l+1)}_i$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$

After the loop, we average the accumulated values (adding the regularization term for non-bias weights) to obtain the partial derivatives:

$$D^{(l)}_{i,j} := \frac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda\Theta^{(l)}_{i,j}\right) \text{ if } j \neq 0 \qquad\qquad D^{(l)}_{i,j} := \frac{1}{m}\Delta^{(l)}_{i,j} \text{ if } j = 0$$

so that $\frac{\partial}{\partial \Theta^{(l)}_{i,j}} J(\Theta) = D^{(l)}_{i,j}$.
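The loop above translates nearly line for line into code. Below is a sketch for the same hypothetical single-hidden-layer setup used in the cost example; the comments mirror the bullets above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(Theta1, Theta2, X, Y, lam):
    """One pass of backpropagation over all m examples.
    Returns (D1, D2): partial derivatives of J with respect to Theta1, Theta2."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)   # accumulators, set to 0 for all (l, i, j)
    Delta2 = np.zeros_like(Theta2)

    for t in range(m):
        # Set a1 to the t-th input, with the bias unit prepended.
        a1 = np.concatenate([[1.0], X[t]])                  # (n+1,)
        # Forward propagation through the hidden and output layers.
        a2 = np.concatenate([[1.0], sigmoid(Theta1 @ a1)])  # (s2+1,)
        a3 = sigmoid(Theta2 @ a2)                           # (K,) = h_Theta(x^(t))
        # Output-layer error: delta^(L) = a^(L) - y^(t).
        d3 = a3 - Y[t]
        # Hidden-layer error, then drop the bias unit's delta.
        d2 = ((Theta2.T @ d3) * a2 * (1 - a2))[1:]
        # Vectorized accumulation: Delta += delta^(l+1) (a^(l))^T.
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)

    # Average, regularizing every column except the bias column j = 0.
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2
```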
We can approximate the derivative of our cost function with:

$$\frac{\partial}{\partial\Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$$

With multiple theta matrices, we can approximate the derivative with respect to $\Theta_j$ as follows:

$$\frac{\partial}{\partial\Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \ldots, \Theta_j + \epsilon, \ldots, \Theta_n) - J(\Theta_1, \ldots, \Theta_j - \epsilon, \ldots, \Theta_n)}{2\epsilon}$$
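Here is a minimal sketch of this numerical check, assuming the $\Theta$ matrices have been unrolled into a single parameter vector `theta` and that `cost_fn(theta)` evaluates $J(\Theta)$ (both names are hypothetical):

```python
import numpy as np

def numerical_gradient(cost_fn, theta, eps=1e-4):
    """Two-sided finite-difference approximation of dJ/dtheta_j.
    theta: 1-D unrolled parameter vector; eps=1e-4 is a common choice."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        perturb = np.zeros_like(theta)
        perturb[j] = eps
        # (J(theta + eps*e_j) - J(theta - eps*e_j)) / (2*eps)
        grad[j] = (cost_fn(theta + perturb) - cost_fn(theta - perturb)) / (2 * eps)
    return grad
```

In practice, this approximation is compared element-wise against the unrolled backpropagation gradient; the two should agree to several decimal places if backpropagation is implemented correctly.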