Neural Networks: Representation

Non-linear Hypotheses

Neural networks offer an alternative way to perform machine learning when we have complex, non-linear hypotheses with many features.

Model Representation I

If we had one hidden layer, it would look visually something like:

$$[x_0\, x_1\, x_2\, x_3] \rightarrow [a_1^{(2)}\, a_2^{(2)}\, a_3^{(2)}] \rightarrow h_\Theta(x)$$

The values for each of the "activation" nodes are obtained as follows:

$$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3)$$
$$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3)$$
$$a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3)$$
$$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$$

That is, each activation node applies the sigmoid function $g$ to a weighted linear combination of the nodes in the previous layer.
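Numerically, each activation is just the sigmoid of a weighted sum. A minimal NumPy sketch of one hidden node's computation (the input and weight values here are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Inputs including the bias unit x_0 = 1 (values are arbitrary examples).
x = np.array([1.0, 0.5, -1.2, 2.0])          # [x_0, x_1, x_2, x_3]

# One row of Theta^(1): weights from the inputs into hidden node a_1^(2).
theta_row = np.array([0.1, 0.4, -0.3, 0.2])

# a_1^(2) = g(Theta_10 x_0 + Theta_11 x_1 + Theta_12 x_2 + Theta_13 x_3)
a_1 = sigmoid(theta_row @ x)
```

A sigmoid output always lies strictly between 0 and 1, which is what lets the next layer treat it like another feature.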

Each layer gets its own matrix of weights, $\Theta^{(j)}$

The dimensions of these matrices of weights are determined as follows: if the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$. The $+1$ comes from the bias node ($x_0$ or $a_0^{(j)}$), which feeds into the next layer but does not itself receive input.
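As a quick sanity check on this dimension rule, here is a NumPy sketch: with $s_1 = 3$ input units and $s_2 = 3$ hidden units, $\Theta^{(1)}$ must be $3 \times 4$ for the matrix-vector product to type-check (the weight values are random placeholders, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

s1, s2 = 3, 3                                # units in layers 1 and 2 (bias not counted)
Theta1 = rng.standard_normal((s2, s1 + 1))   # dimension s_{j+1} x (s_j + 1)

x = np.concatenate(([1.0], rng.standard_normal(s1)))  # prepend bias x_0 = 1
z2 = Theta1 @ x                              # one value per hidden unit, shape (s2,)
```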

Model Representation II

To get a vectorized implementation of the functions above, we define a new variable $z_k^{(j)}$ that encapsulates the parameters inside the $g$ function, so that, for example, $a_1^{(2)} = g(z_1^{(2)})$.

In other words, for layer $j = 2$ and node $k$, the variable $z$ will be:

$$z_k^{(2)} = \Theta_{k,0}^{(1)} x_0 + \Theta_{k,1}^{(1)} x_1 + \cdots + \Theta_{k,n}^{(1)} x_n$$

The vector representations of $x$ and $z^{(j)}$ are:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \qquad z^{(j)} = \begin{bmatrix} z_1^{(j)} \\ z_2^{(j)} \\ \vdots \\ z_n^{(j)} \end{bmatrix}$$

Setting $x = a^{(1)}$, we can rewrite the equation as:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}, \qquad a^{(j)} = g(z^{(j)})$$

After computing $a^{(j)}$, we add a bias unit $a_0^{(j)} = 1$ before computing the next layer.

MATLAB code:

a1 = [ones(m, 1) X];                              % add bias column to the inputs
z2 = a1 * Theta1';
a2 = [ones(size(sigmoid(z2), 1), 1) sigmoid(z2)]; % add bias column to the hidden layer
z3 = a2 * Theta2';
a3 = sigmoid(z3);                                 % h_theta(x)
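For readers working outside MATLAB, the same vectorized forward pass can be sketched in NumPy; `X`, `Theta1`, and `Theta2` are assumed to have the same shapes as in the MATLAB snippet above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, Theta1, Theta2):
    """Forward-propagate a batch of m examples through one hidden layer."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])            # add bias column
    z2 = a1 @ Theta1.T
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])  # add bias column
    z3 = a2 @ Theta2.T
    return sigmoid(z3)                              # h_theta(x), one row per example
```

Note that `a1 * Theta1'` in MATLAB corresponds to `a1 @ Theta1.T` here; both compute one row of activations per training example.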


We then get our final result with:

$$h_\Theta(x) = a^{(j+1)} = g(z^{(j+1)})$$

Notice that in this last step, between layer $j$ and layer $j+1$, we are doing exactly the same thing as we did in logistic regression.

Multiclass Classification

To classify data into multiple classes, we let our hypothesis function return a vector of values. Say we wanted to classify our data into one of four final resulting classes; each label $y$ is then represented not as a number but as a one-hot vector, e.g. $y \in \{[1\ 0\ 0\ 0]^T,\ [0\ 1\ 0\ 0]^T,\ [0\ 0\ 1\ 0]^T,\ [0\ 0\ 0\ 1]^T\}$, and the output layer has four nodes.

Our resulting hypothesis for one set of inputs may look like:

$$h_\Theta(x) = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

in which case the third class is predicted, i.e. $y = [0\ 0\ 1\ 0]^T$.
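In practice the network's outputs are not exactly 0 or 1, so we pick the class whose output node is largest. A small sketch (the hypothesis values are invented for illustration):

```python
import numpy as np

# Hypothetical network output for one example, one value per class.
h = np.array([0.1, 0.2, 0.9, 0.3])

predicted_class = int(np.argmax(h))   # 0-based index of the most probable class

# One-hot representation of the corresponding label y.
y = np.zeros(4)
y[predicted_class] = 1.0
```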

Reference

Machine Learning by Stanford University