Planar data classification with one hidden layer

Neural Network model

Here is our model:

Mathematically:
For one example $x^{(i)}$ (the hidden layer uses tanh and the output layer uses a sigmoid, matching the code below):

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \tanh\!\left(z^{[1](i)}\right)$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$\hat{y}^{(i)} = a^{[2](i)} = \sigma\!\left(z^{[2](i)}\right)$$
$$y^{(i)}_{\text{prediction}} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$

Given the predictions on all the examples, you can also compute the cost $J$ as follows:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\!\left(a^{[2](i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - a^{[2](i)}\right) \right)$$

Defining the neural network structure

Define three variables (a short sketch of how to read them from the data follows this list):
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
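
As an illustration of where these numbers come from, here is a minimal sketch (not part of the original code) that reads n_x and n_y from the data arrays, assuming X has shape (n_x, m) and Y has shape (n_y, m) as in the rest of this post:

def layer_sizes(X, Y):
    # X: input data of shape (n_x, m); Y: labels of shape (n_y, m)
    n_x = X.shape[0]
    n_h = 4          # the hidden layer size is fixed to 4 in this exercise
    n_y = Y.shape[0]
    return n_x, n_h, n_y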

Initialize the model’s parameters

import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # Small random weights break the symmetry between hidden units; zero biases are fine
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
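
As a quick sanity check (with hypothetical sizes, not values from the notebook), the returned shapes for n_x = 2, n_h = 4, n_y = 1 look like this:

# Hypothetical shape check for n_x=2, n_h=4, n_y=1
params = initialize_parameters(2, 4, 1)
print(params["W1"].shape, params["b1"].shape)   # (4, 2) (4, 1)
print(params["W2"].shape, params["b2"].shape)   # (1, 4) (1, 1)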

Forward Propagation

def forward_propagation(X, parameters):
    # Retrieve the parameters, then run one forward pass: tanh hidden layer, sigmoid output
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = 1 / (1 + np.exp(-Z2))   # sigmoid
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
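
A small smoke test can confirm that the forward pass returns one probability per example; the random X_test below is a stand-in, not the planar dataset:

# Smoke test with random stand-in data (2 features, 3 examples)
np.random.seed(1)
X_test = np.random.randn(2, 3)
params = initialize_parameters(2, 4, 1)
A2, cache = forward_propagation(X_test, params)
print(A2.shape)   # (1, 3): one sigmoid probability per example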

Compute Cost

Now that you have computed $A^{[2]}$ (in the Python variable “A2”), which contains $a^{[2](i)}$ for every example, you can compute the cost function as follows:

Implement compute_cost() to compute the value of the cost $J$.

Instructions:

  • Implement the cross-entropy loss.
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost defined in the formula above
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- the cross-entropy cost defined above
    """
    
    m = Y.shape[1]  # number of examples

    # Compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), (1 - Y))
    cost = - np.sum(logprobs) / m

    cost = float(np.squeeze(cost))  # makes sure cost is the dimension we expect;
                                    # e.g., turns [[17]] into 17
    assert(isinstance(cost, float))
    
    return cost
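
A hand-computable example can sanity-check the implementation; the A2_test and Y_test values below are made up for illustration:

# Made-up probabilities and labels, small enough to check by hand
A2_test = np.array([[0.8, 0.1, 0.6]])
Y_test = np.array([[1, 0, 1]])
print(compute_cost(A2_test, Y_test, None))   # about 0.28; the parameters argument is unused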

Backpropagation

Instructions: Backpropagation is usually the hardest (most mathematical) part in deep learning. To help you, here again is the slide from the lecture on backpropagation. You’ll want to use the six equations on the right of this slide, since you are building a vectorized implementation.
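
Since the slide itself is not reproduced in this post, here are the six vectorized equations it refers to, written for a tanh hidden layer and a sigmoid output (they match the implementation below; $*$ denotes the element-wise product and the sums run over the $m$ examples, i.e. over columns):

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * \left(1 - \left(A^{[1]}\right)^{2}\right)$$
$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}$$
$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$$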

def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    W2, A1, A2 = parameters["W2"], cache["A1"], cache["A2"]
    # Output layer (sigmoid) gradients
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # Hidden layer (tanh) gradients
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1, "db1": db1,
             "dW2": dW2, "db2": db2}
    return grads

General gradient descent rule: $ \theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.
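
As a tiny numerical illustration with made-up values: if $\theta = 1.0$, $\alpha = 0.1$ and $\frac{\partial J}{\partial \theta} = 2$, one update step gives $\theta \leftarrow 1.0 - 0.1 \times 2 = 0.8$.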

Illustration: The gradient descent algorithm with a good learning rate (converging) and a bad learning rate (diverging). Images courtesy of Adam Harley.

Update Parameters

def update_parameters(parameters, grads, learning_rate=1.2):   # 1.2 is the rate used in this exercise
    # One gradient descent step on each parameter
    W1 = parameters["W1"] - learning_rate * grads["dW1"]
    b1 = parameters["b1"] - learning_rate * grads["db1"]
    W2 = parameters["W2"] - learning_rate * grads["dW2"]
    b2 = parameters["b2"] - learning_rate * grads["db2"]
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

Build the neural network model in nn_model()

Instructions: The neural network model has to use the previous functions in the right order.

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    n_x = X.shape[0]
    n_y = Y.shape[0]
    parameters = initialize_parameters(n_x, n_h, n_y)

    for i in range(0, num_iterations):

        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)

        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)

        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)

        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads)

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters

Predictions

Reminder: predictions $= y_{\text{prediction}} = \mathbb{1}\{a^{[2]} > 0.5\} = \begin{cases} 1 & \text{if } a^{[2]} > 0.5 \\ 0 & \text{otherwise} \end{cases}$

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    
    return predictions

Output:

predictions mean 0.666666666667
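
For completeness, here is a hedged, self-contained sketch that runs all of the pieces above end to end on a small synthetic dataset; this is not the planar dataset from the exercise, so the printed mean will differ from the output shown above:

# Toy end-to-end run on synthetic data (NOT the planar dataset from the exercise)
np.random.seed(3)
X_toy = np.random.randn(2, 400)                          # 2 features, 400 examples
Y_toy = (X_toy[0:1, :] * X_toy[1:2, :] > 0).astype(int)  # simple nonlinear labels, shape (1, 400)
parameters = nn_model(X_toy, Y_toy, n_h=4, num_iterations=5000)
predictions = predict(parameters, X_toy)
print("predictions mean " + str(np.mean(predictions)))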

