Linear Regression with Multiple Variables
The training examples are stored in $X$ row-wise, like such:

$$X = \begin{bmatrix} x^{(1)}_0 & x^{(1)}_1 \\ x^{(2)}_0 & x^{(2)}_1 \\ x^{(3)}_0 & x^{(3)}_1 \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$$

The hypothesis with multiple features is $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$, with the convention $x_0 = 1$. The vectorized version is:

$$h_\theta(X) = X\theta$$
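As a concrete sketch of the vectorized hypothesis $h_\theta(X) = X\theta$, here is a NumPy version (Python mirrors the Octave used later in these notes; the data values are illustrative):

```python
import numpy as np

# Three training examples, one feature each; the first column is the
# bias column x0 = 1, so X has shape (m, n+1).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
theta = np.array([0.5, 1.5])  # [theta_0, theta_1]

# Vectorized hypothesis: one prediction per training example (row).
h = X @ theta
```

Each entry of `h` is $\theta_0 + \theta_1 x_1$ for the corresponding row, computed in a single matrix-vector product instead of a per-example loop.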
Gradient Descent for Multiple Variables
The gradient descent rule keeps the same form; we simply repeat it for all $n$ features:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}_j \qquad \text{for } j := 0, \dots, n$$

In other words:

$$\begin{aligned} \theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}_0 \\ \theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}_1 \\ \theta_2 &:= \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}_2 \\ &\;\vdots \end{aligned}$$
Finally, the matrix notation (vectorized) of the Gradient Descent rule is:

$$\theta := \theta - \frac{\alpha}{m} X^{T} \left(X\theta - \vec{y}\right)$$
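The vectorized update $\theta := \theta - \frac{\alpha}{m} X^{T}(X\theta - \vec{y})$ can be sketched in NumPy as follows (the function name, learning rate, and toy data are illustrative assumptions, not part of the original notes):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent using the vectorized update rule."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # theta := theta - (alpha/m) * X^T (X theta - y)
        theta -= (alpha / m) * X.T @ (X @ theta - y)
    return theta

# y = 1 + 2*x1 exactly, so the fit should recover theta = [1, 2].
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = gradient_descent(X, y)
```

Note that the entire sum over the $m$ training examples is carried by the single product `X.T @ (X @ theta - y)`, which is exactly why the matrix form needs no inner loop over examples or features.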
Gradient descent converges more quickly when the input features are on roughly the same scale. Two techniques to help with this are feature scaling and mean normalization. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value for an input variable from the values for that input variable, resulting in a new average value for the input variable of just zero. To implement both of these techniques, adjust your input values as shown in this formula:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the average of all the values for feature $(i)$ and $s_i$ is the range of values (max − min), or alternatively the standard deviation.
n = size(X, 2);              % number of features
for i = 1:n
  feature = X(:, i);
  mu(i) = mean(feature);     % per-feature mean
  sigma(i) = std(feature);   % per-feature standard deviation
  X_norm(:, i) = (feature - mu(i)) / sigma(i);
end
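A NumPy equivalent of the normalization step might look like this (a sketch; `feature_normalize` and the sample matrix are illustrative, and the standard deviation stands in for the range, as in the Octave snippet above):

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize and scale every column of X."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)  # ddof=1 matches Octave's std (N-1)
    return (X - mu) / sigma, mu, sigma

# Two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
X_norm, mu, sigma = feature_normalize(X)
```

After normalization each column has mean 0, and the two features occupy the same numeric range, so a single learning rate works well for both.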
The “Normal Equation” is a method of finding the optimal $\theta$ analytically, without iteration:

$$\theta = \left(X^T X\right)^{-1} X^T y$$
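A minimal NumPy sketch of the normal equation $\theta = (X^T X)^{-1} X^T y$, reusing the same toy data as above (solving the linear system is numerically preferable to forming the explicit inverse, but both compute the same $\theta$):

```python
import numpy as np

# Same toy data: y = 1 + 2*x1 exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Solve (X^T X) theta = X^T y in one step -- no learning rate,
# no iteration, and no feature scaling needed.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

Because the answer comes from one linear solve, there is no $\alpha$ to tune and no convergence to monitor, at the cost of the $O(n^3)$ solve, which becomes expensive when the number of features is very large.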