## Linear Regression with Multiple Variables

**Hypothesis function**:

The multivariable form of the hypothesis function is:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$$

where $x_0 = 1$ by convention.

The training examples are stored in $X$ row-wise, like so:

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

so the hypothesis can be computed for all examples at once as a column vector of size $(m \times 1)$:

$$h_\theta(X) = X\theta$$
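A minimal Octave/Matlab sketch of the vectorized hypothesis (the data values here are illustrative, not from the notes): all predictions come from a single matrix product.

```
X = [1 2; 1 3; 1 4];   % each row is one training example; first column is x0 = 1
theta = [0.5; 1.5];    % ((n+1) x 1) parameter vector

h = X * theta;         % (m x 1) column vector of predictions
```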

### Cost function

The cost function for multiple variables is:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

**The vectorized version is:**

$$J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^T (X\theta - \vec{y})$$

where $\vec{y}$ denotes the vector of all $y$ values.
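A minimal Octave/Matlab sketch of the vectorized cost, reusing the illustrative `X` and `theta` from above with a hypothetical target vector `y`:

```
X = [1 2; 1 3; 1 4];
y = [4; 5; 6];
theta = [0.5; 1.5];
m = length(y);

err = X * theta - y;               % (m x 1) residual vector
J = (1 / (2 * m)) * (err' * err);  % scalar cost
```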

### Gradient Descent for Multiple Variables

The gradient descent equation has the same form as in the single-variable case; we simply repeat it for all $n$ features:

repeat until convergence: {

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)} \qquad \text{for } j := 0 \dots n$$

}

In other words:

repeat until convergence: {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_0^{(i)}$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_1^{(i)}$$

$$\vdots$$

}

#### Matrix Notation

Finally, the matrix notation (vectorized) of the Gradient Descent rule is:

$$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - \vec{y})$$
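A minimal Octave/Matlab sketch of the vectorized update rule; the learning rate `alpha` and the fixed iteration count `num_iters` are illustrative choices standing in for a proper convergence test:

```
X = [1 2; 1 3; 1 4];   % illustrative design matrix with bias column
y = [4; 5; 6];         % illustrative targets
theta = zeros(2, 1);
alpha = 0.1;           % learning rate (assumed for illustration)
num_iters = 1000;      % fixed iteration budget instead of a convergence check
m = length(y);

for iter = 1:num_iters
  theta = theta - (alpha / m) * X' * (X * theta - y);  % simultaneous update of all theta_j
end
```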

## Feature Normalization

We can speed up gradient descent by keeping each of our input values in roughly the same range, because $\theta$ descends quickly on small ranges and slowly on large ones, oscillating inefficiently down to the optimum when the variables are very uneven.

Two techniques to help with this are **feature scaling** and **mean normalization**. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value of an input variable from each of its values, resulting in a new average of zero. To implement both techniques, adjust your input values as shown in this formula:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

*where $\mu_i$ is the average of all the values for feature $(i)$ and $s_i$ is the range of values (max − min) or, alternatively, the standard deviation.*

Matlab code:

```
[m, n] = size(X);     % m examples, n features
mu = zeros(1, n);     % per-feature means
sigma = zeros(1, n);  % per-feature standard deviations
X_norm = zeros(m, n); % normalized copy of X

for i = 1:n
  feature = X(:, i);
  mu(i) = mean(feature);
  sigma(i) = std(feature);   % here s_i is the standard deviation
  X_norm(:, i) = (feature - mu(i)) / sigma(i);
end
```
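Note that the same `mu` and `sigma` computed from the training set must also be applied when normalizing any new input before making a prediction; otherwise the learned parameters will not match the scale of the data.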

## Normal Equation

The “Normal Equation” is a method of finding the optimal $\theta$ **without iteration**:

$$\theta = (X^T X)^{-1} X^T y$$
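A minimal Octave/Matlab sketch of this closed-form solution, on the same illustrative data used above; `pinv` is used instead of `inv` so the computation still behaves sensibly if $X^T X$ is non-invertible:

```
X = [1 2; 1 3; 1 4];   % illustrative design matrix with bias column
y = [4; 5; 6];         % illustrative targets

theta = pinv(X' * X) * X' * y;   % optimal parameters in one step, no iteration
```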