## Introduction

### What is Machine Learning?

Two definitions of Machine Learning are offered. Arthur Samuel described it as: *“the field of study that gives computers the ability to learn without being explicitly programmed.”* This is an older, informal definition.
Tom Mitchell provides a more modern definition: *“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”*.

In general, any machine learning problem can be assigned to one of two broad classifications: **supervised learning** and **unsupervised learning**.

### Supervised Learning

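- We are given a data set and already know what the correct output should look like, on the assumption that there is a relationship between the input and the output.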
- Supervised learning problems are categorized into **“regression”** and **“classification”** problems.

*In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.*
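As a small illustration of the difference, the sketch below builds both kinds of target from the same made-up data. The house sizes, prices, and the asking-price threshold of 300 are hypothetical values chosen only for this example.

```matlab
% Hypothetical data: house sizes (square feet) and sale prices (in $1000s)
sizes  = [1000; 1500; 2000; 2500];
prices = [200; 290; 410; 500];

% Regression target: the price itself, a continuous value
y_regression = prices;

% Classification target: a discrete label,
% 1 if the house sold above a hypothetical asking price of 300, else 0
y_classification = double(prices > 300);
```

The inputs are identical in both cases; only the type of output we ask the model to predict changes.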

### Unsupervised Learning

- Unsupervised learning allows us to approach problems with little or no idea what our results should look like.
- We can derive structure from the data by **clustering** it based on relationships among the variables in the data.
- With unsupervised learning there is no feedback based on the prediction results.

## Linear Regression with One Variable

### Cost Function

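We can measure the accuracy of our hypothesis function by using a **cost function**. It takes the squared differences between the hypothesis's predictions and the actual outputs `y` over all `m` training examples, averages them, and halves the result.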

Matlab code:

```
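m = length(y);                    % number of training examples
h = X*theta;                      % predictions for all m examples (m x 1)
squareErrors = (h-y) .^2;         % element-wise squared errors
J = (1/(2*m))*sum(squareErrors);  % halved mean of the squared errors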
```
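Written as a formula, the code above computes the following. The notation is an assumption, since the formula itself is not shown here: $h_\theta(x) = \theta_0 + \theta_1 x$ is taken to be the single-variable linear hypothesis, and $X$ is taken to carry a leading column of ones so that `X*theta` evaluates $h_\theta$ on every example.

$$
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$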

This function is otherwise called the **“Squared error function”**, or **“Mean squared error”**.

**Goal**: minimize $J(\theta_0, \theta_1)$ over the parameters $\theta_0, \theta_1$.
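To make the goal concrete, here is a small worked example in the same Matlab style as the code above. The toy data and the two candidate parameter vectors are made up for illustration, and `X` is assumed to carry a leading column of ones for the intercept term.

```matlab
% Hypothetical toy data lying exactly on the line y = x
X = [1 1; 1 2; 1 3];   % leading column of ones (assumed convention)
y = [1; 2; 3];
m = length(y);

% Candidate 1: theta = [0; 0] predicts 0 for every example
theta_a = [0; 0];
J_a = (1/(2*m)) * sum((X*theta_a - y) .^ 2);   % (1 + 4 + 9) / 6

% Candidate 2: theta = [0; 1] reproduces y exactly
theta_b = [0; 1];
J_b = (1/(2*m)) * sum((X*theta_b - y) .^ 2);   % no error, so the cost is 0
```

The second candidate has the lower cost, so it is the better fit under this measure.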

### Gradient Descent

The gradient descent algorithm, with learning rate $\alpha$, is:

**repeat until convergence:**

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

**where $j = 0, 1$ represents the feature index number.**

*At each iteration, one should simultaneously update the parameters $\theta_0$ and $\theta_1$. Updating one parameter before computing the update for the other within the same iteration yields a wrong implementation.*
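The simultaneous update can be sketched as follows for one iteration. This is a minimal illustration, assuming the single-variable hypothesis `h = theta0 + theta1 * x` and the squared-error cost; the toy data, the starting values, and the names `temp0` and `temp1` are hypothetical.

```matlab
% Hypothetical toy data and starting point
x = [1; 2; 3];
y = [2; 4; 6];
m = length(y);
theta0 = 0;
theta1 = 0;
alpha = 0.1;   % assumed learning rate

% Predictions are computed once, from the OLD parameter values
h = theta0 + theta1 * x;

% Compute both new values before overwriting either parameter
temp0 = theta0 - alpha * (1/m) * sum(h - y);
temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);

% Only now assign, so neither update sees the other's new value
theta0 = temp0;
theta1 = temp1;
```

An incorrect version would assign `theta0` first and then recompute `h` before updating `theta1`, which mixes old and new parameter values within a single step.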

#### Gradient Descent For Linear Regression:
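For reference, substituting the squared-error cost into the general update rule gives the updates below. The notation is an assumption, since the formulas are not shown here: $h_\theta(x) = \theta_0 + \theta_1 x$ is the single-variable hypothesis and $m$ is the number of training examples.

$$
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
\end{aligned}
$$

With a leading column of ones in $X$, both updates collapse into the single vectorized step $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$, which is the expression used in the code in this section.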

Matlab code:

```
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y);                    % number of training examples
J_history = zeros(num_iters, 1);  % cost recorded after each step

for iter = 1:num_iters
    % Perform a single gradient step on the parameter vector theta.
    % All components of theta are updated simultaneously.
    h = X*theta;                                   % predictions (m x 1)
    theta = theta - alpha * (1/m) * (X' * (h-y));  % vectorized update

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end
```

So, this is simply gradient descent on the original cost function J. This method looks at every example in the entire training set on every step, and is called **batch gradient descent**.
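As a quick check of this behaviour, the sketch below runs batch gradient descent inline on a tiny made-up data set. The data, the learning rate, and the iteration count are assumptions picked for illustration, and the cost is computed inline rather than through `computeCost` so that the snippet stands on its own.

```matlab
% Hypothetical toy data lying exactly on the line y = 2x
X = [ones(3, 1), [1; 2; 3]];   % leading column of ones for the intercept
y = [2; 4; 6];
m = length(y);

theta = zeros(2, 1);   % start from theta = [0; 0]
alpha = 0.1;           % assumed learning rate
num_iters = 2000;      % assumed number of steps
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % Each step uses every training example (batch gradient descent)
    h = X * theta;
    theta = theta - alpha * (1/m) * (X' * (h - y));

    % Record the squared-error cost after the step
    J_history(iter) = (1/(2*m)) * sum((X*theta - y) .^ 2);
end
```

On this data `theta` approaches `[0; 2]` and the recorded cost falls toward zero. Plotting `J_history` is a common way to confirm that the chosen learning rate is not too large.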