## k-Nearest Neighbor Classifier

**L1 distance:**

$$ d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right| $$

**L2 distance:**

$$ d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2} $$

```
# L2 distance from the i-th test image to every training image,
# vectorized over the training rows
distances = np.sqrt(np.sum(np.square(self.Xtr - X[i, :]), axis=1))
```

**L1 vs. L2.** It is interesting to consider the differences between the two metrics. In particular, the L2 distance is much more unforgiving than the L1 distance when it comes to differences between two vectors. That is, the L2 distance prefers many medium disagreements to one big one. L1 and L2 distances (or equivalently the L1/L2 norms of the differences between a pair of images) are the most commonly used special cases of a p-norm. In other words, L2 is less tolerant of large differences.
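A small numeric check makes this concrete. The two difference vectors below (illustrative values, not from the original notes) have the same total disagreement, so L1 treats them identically, while L2 penalizes the single large disagreement more:

```python
import numpy as np

# Two difference vectors with the same total disagreement (sum of |d| = 4):
many_small = np.array([1, 1, 1, 1])   # four small disagreements
one_big    = np.array([4, 0, 0, 0])   # one large disagreement

l1_many = np.sum(np.abs(many_small))
l1_big  = np.sum(np.abs(one_big))
l2_many = np.sqrt(np.sum(many_small ** 2))
l2_big  = np.sqrt(np.sum(one_big ** 2))

print(l1_many, l1_big)  # 4 4   -- L1 cannot tell them apart
print(l2_many, l2_big)  # 2.0 4.0 -- L2 penalizes the one big disagreement
```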

### A kNN classifier with L2 distance

```
num_test = X.shape[0]
num_train = self.X_train.shape[0]
dists = np.zeros((num_test, num_train))
# Fully vectorized via the expansion ||a - b||^2 = ||a||^2 - 2*a.b + ||b||^2
dists = np.sqrt(-2 * np.dot(X, self.X_train.T)
                + np.sum(np.square(self.X_train), axis=1)
                + np.transpose([np.sum(np.square(X), axis=1)]))
```
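The one-liner relies on expanding the squared distance, $\|a - b\|^2 = \|a\|^2 - 2\,a \cdot b + \|b\|^2$, with broadcasting filling in the two squared-norm terms. A quick sketch on toy data (shapes and arrays here are made up for illustration) confirms it matches the naive double loop:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 3))   # toy training set
X = rng.standard_normal((4, 3))         # toy test set

# Fully vectorized: ||a - b||^2 = ||a||^2 - 2*a.b + ||b||^2
dists = np.sqrt(-2 * np.dot(X, X_train.T)
                + np.sum(np.square(X_train), axis=1)
                + np.transpose([np.sum(np.square(X), axis=1)]))

# Naive double loop for comparison
naive = np.zeros((4, 5))
for i in range(4):
    for j in range(5):
        naive[i, j] = np.sqrt(np.sum(np.square(X[i] - X_train[j])))

print(np.allclose(dists, naive))  # the two computations agree
```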

### Cross-validation

**Cross-validation is used to tune hyperparameters, such as the hyperparameter k in the k-nearest-neighbor classifier.**

In cases where the size of your training data (and therefore also the validation data) might be small, people sometimes use a more sophisticated technique for hyperparameter tuning called cross-validation. Working with our previous example, the idea is that instead of arbitrarily picking the first 1000 datapoints to be the validation set and rest training set, you can get a better and less noisy estimate of how well a certain value of k works by iterating over different validation sets and averaging the performance across these. For example, in 5-fold cross-validation, we would split the training data into 5 equal folds, use 4 of them for training, and 1 for validation. We would then iterate over which fold is the validation fold, evaluate the performance, and finally average the performance across the different folds.
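The 5-fold procedure can be sketched as follows. This is a minimal runnable illustration on synthetic data; the toy arrays and the `knn_accuracy` helper are assumptions for the sketch, not part of the original code:

```python
import numpy as np

rng = np.random.default_rng(0)
X_all = rng.standard_normal((100, 3))     # toy features
y_all = (X_all[:, 0] > 0).astype(int)     # toy labels

def knn_accuracy(k, X_tr, y_tr, X_val, y_val):
    # Squared L2 distances via the expanded-square trick, then a
    # majority vote over the k nearest training labels.
    d2 = (-2 * X_val.dot(X_tr.T)
          + np.sum(X_tr ** 2, axis=1)
          + np.sum(X_val ** 2, axis=1)[:, np.newaxis])
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = np.array([np.bincount(y_tr[row]).argmax() for row in nearest])
    return np.mean(preds == y_val)

num_folds = 5
X_folds = np.array_split(X_all, num_folds)
y_folds = np.array_split(y_all, num_folds)

for k in [1, 3, 5]:
    # Each fold takes a turn as the validation set; average the accuracies.
    accs = [knn_accuracy(k,
                         np.concatenate(X_folds[:f] + X_folds[f + 1:]),
                         np.concatenate(y_folds[:f] + y_folds[f + 1:]),
                         X_folds[f], y_folds[f])
            for f in range(num_folds)]
    print(k, np.mean(accs))
```

The k with the highest average validation accuracy would then be used for the final classifier.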

## Linear SVM Classifier

### Multiclass Support Vector Machine loss
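For reference, the multiclass SVM loss for the i-th example, with score vector $s = f(x_i, W)$ and margin $\Delta$ (the code below uses $\Delta = 1$), is

$$ L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right) $$

and the full loss averages over all $N$ training examples and adds L2 regularization, matching the `0.5 * reg` term in the code:

$$ L = \frac{1}{N} \sum_i L_i + \frac{\lambda}{2} \sum_{k,l} W_{k,l}^2 $$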

### Linear_SVM

```
loss = 0.0
dW = np.zeros(W.shape)  # initialize the gradient as zero
# A vectorized version of the structured SVM loss
num_train = X.shape[0]
num_classes = W.shape[1]
scores = X.dot(W)  # (N, C)
correct_class_scores = scores[range(num_train), list(y)].reshape(-1, 1)  # (N, 1)
margins = np.maximum(0, scores - correct_class_scores + 1)  # delta = 1
margins[range(num_train), list(y)] = 0  # correct class contributes no loss
loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
# A vectorized version of the gradient for the structured SVM loss,
# storing the result in dW.
coeff_mat = np.zeros((num_train, num_classes))
coeff_mat[margins > 0] = 1  # each positive margin adds x_i to its class column
# The correct-class column subtracts x_i once per positive margin
coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)
dW = (X.T).dot(coeff_mat)
dW = dW / num_train + reg * W
```
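A standard sanity check for an analytic gradient like this is to compare it against a centered finite difference. The sketch below repackages the snippet above into a standalone `svm_loss` function on toy data (the shapes and values are assumptions for the check):

```python
import numpy as np

def svm_loss(W, X, y, reg):
    # Mirrors the vectorized SVM loss and gradient shown above.
    num_train = X.shape[0]
    scores = X.dot(W)
    correct = scores[range(num_train), y].reshape(-1, 1)
    margins = np.maximum(0, scores - correct + 1)
    margins[range(num_train), y] = 0
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
    coeff = (margins > 0).astype(float)
    coeff[range(num_train), y] = -np.sum(coeff, axis=1)
    dW = X.T.dot(coeff) / num_train + reg * W
    return loss, dW

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4)) * 0.01
X = rng.standard_normal((10, 3))
y = rng.integers(0, 4, size=10)
loss, dW = svm_loss(W, X, y, reg=0.1)

# Centered finite difference at a single entry of W
h = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += h
Wm[0, 0] -= h
numeric = (svm_loss(Wp, X, y, 0.1)[0] - svm_loss(Wm, X, y, 0.1)[0]) / (2 * h)
print(abs(numeric - dW[0, 0]))  # should be very small
```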

## Softmax classifier

### Softmax Loss function
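For reference, the softmax (cross-entropy) loss for the i-th example, with scores $f = f(x_i, W)$, is

$$ L_i = -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right) $$

Because $e^{f_j}$ can overflow for large scores, implementations first shift the scores by their maximum; the identity

$$ \frac{e^{f_{y_i} - C}}{\sum_j e^{f_j - C}} = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} $$

for any constant $C$ shows this leaves the probabilities unchanged, which is what the shift in the code below exploits.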

### Softmax_classifier

```
loss = 0.0
dW = np.zeros_like(W)
# Store the loss in loss and the gradient in dW. If you are not careful
# here, it is easy to run into numeric instability. Don't forget the
# regularization!
num_classes = W.shape[1]
num_train = X.shape[0]
scores = X.dot(W)  # (N, C)
# Shift scores so the maximum in each row is 0, to avoid overflow in exp
shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
loss /= num_train
loss += 0.5 * reg * np.sum(W * W)
# Gradient w.r.t. the scores: the softmax probabilities, minus 1 at the
# correct class
dS = softmax_output.copy()
dS[range(num_train), list(y)] -= 1
dW = (X.T).dot(dS)
dW = dW / num_train + reg * W
```
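The max-shift is not optional in practice. A quick sketch with some large made-up scores shows the naive computation overflowing while the shifted version stays well-defined:

```python
import numpy as np

scores = np.array([123.0, 456.0, 789.0])  # illustrative large scores

# Naive softmax: np.exp overflows to inf, and inf/inf gives nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(scores) / np.sum(np.exp(scores))

# Shifted softmax: same mathematical result, computed safely
shift = scores - np.max(scores)
stable = np.exp(shift) / np.sum(np.exp(shift))

print(naive)   # contains nan due to overflow
print(stable)  # valid probabilities summing to 1
```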

## Reference

1. CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University
2. CS231n notes: Image Classification
3. CS231n notes: Linear Classification