## k-Nearest Neighbor Classifier

L1 distance:

L2 distance:

distances = np.sqrt(np.sum(np.square(self.Xtr - X[i,:]), axis = 1))



L1 vs. L2. It is interesting to consider differences between the two metrics. In particular, the L2 distance is much more unforgiving than the L1 distance when it comes to differences between two vectors. That is, the L2 distance prefers many medium disagreements to one big one. L1 and L2 distances (or equivalently the L1/L2 norms of the differences between a pair of images) are the most commonly used special cases of a p-norm. L2更不能容忍较大差异。

### A kNN classifier with L2 distance

num_test = X.shape[0]
num_train = self.X_train.shape[0]
dists = np.zeros((num_test, num_train)

dists = np.sqrt(-2*np.dot(X, self.X_train.T) + np.sum(np.square(self.X_train), axis = 1) + np.transpose([np.sum(np.square(X), axis = 1)]))


k_nearest_neighbor.py

### Cross-validation

In cases where the size of your training data (and therefore also the validation data) might be small, people sometimes use a more sophisticated technique for hyperparameter tuning called cross-validation. Working with our previous example, the idea is that instead of arbitrarily picking the first 1000 datapoints to be the validation set and rest training set, you can get a better and less noisy estimate of how well a certain value of k works by iterating over different validation sets and averaging the performance across these. For example, in 5-fold cross-validation, we would split the training data into 5 equal folds, use 4 of them for training, and 1 for validation. We would then iterate over which fold is the validation fold, evaluate the performance, and finally average the performance across the different folds.

## Linear SVM Classifier

### Linear_SVM

 loss = 0.0
dW = np.zeros(W.shape) # initialize the gradient as zero
#a vectorized version of the structured SVM loss
num_train = X.shape[0]
num_classes = W.shape[1]
scores = X.dot(W)
correct_class_scores = scores[range(num_train), list(y)].reshape(-1,1) #(N, 1)
margins = np.maximum(0, scores - correct_class_scores +1)
margins[range(num_train), list(y)] = 0
loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)

# Implement a vectorized version of the gradient for the structured SVM     #
# loss, storing the result in dW.
coeff_mat = np.zeros((num_train, num_classes))
coeff_mat[margins > 0] = 1
coeff_mat[range(num_train), list(y)] = 0
coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)

dW = (X.T).dot(coeff_mat)
dW = dW/num_train + reg*W


linear_svm.py

## Softmax classifier

### Softmax_classifier

  loss = 0.0
dW = np.zeros_like(W)
# Store the loss in loss and the gradient in dW. If you are not careful     #
# here, it is easy to run into numeric instability. Don't forget the        #
# regularization!                                                           #

num_classes = W.shape[1]
num_train = X.shape[0]
scores = X.dot(W)
shift_scores = scores - np.max(scores, axis = 1).reshape(-1,1)
softmax_output = np.exp(shift_scores)/np.sum(np.exp(shift_scores), axis =  1).reshape(-1,1)
loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
loss /= num_train
loss +=  0.5* reg * np.sum(W * W)

dS = softmax_output.copy()
dS[range(num_train), list(y)] += -1
dW = (X.T).dot(dS)
dW = dW/num_train + reg* W



softmax.py