Orthogonalization, or orthogonality, is a system design property which ensures that modifying one instruction or component of an algorithm does not create or propagate side effects to other components of the system. This makes it easier to verify each component independently of the others, which reduces testing and development time.
If, according to some cost function, the system does well on the test set but this does not reflect the algorithm's performance on real-world data, it means either the dev set's distribution was set up incorrectly, or the cost function is measuring the wrong metric.
When training a neural network, Ng suggests generally not using early stopping, because early stopping is not very orthogonal: it simultaneously affects how well you fit the training set and how well you generalize to the dev set. Tuning the network with knobs that each have a single effect is considerably simpler.
Single number evaluation metric
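The course's running example of a single-number metric is the F1 score, which folds precision and recall into one number so classifiers can be ranked directly. A minimal sketch (the precision/recall values are illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: one number to rank classifiers."""
    return 2 * precision * recall / (precision + recall)

# Classifier A: P=95%, R=90%  ->  F1 ~ 92.4%
# Classifier B: P=98%, R=85%  ->  F1 ~ 91.0%  => pick A
print(f1_score(0.95, 0.90), f1_score(0.98, 0.85))
```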
Satisficing and Optimizing metric
Satisficing metric: e.g., running time or memory consumption. A satisficing metric only needs to meet a set threshold. Optimizing metric: the single number you try to do as well as possible on, e.g., accuracy.
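For example, pick the classifier with the best accuracy (optimizing metric) among those whose running time clears a threshold (satisficing metric). A minimal sketch, with the 100 ms threshold and the classifier numbers as assumed illustrations:

```python
classifiers = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},  # best accuracy, but too slow
]

# Satisficing: runtime must meet the threshold; optimizing: maximize accuracy.
feasible = [c for c in classifiers if c["runtime_ms"] <= 100]
best = max(feasible, key=lambda c: c["accuracy"])
print(best["name"])  # -> "B"
```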
Setting up the training, development, and test sets has a huge impact on productivity. It is important to choose the development and test sets from the same distribution, sampled randomly from all the available data.
Choose dev and test sets that reflect the data you expect to get in the future and consider important to do well on.
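A minimal sketch of the random sampling, assuming data pooled from several regions (region names and counts are illustrative):

```python
import random

# Pool data from all sources, then shuffle, so dev and test are drawn
# from the same distribution as the data the system will actually see.
regions = {"us": 40_000, "uk": 30_000, "india": 20_000, "china": 10_000}
pooled = [(region, i) for region, n in regions.items() for i in range(n)]

random.seed(0)
random.shuffle(pooled)
dev, test = pooled[:10_000], pooled[10_000:20_000]  # both reflect the full mix
```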
Size of the dev and test sets
Modern era – Big data
Now, because a large amount of data is available, we don't have to compromise as much and can use a much greater portion of it to train the model.
- Set the size of the test set to give high confidence in the overall performance of the system.
- The test set evaluates the performance of the final classifier; with large data sets it can be far less than 30% of the whole data set.
- The development set has to be big enough to evaluate different ideas.
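A minimal sketch of the modern sizing, assuming the 98/1/1 proportions the big-data regime allows (versus the classic 60/20/20 split):

```python
def split_sizes(n_total, dev_frac=0.01, test_frac=0.01):
    """With millions of examples, 98/1/1 often beats the classic 60/20/20:
    1% of 1,000,000 is still 10,000 examples, plenty to evaluate ideas."""
    n_dev = int(n_total * dev_frac)
    n_test = int(n_total * test_frac)
    n_train = n_total - n_dev - n_test
    return n_train, n_dev, n_test

print(split_sizes(1_000_000))  # -> (980000, 10000, 10000)
```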
When to change dev/test sets and metrics
When the model does well on the current dev set or test set but poorly in the actual application, change the metric and/or the dev/test set.
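One way to change the metric is to add per-example weights so unacceptable mistakes count much more in the error; a minimal sketch, with the weight value 10 as an assumed illustration:

```python
def weighted_error(y_pred, y_true, weights):
    """Weighted misclassification rate:
    Error = (1 / sum(w)) * sum(w_i * 1{pred_i != y_i})."""
    mistakes = sum(w for p, y, w in zip(y_pred, y_true, weights) if p != y)
    return mistakes / sum(weights)

y_true  = [1, 0, 1, 0]
y_pred  = [1, 1, 0, 0]
weights = [1, 10, 1, 1]  # second example is an unacceptable mistake: weight 10
print(weighted_error(y_pred, y_true, weights))  # (10 + 1) / 13 ≈ 0.846
```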
Improving your model performance
The two fundamental assumptions of supervised learning
There are two fundamental assumptions of supervised learning. The first is that you can achieve low avoidable bias, meaning the model fits the training set well. The second is that you can achieve low or acceptable variance, meaning the training set performance generalizes well to the dev and test sets.
If the gap between human-level error and training error is bigger than the gap between training error and dev error, focus on bias reduction techniques: training a bigger model, training longer, changing the neural network architecture, or trying various hyperparameter searches.
If the gap between training error and dev error is bigger than the gap between human-level error and training error, focus on variance reduction techniques: a bigger data set, regularization, changing the neural network architecture, or trying various hyperparameter searches.
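A minimal sketch of this decision rule, using human-level error as a proxy for Bayes error (the error numbers are illustrative):

```python
def diagnose(human_err, train_err, dev_err):
    """Compare avoidable bias (train - human) against variance (dev - train)."""
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    if avoidable_bias > variance:
        return "focus on bias: bigger model, train longer, new architecture/hyperparameters"
    return "focus on variance: more data, regularization, new architecture/hyperparameters"

print(diagnose(human_err=0.01, train_err=0.08, dev_err=0.10))  # bias problem
print(diagnose(human_err=0.01, train_err=0.02, dev_err=0.10))  # variance problem
```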
Carrying out error analysis
Cleaning up incorrectly labeled data
Build your first system quickly, then iterate
Training and testing on different distributions
Bias and Variance with mismatched data distributions
- Define a new train-dev set carved out of the training set: it comes from the same distribution as the training data, but is not used for training.
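A minimal sketch: carve the train-dev set out of the shuffled training data, then read the gaps off the four error numbers (all numbers illustrative):

```python
import random

def carve_train_dev(train_data, frac=0.02, seed=0):
    """Hold out a slice of the training distribution that is never trained on."""
    data = train_data[:]
    random.Random(seed).shuffle(data)
    n = int(len(data) * frac)
    return data[n:], data[:n]  # (train, train_dev)

# With errors measured on each set:
human, train, train_dev, dev = 0.01, 0.02, 0.03, 0.10
avoidable_bias = train - human      # 1%: fit quality
variance       = train_dev - train  # 1%: generalization within the same distribution
data_mismatch  = dev - train_dev    # 7%: the train and dev distributions differ
```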
Addressing data mismatch
Transfer learning
Transfer learning from Task A to Task B makes sense when:
- Task A and B have the same input x.
- You have a lot more data for Task A than for Task B.
- Low-level features from A could be helpful for learning B.
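A minimal PyTorch sketch of the idea: reuse the low-level feature layers trained on data-rich Task A, freeze them, and train only a fresh output head on small-data Task B (layer sizes and tasks are assumed for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical network whose weights were trained on plentiful Task A data.
model = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),  # low-level features learned on Task A
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),               # Task A output head
)

# Freeze the shared feature layers, then swap in a fresh head for Task B.
for p in model.parameters():
    p.requires_grad = False
model[-1] = nn.Linear(64, 2)         # new trainable head for Task B

# Only the new head's parameters are updated during Task B training.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```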
Multi-task learning
Multi-task learning makes sense when:
- Training on a set of tasks could benefit from having shared lower-level features.
- Usually, the amount of data you have for each task is quite similar.
- You can train a big enough neural network to do well on all the tasks.
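A minimal PyTorch sketch: one shared trunk feeding one binary output per task, trained with a single loss across all tasks (sizes and the four-task setup are assumed for illustration):

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """One shared trunk, one logit per task (illustrative sizes)."""
    def __init__(self, n_features=64, n_tasks=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.head = nn.Linear(32, n_tasks)  # one binary output per task

    def forward(self, x):
        return self.head(self.trunk(x))

net = MultiTaskNet()
x = torch.randn(8, 64)                   # batch of 8 examples
y = torch.randint(0, 2, (8, 4)).float()  # 4 binary labels per example
loss = nn.BCEWithLogitsLoss()(net(x), y) # loss averaged across all tasks
loss.backward()
```

The shared trunk is what lets the tasks benefit from common lower-level features; each head only has to learn its own task-specific mapping.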
End-to-End Deep Learning