
Setting up your Machine Learning Application (Week 1 Summary)


This blog post summarizes Week 1 of Prof. Andrew Ng's deeplearning.ai specialization course "Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization".

A high-performance neural network depends largely on how we set up our:
  • training set
  • development or dev set (validation set)
  • test set
When training the neural network we have to make many decisions, including:
  • the number of layers in the network
  • the number of hidden units in each layer
  • the learning rate
  • the activation functions we use for different layers
We can never know the optimal values for all of these at the start; we find them through an iterative process. First we pick initial values based on intuition, then, after running multiple experiments, we refine our choices to find a better network.
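This iterative refinement can be sketched as a simple random-search loop. The `dev_error` function below is a toy stand-in for "train the network and measure its dev-set error" (a real run would train an actual model); its shape, and the hyperparameter ranges, are illustrative assumptions, not values from the course.

```python
import random

# Toy stand-in for training a model and measuring dev-set error.
# Its minimum happens to lie near lr = 0.01, so the search has
# something to find. A real workflow would train a network here.
def dev_error(lr, n_hidden):
    return (lr - 0.01) ** 2 + 1.0 / n_hidden

random.seed(0)
best = None
for _ in range(20):
    lr = 10 ** random.uniform(-4, 0)          # sample lr on a log scale
    n_hidden = random.choice([16, 32, 64, 128])
    err = dev_error(lr, n_hidden)
    if best is None or err < best[0]:
        best = (err, lr, n_hidden)

best_err, best_lr, best_units = best
```

Sampling the learning rate on a log scale (rather than uniformly) is the usual choice, since plausible values span several orders of magnitude.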

Today deep learning has found success in many areas: NLP, computer vision, speech recognition, many applications on structured data (advertisements, web search, shopping websites), computer security, logistics, and more. We often see techniques cross over between domains, but intuitions from one domain or application frequently do not transfer to other application areas. So the best choices depend not just on experience but on:
  • the amount of data we have
  • the number of input features we have
  • whether we are training on GPUs or CPUs (and, if so, their exact configuration)
So one of the things that determines how quickly we can make progress is how efficiently we can go around this iterative cycle, and how well we set up our data sets (train, dev, test).

Setting aside what used to be common practice for neural networks, in the modern big-data era, where we might have millions of examples, the trend is to make the dev and test sets a very small percentage of the total. The dev set only needs to be big enough to evaluate your training runs, so we might not need a full 20% of the data for that.

For example, with 1 million total examples:

                        Training set    Dev set    Test set
  (% of total data)     98%             1%         1%
  or                    99.5%           0.4%       0.1%



You must have heard of the bias-variance trade-off. In deep learning we still talk about bias and variance, but much less about a trade-off between them.

The first picture shows high bias: the classifier underfits the data.
The last picture shows what happens when we fit an incredibly complex classifier, perhaps a deep NN, or one with very many hidden units: we might fit the training data perfectly, but this is clearly overfitting.

This example is 2D, with just two features x1 and x2, so we can plot the data and visualize bias and variance. In a high-dimensional problem, however, we cannot plot the data and visualize the decision boundary.

So for high-dimensional data with many features, the two key numbers to look at to understand bias and variance are:
  • train set error
  • dev set error
Taking cat classification as an example (cat: y=1, non-cat: y=0):

High variance: if we do well on the training set but poorly on the dev set, the model has fit the training data too closely and failed to generalize, so it cannot reproduce the same performance on the dev set. This is "high variance".

High bias: if we run the model and find high error even on the training data, let alone the dev set, the model does not even fit the training data well. This is "high bias".
It is also possible to suffer from both at once; the picture above shows high variance and high bias!
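The diagnosis above can be expressed as a small helper function. The error threshold (`gap`) and the assumed optimal ("Bayes") error are illustrative choices of mine, not values from the course.

```python
def diagnose(train_err, dev_err, bayes_err=0.0, gap=0.05):
    """Rough bias/variance diagnosis from train and dev set errors.

    A large gap between Bayes error and train error suggests high bias;
    a large gap between train and dev error suggests high variance.
    The 5% threshold is an illustrative assumption.
    """
    problems = []
    if train_err - bayes_err > gap:
        problems.append("high bias")
    if dev_err - train_err > gap:
        problems.append("high variance")
    return problems or ["looks fine"]
```

For instance, 1% train error with 11% dev error flags high variance, 15%/16% flags high bias, and 15%/30% flags both.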

BASIC RECIPE FOR MACHINE LEARNING


After training an initial model, we look at the training and dev set errors to evaluate whether the model has high bias, high variance, or both.

For high bias we should:
  • try a bigger network (more hidden layers or more hidden units), or train it longer
  • search over NN architectures

For high variance:
  • get more data
  • regularization
  • NN architecture search
So this is the basic structure for organizing a machine learning problem: diagnose bias and variance, then select the right operation to make progress on your problem.
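The recipe above can be sketched as a simple decision function. The target error and the returned action strings are illustrative placeholders, not course code.

```python
def basic_recipe(train_err, dev_err, target=0.05):
    """Suggest next steps from the basic ML recipe (illustrative sketch).

    High bias (poor train error) is addressed first; only once the
    training fit is acceptable do we look at the train/dev gap.
    """
    if train_err > target:                 # high bias: fix training fit first
        return ["bigger network", "train longer", "NN architecture search"]
    if dev_err - train_err > target:       # high variance: fix generalization
        return ["get more data", "regularization", "NN architecture search"]
    return ["done"]
```

Checking bias before variance mirrors the course's ordering: there is no point reducing the train/dev gap while the model cannot even fit the training set.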








