Let's start with normalizing our input features: why do we need to normalize the input features in the first place? The short answer is that when features sit on very different scales, the cost function takes on an elongated, elliptical shape, and gradient descent zigzags slowly across it. Normalizing the features makes the cost contours more symmetric, so gradient descent finds its way to the minimum more easily and quickly.
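As a concrete illustration, here is a minimal sketch of standardization (zero mean, unit variance per feature) in NumPy. The helper name `normalize_features` is hypothetical, chosen just for this example:

```python
import numpy as np

def normalize_features(X):
    """Standardize each column (feature) to zero mean and unit variance."""
    mu = X.mean(axis=0)       # per-feature mean
    sigma = X.std(axis=0)     # per-feature standard deviation
    sigma[sigma == 0] = 1.0   # guard against division by zero for constant features
    return (X - mu) / sigma, mu, sigma

# Toy example: two features on very different scales
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 4000.0]])
X_norm, mu, sigma = normalize_features(X)
print(X_norm.mean(axis=0))  # ~[0, 0]
print(X_norm.std(axis=0))   # ~[1, 1]
```

One practical note: the same `mu` and `sigma` computed on the training set should be reused to normalize test and production inputs, so that all data passes through an identical transformation.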
Tags: exploding gradient, gradient, normalization, vanishing gradient, variance