What is the main effect to be expected on the weights of a deep network when we use:
$L_2$ regularization?
$L_1$ regularization?
A.:
With $L_2$ regularization, we expect small weights in absolute value.
With $L_1$ regularization, we expect sparse weights, that is, many of them equal to zero.
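
The contrast can be illustrated with a minimal sketch on a toy problem rather than a real deep network. Everything here is an assumption made for illustration: the quadratic loss 0.5 * ||w - target||^2, the values in `target`, the hyperparameters, and the helper names `l2_step` and `l1_prox_step`. The $L_1$ case uses a proximal (soft-thresholding) update, which is one common way to obtain exact zeros; plain subgradient descent on an $L_1$ penalty usually yields weights that are only close to zero.

```python
import numpy as np

# Hypothetical toy setup: minimize 0.5 * ||w - target||^2 plus a penalty.
target = np.array([3.0, -2.0, 0.5, -0.3, 0.1])
lr, lam, steps = 0.1, 1.0, 500  # illustrative values, not tuned


def l2_step(w, grad, lr, lam):
    """Gradient step on loss + (lam / 2) * ||w||_2^2."""
    return w - lr * (grad + lam * w)


def l1_prox_step(w, grad, lr, lam):
    """Proximal gradient step on loss + lam * ||w||_1 (soft-thresholding)."""
    z = w - lr * grad
    return np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)


w_l2 = np.zeros_like(target)
w_l1 = np.zeros_like(target)
for _ in range(steps):
    # The gradient of 0.5 * ||w - target||^2 with respect to w is (w - target).
    w_l2 = l2_step(w_l2, w_l2 - target, lr, lam)
    w_l1 = l1_prox_step(w_l1, w_l1 - target, lr, lam)

print("L2 weights:", w_l2, "nonzero:", np.count_nonzero(w_l2))
print("L1 weights:", w_l1, "nonzero:", np.count_nonzero(w_l1))
```

Under these toy assumptions, the $L_2$ run shrinks every weight toward zero (to about half of `target`) but leaves all of them nonzero, while the $L_1$ run sets the three smallest weights exactly to zero and shrinks the two large ones by a constant amount.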