In general, best results are achieved during training if we:
1. gradually decrease the learning rate
2. keep the learning rate constant
3. gradually increase the learning rate

A.:

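When using SGD, it is necessary to gradually decrease the learning rate over time, because the random sampling of a mini-batch introduces a source of noise into the gradient estimate. Whereas the true gradient becomes small and then 0 as we approach and reach a minimum, the estimated (stochastic) gradient does not: its sampling noise persists even at the minimum, so with a constant learning rate the parameters keep fluctuating around the minimum instead of converging to it.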
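
The effect can be illustrated with a small sketch. Everything in it is an assumption chosen for illustration, not part of the original answer: the toy objective f(w) = 0.5 * w^2 (true gradient w, minimum at w = 0), Gaussian noise standing in for mini-batch sampling noise, the linear decay schedule, and the hypothetical helper names `linear_decay_lr` and `noisy_sgd`.

```python
import random


def linear_decay_lr(k, lr0=0.1, lr_final=0.001, tau=500):
    """Learning rate at step k (illustrative schedule, an assumption).

    Decays linearly from lr0 to lr_final over the first tau steps,
    then stays at lr_final.
    """
    alpha = min(k / tau, 1.0)
    return (1.0 - alpha) * lr0 + alpha * lr_final


def noisy_sgd(schedule, steps=2000, w0=5.0, noise_std=1.0, seed=0):
    """Run SGD on f(w) = 0.5 * w**2 with a noisy gradient estimate.

    Returns the mean squared distance to the minimum (w = 0)
    over the last 200 steps.
    """
    rng = random.Random(seed)
    w = w0
    tail = []
    for k in range(steps):
        # True gradient is w; the added noise mimics mini-batch sampling
        # and does not shrink as w approaches the minimum.
        grad_estimate = w + rng.gauss(0.0, noise_std)
        w -= schedule(k) * grad_estimate
        if k >= steps - 200:
            tail.append(w * w)
    return sum(tail) / len(tail)


if __name__ == "__main__":
    constant = noisy_sgd(lambda k: 0.1)   # constant learning rate
    decayed = noisy_sgd(linear_decay_lr)  # gradually decreased learning rate
    print(f"constant lr: mean squared error near the end = {constant:.5f}")
    print(f"decayed  lr: mean squared error near the end = {decayed:.5f}")
```

In this toy setting, the run with the decayed learning rate should settle much closer to the minimum, because each noisy step is scaled down as training proceeds. The run with the constant learning rate keeps being pushed around by noise of the same size.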