1. In general, best results are achieved during training if we:

    1. gradually decrease the learning rate
    2. keep the learning rate constant
    3. gradually increase the learning rate

  2. A.: 1. gradually decrease the learning rate

    When using SGD, it is necessary to gradually decrease the learning rate over time in order to counteract the noise introduced by the random sampling of mini-batches. Whereas the true gradient becomes small and eventually 0 as we approach and reach a minimum, the estimated (stochastic) gradient does not vanish there, because the mini-batch sampling noise persists; only a decaying learning rate lets the iterates settle at the minimum.
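
    As an illustration (not part of the original answer), here is a minimal NumPy sketch of SGD with a linearly decayed learning rate on a toy least-squares problem; the schedule constants eps0, eps_tau, and tau, and the problem setup, are hypothetical choices made only for this example:

    ```python
    import numpy as np

    def lr_schedule(k, eps0=0.1, eps_tau=0.001, tau=1000):
        """Linearly decay the learning rate over the first tau steps,
        then hold it constant at eps_tau (illustrative values)."""
        alpha = min(k / tau, 1.0)
        return (1 - alpha) * eps0 + alpha * eps_tau

    # Toy problem: minimize ||X w - y||^2 with mini-batch SGD.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=500)

    w = np.zeros(3)
    batch_size = 16
    for k in range(2000):
        idx = rng.integers(0, len(X), size=batch_size)          # random mini-batch
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size  # stochastic gradient
        w -= lr_schedule(k) * grad                                # decayed step size

    print(w)  # should be close to true_w
    ```

    With a constant learning rate the iterates would keep fluctuating around the minimum, since the stochastic gradient never vanishes; the decaying schedule shrinks those fluctuations so the parameters can settle.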