1. What is the difference between gradient descent and stochastic gradient descent?

  A:

    In gradient descent, each update uses the exact gradient of the cost function, which requires a sum over all training points. In stochastic gradient descent, each update uses an ESTIMATE of that gradient, computed from a small random subset of the training points (often a single point or a mini-batch), so each step is much cheaper but noisier.
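
To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration, not something from the question: a one-parameter model `y = w * x`, a mean squared error cost, synthetic noiseless data with true slope 3, and the hypothetical helper names `full_gradient`, `stochastic_gradient`, and `descend`.

```python
import random


def full_gradient(w, xs, ys):
    """Exact gradient of J(w) = mean((w*x - y)^2): sums over ALL points."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def stochastic_gradient(w, xs, ys, batch_size, rng):
    """Estimate of the same gradient from a random mini-batch of points."""
    idx = rng.sample(range(len(xs)), batch_size)
    return sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / batch_size


def descend(grad_fn, w0, lr, steps):
    """Generic descent loop: only the gradient function differs."""
    w = w0
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w


# Synthetic, noiseless data (an assumption for this sketch): y = 3 * x.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x for x in xs]

# Gradient descent: every step touches all 200 points.
w_gd = descend(lambda w: full_gradient(w, xs, ys), w0=0.0, lr=0.1, steps=500)

# Stochastic gradient descent: every step touches only 10 random points.
w_sgd = descend(
    lambda w: stochastic_gradient(w, xs, ys, 10, rng), w0=0.0, lr=0.1, steps=500
)

print("gradient descent estimate of w:", w_gd)
print("stochastic gradient descent estimate of w:", w_sgd)
```

The update loop is identical in both cases. The only change is whether the gradient passed to it is computed over the whole training set or over a random sample. With noisy real data, the stochastic version would typically fluctuate around the minimum instead of settling exactly, which is why the learning rate is often decreased over time.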