1. Explain why dropout can be seen as an ensemble method using an exponential number of networks.

  A.:

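    During training, dropout samples from an exponential number of subnetworks, each formed by randomly dropping some of the units of the original network and keeping the rest. In a network with n input and hidden units, each unit is either kept or dropped, so the number of such subnetworks is 2^n. All of these subnetworks share the weights of the original network, so each training step trains one sampled subnetwork while also updating parameters used by the others. At test time, the average prediction over all these subnetworks is approximated by a single pass through the full network with its weights scaled by the keep probability (the weight scaling inference rule), as in an ensemble method.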
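To make the 2^n count and the test-time averaging concrete, here is a minimal Python sketch. It assumes a single linear unit (no hidden nonlinearity) fed by n = 4 input units that can be dropped. The helper names (`linear_unit`, `all_masks`, `ensemble_average`, `weight_scaled`) and the weight and input values are hypothetical, chosen only for illustration. The sketch enumerates every keep/drop mask, averages the outputs of the corresponding subnetworks weighted by the probability of each mask, and compares the result with one pass through the full network with weights scaled by the keep probability. For a linear unit the two agree exactly; with nonlinear hidden layers the weight-scaled pass is only an approximation of the ensemble average.

```python
import math
from itertools import product


def linear_unit(weights, inputs, mask):
    """Output of one linear unit when inputs whose mask value is 0 are dropped."""
    return sum(w * x * m for w, x, m in zip(weights, inputs, mask))


def all_masks(n):
    """Every keep (1) / drop (0) pattern over n units: one mask per subnetwork."""
    return list(product((0, 1), repeat=n))


def ensemble_average(weights, inputs, keep_prob=0.5):
    """Exact average over all 2**n subnetworks, each weighted by its mask probability."""
    n = len(inputs)
    total = 0.0
    for mask in all_masks(n):
        kept = sum(mask)
        # Probability that dropout samples exactly this mask.
        prob = keep_prob ** kept * (1 - keep_prob) ** (n - kept)
        total += prob * linear_unit(weights, inputs, mask)
    return total


def weight_scaled(weights, inputs, keep_prob=0.5):
    """One pass through the full network with every weight multiplied by keep_prob."""
    return sum(keep_prob * w * x for w, x in zip(weights, inputs))


# Hypothetical example values.
weights = [0.5, -1.0, 2.0, 1.5]
inputs = [1.0, 2.0, 3.0, -1.0]

print(len(all_masks(len(inputs))))  # 16 subnetworks, i.e. 2**4
print(ensemble_average(weights, inputs), weight_scaled(weights, inputs))  # 1.5 1.5
print(math.isclose(ensemble_average(weights, inputs, 0.8),
                   weight_scaled(weights, inputs, 0.8)))  # True
```

Enumerating the masks is only feasible for a handful of units, since their number doubles with every unit added. That is why dropout uses the single weight-scaled pass instead of evaluating the ensemble explicitly.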