A certain convolutional operation in a neural network receives a color image of shape 256 × 256 × 3 and outputs a tensor of shape 256 × 256 × 60, using 5 × 5 filters with a stride of 1 and padding chosen to preserve the spatial size. In other words, 3 channels are transformed into 60 channels.
How many learnable scalar parameters are introduced by this operation? How many learnable scalar parameters would a dense layer with the same input and output sizes have?
A.:
This is a ballpark answer ignoring biases, and assuming "same" padding. For the convolutional operation, each filter is a 5 × 5 × 3 block (its depth matches the 3 input channels), so each filter has 75 weights. With 60 filters, that gives 75 × 60 = 4500 learnable parameters.
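As a quick check, here is a minimal sketch of this count, assuming PyTorch is available (the layer itself and the `bias=False` setting are just for illustration, mirroring the "ignoring biases" assumption above):

```python
import torch.nn as nn

# 3 input channels -> 60 output channels, 5x5 kernel, stride 1, "same" padding
conv = nn.Conv2d(in_channels=3, out_channels=60, kernel_size=5,
                 stride=1, padding="same", bias=False)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 5 * 5 * 3 * 60 = 4500
```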
A dense layer with the same input and output sizes would have (256 × 256 × 3) × (256 × 256 × 60) ≈ 7.7 × 10^11 parameters, roughly eight orders of magnitude more than the convolution.
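A layer that large is impractical to instantiate just to count its weights, so a quick arithmetic sketch (biases again ignored) makes the comparison concrete:

```python
# Flattened input and output sizes of the hypothetical dense layer
in_features = 256 * 256 * 3    # 196,608
out_features = 256 * 256 * 60  # 3,932,160

dense_params = in_features * out_features
print(f"{dense_params:,}")           # 773,094,113,280 ~ 7.7e11
print(dense_params / 4500)           # ~1.7e8, i.e. ~8 orders of magnitude more
```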