Given the activation in the output layer, what is the recommended loss function in each case below?
Activation       Loss function
sigmoid          A.: cross-entropy
softmax          A.: cross-entropy
none (linear)    A.: MSE
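
The pairings above can be sketched in a few lines of plain Python. This is only an illustration, not a reference implementation: the function names are hypothetical, it assumes that "cross-entropy" means the binary form for a sigmoid output and the categorical form for a softmax output, and it is not hardened against very large inputs or probabilities of exactly 0 or 1.

```python
import math

def sigmoid(z):
    # Squashes a single logit into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Turns a list of logits into probabilities that sum to 1.
    # Subtracting the max keeps the exponentials from overflowing.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(p, y):
    # Pairs with a sigmoid output; y is the true label, 0 or 1.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_cross_entropy(probs, target_index):
    # Pairs with a softmax output; target_index is the true class.
    return -math.log(probs[target_index])

def mse(preds, targets):
    # Pairs with a linear (no activation) output, as in regression.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# One example per row of the table (inputs are made up for illustration).
loss_binary = binary_cross_entropy(sigmoid(0.0), 1)
loss_multi = categorical_cross_entropy(softmax([0.0, 0.0, 0.0]), 0)
loss_regression = mse([1.0, 2.0], [1.0, 4.0])
```

With a logit of 0 the sigmoid gives probability 0.5, so the binary loss is log 2; three equal logits give a uniform softmax, so the categorical loss is log 3; the regression example averages squared errors of 0 and 4.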