Discuss the role, importance, and differences between training set and test set.

A.:

To successfully conduct a supervised machine learning experiment, one needs a traning set with labeled examples from which the algorithm will learn, and a separate test set where its performance will be evaluated in examples not previously seen.

It is important to ensure that the test set is disjoint from the training set, to avoid biases in the experiment. Their sizes are often different, with the training set usually accounting for 60% to 90% of the total cases.

The two sets have to be drawn from the same probability distribution, so that it is reasonable to expect that we can generalize to the test set the knowledge acquired with the trainign set.