ATTENTION - changed the PCA part - 21/3
Due date: 25/3
The zip file train17.zip contains a collection of PGM images of handwritten 1 and 7 digits. Each image has 64x64 pixels in the PGM format, where each pixel has value 0 or 1. Each image file has a name in the format X_yyy.BMP.inv.pgm, where X is the digit represented in the image.
The file test17.zip contains test images in the same format.
PGM files start with 3 lines:
P2
64 64
1
which are not relevant to us, followed by 64x64 pixels separated by a
blank or a line break. For us, these 64x64 pixels represent the
attributes/dimensions of the data. The class of each file/data point is the
digit represented in the file name.
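The reading step described above can be sketched in R as follows. This is only one possible approach: the function names read_pgm and load_digits are made up for illustration, and the code assumes the header is exactly three lines with no comment lines.

```r
# Read one plain-text (P2) PGM file and return its pixels as a numeric vector.
# Assumption: the header is exactly three lines and contains no comment lines.
read_pgm <- function(path) {
  # scan() splits on any whitespace, so blanks and line breaks are treated alike
  pixels <- scan(path, what = numeric(), skip = 3, quiet = TRUE)
  stopifnot(length(pixels) == 64 * 64)
  pixels
}

# Build a data matrix (one row per image) and a label vector from a directory
# of unzipped images. load_digits is a hypothetical helper name.
load_digits <- function(dir) {
  files <- list.files(dir, pattern = "\\.pgm$", full.names = TRUE)
  x <- t(vapply(files, read_pgm, numeric(64 * 64)))
  rownames(x) <- NULL
  # The class is the first character of the file name (X in X_yyy.BMP.inv.pgm)
  y <- factor(substr(basename(files), 1, 1))
  list(x = x, y = y)
}
```

Assuming train17.zip was unzipped into a directory called train17, a call such as load_digits("train17") would return the 4096-column data matrix in $x and the digit labels in $y.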
There is a complication regarding the PCA - there are more dimensions than data points. PCA implementations that are based on the covariance matrix will not work in this case (I don't know why). But most implementations of PCA are based on the SVD of the data - and these will work. So be sure to use an SVD-based PCA computation (such as prcomp in R).
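A small experiment on synthetic data can illustrate the point. The sketch below uses random numbers as a stand-in for the images (10 samples, 50 dimensions) and compares prcomp with a PCA computed by hand from svd(). The use of princomp as the covariance-based example is an assumption of this sketch, not something taken from the assignment.

```r
set.seed(1)
# Synthetic stand-in for the image data: more dimensions (50) than data (10)
x <- matrix(rnorm(10 * 50), nrow = 10)

# SVD-based PCA runs even though ncol(x) > nrow(x)
pca <- prcomp(x)

# The same scores computed directly from the SVD of the centred data
xc <- scale(x, center = TRUE, scale = FALSE)
s <- svd(xc)
scores <- s$u %*% diag(s$d)

# Centring removes one degree of freedom, so at most nrow(x) - 1
# components carry any variance; compare only those.
k <- nrow(x) - 1
# Columns are only defined up to sign, so compare absolute values
agree <- isTRUE(all.equal(abs(unname(pca$x[, 1:k])), abs(scores[, 1:k])))

# princomp() works from the covariance matrix and refuses this input
princomp_fails <- inherits(try(princomp(x), silent = TRUE), "try-error")

print(c(agree = agree, princomp_fails = princomp_fails))
```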
How to use the PCA in R:
pca <- prcomp(train)
newtrain <- pca$x[, 1:n]
newtest <- scale(test, pca$center, pca$scale) %*% pca$rotation[, 1:n]
n is the number of dimensions to keep.
The first line computes the PCA. The second returns the training data
transformed into the new reduced PCA dimensions - you should have
done something similar in the first exercise. The third line uses the
training PCA to transform the test data.
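As a sanity check on the three lines, one can confirm on synthetic data that the test-set formula, when applied to the training data, reproduces pca$x, and that it matches what predict() returns for a prcomp object. The matrix sizes and the value of n below are arbitrary choices for illustration.

```r
set.seed(2)
# Synthetic stand-ins for the real data (hypothetical sizes)
train <- matrix(rnorm(20 * 100), nrow = 20)
test  <- matrix(rnorm(5 * 100), nrow = 5)
n <- 3  # number of dimensions to keep

pca <- prcomp(train)
newtrain <- pca$x[, 1:n]
newtest  <- scale(test, pca$center, pca$scale) %*% pca$rotation[, 1:n]

# Check 1: the test-set formula applied to the training data gives pca$x
check_train <- scale(train, pca$center, pca$scale) %*% pca$rotation[, 1:n]
same_train <- isTRUE(all.equal(newtrain, check_train, check.attributes = FALSE))

# Check 2: predict() performs the same centring and rotation
check_test <- predict(pca, newdata = test)[, 1:n]
same_test <- isTRUE(all.equal(newtest, check_test, check.attributes = FALSE))

print(c(same_train = same_train, same_test = same_test))
```

The important design point is that the test data are centred with the training means (pca$center), not with their own means, so both sets end up in the same coordinate system.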