Multiclass classification OOB error
I'm implementing a random forest for a 6-class classification problem and am seeing a strange phenomenon.
I have 10 percent of my data set aside as a pseudo-validation set. Each tree is trained on 50 percent of the training items (the training items being the remaining 90 percent of the whole set), selected at random.
Now my OOB error is almost the mirror image of my validation error. I'm using averaged F1 error (i.e. the average of the F1 error per class). As more trees are built, the OOB error increases while the validation error decreases.
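For concreteness, this is what I mean by averaged F1 error (a sketch using scikit-learn; `macro_f1_error` is just an illustrative helper name, not my actual code):

```python
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_error(y_true, y_pred, n_classes=6):
    # Per-class F1 scores, averaged with equal weight per class
    # (a macro average); the error is 1 minus that average.
    per_class = f1_score(y_true, y_pred, labels=list(range(n_classes)),
                         average=None, zero_division=0)
    return 1.0 - per_class.mean()

# A degenerate predictor that always outputs class 0 scores very badly
# under this metric, because the other five classes each get F1 = 0.
y_true = np.array([0, 0, 0, 1, 2, 3, 4, 5])
y_pred = np.zeros_like(y_true)
```

Note that every class contributes equally to this metric regardless of how many items it has, which matters given the class drift I describe below.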
Would you guess this warrants a closer look at the OOB error calculation, or should it resolve with some parameter optimisation?
My guess (hope) is that this is a result of using weights to select the items in the training sample: as training proceeds, the OOB samples increasingly consist of items that are now accurately predicted, while the rules being built come from the more exceptional items. The error does stabilise, though after 20 trees it is well worse than its initial value.
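Roughly, the selection mechanism I mean is something like the following (a hypothetical sketch, assuming weighted sampling without replacement; `weighted_half_sample` and the 10x weight ratio are just for illustration):

```python
import numpy as np

def weighted_half_sample(weights, rng):
    """Pick 50% of items for one tree, without replacement, with probability
    proportional to the current weights; the remainder are out-of-bag."""
    n = len(weights)
    in_bag = rng.choice(n, size=n // 2, replace=False, p=weights / weights.sum())
    oob = np.setdiff1d(np.arange(n), in_bag)
    return in_bag, oob

# Hypothetical drift scenario: 10 "hard" items carry 10x the weight of the
# other 90, so they land in-bag on almost every tree, and the OOB set fills
# up with the easy, already well-predicted items.
weights = np.ones(100)
weights[:10] = 10.0
rng = np.random.default_rng(0)
hard_in_bag = [np.sum(weighted_half_sample(weights, rng)[0] < 10)
               for _ in range(200)]
```

If something like this is happening, the OOB set is no longer an unbiased sample of the data, which would explain OOB error diverging from validation error.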
That said, I haven't come across any references noting that generalisability can improve even while OOB error increases. Yet that is exactly what I appear to be seeing, which makes me suspect an error in my OOB implementation.
So, has anyone come across this problem in multiclass classification? I do see the weights drift quite a lot: when the OOB error is at its best, the predictor assigns 65 percent of items to one class, whereas the true class proportions are 16 to 20 percent each. So maybe the averaged F1 error is the issue?
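As a sanity check on the metric itself, here is a small simulation of that drifted situation (balanced true classes, 65 percent of predictions collapsed onto one class; the numbers are made up to match the proportions above):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 6, size=6000)  # roughly balanced, ~16.7% per class

# Simulated drifted predictor: 65% of predictions are class 0, the rest
# spread uniformly over the other five classes, independent of the truth.
drifted = np.where(rng.random(6000) < 0.65, 0, rng.integers(1, 6, size=6000))

macro_err = 1.0 - f1_score(y_true, drifted, average="macro")
acc_err = 1.0 - accuracy_score(y_true, drifted)
```

In this simulation the macro-averaged F1 error comes out slightly worse than the plain misclassification error for the collapsed predictor, so the metric does punish the drift; that makes me doubt the metric alone explains a decreasing OOB error.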