Data transformations in hierarchical classification
I am building a hierarchical text classifier using the Local Classifier Per Parent Node (LCPN) approach with the 'siblings' policy, as described in the survey "A survey of hierarchical classification across different application domains".
For example, given the classes 1.1, 1.2, 2.1, 2.2 and 2.3, the first level uses the whole training set to train a classifier that distinguishes class 1 (1.1, 1.2) from class 2 (2.1, 2.2, 2.3). The second level uses two multiclass classifiers: the first distinguishes 1.1 from 1.2 and is trained only on the data belonging to those classes, and the second does the same for 2.1, 2.2 and 2.3.
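To make the setup concrete, here is a minimal sketch of how the training sets could be split per parent node. The helper names (`parent_of`, `build_lcpn_training_sets`) and the placeholder documents are hypothetical, and it assumes labels are strings of the form `parent.child`.

```python
from collections import defaultdict


def parent_of(label: str) -> str:
    """Return the top-level class of a leaf label, e.g. '2.3' -> '2'."""
    return label.split(".")[0]


def build_lcpn_training_sets(docs, labels):
    """Split (doc, leaf label) pairs into one training set per parent node."""
    # Root classifier: every document, labelled with its top-level class.
    root = [(doc, parent_of(label)) for doc, label in zip(docs, labels)]
    # One classifier per parent: only that parent's documents, keeping leaf labels.
    per_parent = defaultdict(list)
    for doc, label in zip(docs, labels):
        per_parent[parent_of(label)].append((doc, label))
    return root, dict(per_parent)


# Placeholder documents, one per leaf class from the example above.
docs = ["doc a", "doc b", "doc c", "doc d", "doc e"]
labels = ["1.1", "1.2", "2.1", "2.2", "2.3"]

root_set, child_sets = build_lcpn_training_sets(docs, labels)
```

Here `root_set` holds all five documents labelled 1 or 2, while `child_sets["1"]` and `child_sets["2"]` hold only the documents of their own subtree.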
Should any data transformation applied to the data (e.g. scaling, tf-idf) be fitted separately at each level of the hierarchy? That is, since the tf-idf vectors at the first level are created by fitting on the whole training set, can we reuse them at the second level, or should we refit on each new training subset?
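To illustrate what differs between the two options, here is a small sketch that fits IDF weights once on the whole training set and once on the class 2 subset only. The toy corpus and the `fit_idf` helper are made up for illustration; the formula is the smoothed variant ln((1 + n) / (1 + df)) + 1, which is also the default in scikit-learn's `TfidfTransformer`.

```python
import math


def fit_idf(docs):
    """Fit smoothed IDF weights on whitespace-tokenized documents."""
    n = len(docs)
    df = {}
    for doc in docs:
        # Count each term at most once per document (document frequency).
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


# Hypothetical toy corpus: (text, leaf label).
train = [
    ("football match score", "1.1"),
    ("football transfer news", "1.2"),
    ("stock market news", "2.1"),
    ("bond market yield", "2.2"),
    ("currency market rate", "2.3"),
]

# Option A: fit once on the whole training set (what the first level sees).
idf_global = fit_idf([text for text, _ in train])

# Option B: refit on the subset used by the class 2 classifier.
idf_class_2 = fit_idf([text for text, label in train if label.startswith("2.")])

print(idf_global["market"], idf_class_2["market"])
```

In this toy example "market" occurs in every class 2 document, so refitting on the subset gives it the minimum weight of 1.0, whereas the global fit weights it higher. The subset fit also has no entry at all for "football", so the vocabulary itself changes between the two options.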
Topic text multiclass-classification classification
Category Data Science