Data transformations in hierarchical classification

Question


matentzn

April 11, 2022, 19:00

I am building a hierarchical text classifier using the Local Classifier Per Parent Node (LCPN) approach with the 'siblings' policy, as described in the paper "A survey of hierarchical classification across different application domains":

E.g., if we have the classes 1.1, 1.2, 2.1, 2.2, 2.3, then at the first level we use the whole training set to train a classifier that distinguishes between class 1 (1.1, 1.2) and class 2 (2.1, 2.2, 2.3). At the second level we use two multiclass classifiers: the first classifies between 1.1 and 1.2, using as its training set only the data belonging to these classes, and the second does the same for the rest (2.1, 2.2, 2.3).
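The training-set split in this example can be sketched as follows. This is only an illustration of the setup described above, not code from the survey: the helper names `parent_of` and `build_lcpn_training_sets`, the dotted label format, and the toy texts are all assumptions.

```python
from collections import defaultdict


def parent_of(label: str) -> str:
    """Return the parent class of a dotted label, e.g. '2.3' -> '2'."""
    return label.split(".")[0]


def build_lcpn_training_sets(texts, labels):
    """Group training examples by the classifier that should see them.

    Returns a dict mapping node name -> list of (text, target) pairs.
    'root' holds every example, with the parent class as its target.
    Each parent node holds only the examples of its own children.
    """
    sets = defaultdict(list)
    for text, label in zip(texts, labels):
        parent = parent_of(label)
        sets["root"].append((text, parent))
        sets[parent].append((text, label))
    return dict(sets)


# Toy data: one placeholder document per leaf class.
texts = ["doc a", "doc b", "doc c", "doc d", "doc e"]
labels = ["1.1", "1.2", "2.1", "2.2", "2.3"]

training_sets = build_lcpn_training_sets(texts, labels)
for node, pairs in training_sets.items():
    print(node, [target for _, target in pairs])
```

The question below is then whether a transform such as tf-idf should be fitted once on the `root` set and reused, or fitted separately on each of the smaller per-parent sets.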

Should any data transformation (e.g. scaling, tf-idf) that we apply to the data be fitted at each level of the classifier? I.e., since at the first level the tf-idf vectors are created by fitting to the whole training set, can we reuse them at the second level, or should we refit on the new training subsets?

Topic text multiclass-classification classification

Category Data Science

Brian Spiering · Accepted Answer · July 9, 2020, 14:35

It is generally best practice to perform all feature engineering before applying classifiers.

The two primary reasons are:

Simplicity - If feature engineering is conditional on model performance, then it is harder to find and debug edge cases.
Handling out-of-sample issues - Especially in text, there are novel examples (e.g., words that appear during prediction but did not appear during training). Fitting the feature engineering on as much data as possible increases the robustness of the transforms.
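A minimal sketch of the out-of-sample point, using only the standard library. The helpers `fit_vocabulary` and `count_unknown` and the toy documents are hypothetical and stand in for whatever vectorizer is actually used (in practice, likely a library class such as scikit-learn's `TfidfVectorizer`).

```python
def fit_vocabulary(docs):
    """Map each distinct token to a column index (hypothetical helper)."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def count_unknown(doc, vocab):
    """Count tokens in doc that the fitted vocabulary has never seen."""
    return sum(1 for token in doc.lower().split() if token not in vocab)


# Toy training set with the class labels from the question.
train = [
    ("cheap flights to rome", "1.1"),
    ("hotel deals in rome", "1.2"),
    ("python list sorting", "2.1"),
    ("rust borrow checker", "2.2"),
    ("sorting in rust", "2.3"),
]

# Vocabulary fitted once on the whole training set.
global_vocab = fit_vocabulary(text for text, _ in train)
# Vocabulary refitted only on the children of class 1.
subset_vocab = fit_vocabulary(
    text for text, label in train if label.startswith("1.")
)

new_doc = "cheap python flights"
unknown_global = count_unknown(new_doc, global_vocab)
unknown_subset = count_unknown(new_doc, subset_vocab)
print(unknown_global, unknown_subset)
```

With this toy data, the vocabulary fitted on everything covers all tokens of the new document, while the one fitted on the class 1 subset has never seen "python" and would silently drop it.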

Noah Weber · Accepted Answer · December 20, 2019, 13:09

It depends on the dataset, but generally you should fit again.

Why? If you don't fit again at the second level when classifying between 1.1 and 1.2, you introduce bias carried over from the first level, where you classified between classes 1 and 2.

Why does it depend? If information is intertwined across the parent and child classes and you will use these models again in the future, you could lose important information by fitting again. In other words, you would only be overfitting to the current training subset (the one used to classify 1.1 vs. 1.2).
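Both sides of this trade-off can be seen in a small standard-library sketch. The helper `fit_idf`, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1, and the toy documents are assumptions chosen for illustration, not something prescribed in the thread.

```python
import math


def fit_idf(docs):
    """Fit smoothed inverse document frequencies on a list of documents.

    Uses idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number
    of documents and df(t) the number of documents containing token t.
    """
    n = len(docs)
    df = {}
    for doc in docs:
        for token in set(doc.lower().split()):
            df[token] = df.get(token, 0) + 1
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


train = [
    ("rome flights cheap", "1.1"),
    ("rome hotel deals", "1.2"),
    ("python list sorting", "2.1"),
    ("rust borrow checker", "2.2"),
    ("python type hints", "2.3"),
]

all_docs = [text for text, _ in train]
node1_docs = [text for text, label in train if label.startswith("1.")]

idf_global = fit_idf(all_docs)   # fitted once, at the first level
idf_node1 = fit_idf(node1_docs)  # refitted for the 1.1 vs 1.2 classifier

# "rome" separates class 1 from class 2 but not 1.1 from 1.2.
print(idf_global["rome"], idf_node1["rome"])
# "python" never occurs under class 1, so the refit forgets it.
print("python" in idf_global, "python" in idf_node1)
```

In this toy case, refitting pushes the weight of "rome" down to the minimum because it appears in every class 1 document, which is the argument for fitting again. The refitted transform also loses "python" entirely, which is the information loss the answer warns about.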
