How can we assess the generality of a classification method?
Suppose we have a classification task $A$ and several methods $M_1, M_2, M_3$. Performance on $A$ is measured with a consistent metric. For instance, if $A$ is a binary classification task, the F-score or the ROC curve (AUC) can be used.
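To make the metric concrete, here is a minimal sketch of computing the F-score from binary labels and predictions. The labels and predictions are made-up toy data, not from any of the papers discussed.

```python
# Illustrative sketch: F-score for binary classification, computed
# from true-positive, false-positive, and false-negative counts.
# The example labels/predictions below are invented toy data.

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # tp=3, fp=1, fn=1 -> F1 = 0.75
```

In practice a library implementation (e.g. `sklearn.metrics.f1_score`) would be used, but the point is that every method should be scored with the *same* metric on the *same* data to be comparable.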
I surveyed one research area and found that:
- $M_1$ is evaluated on dataset $D_1$ (open) using only pre-processing $P_1$ (it appears to be the seminal work).
- $M_2$ is evaluated on datasets $D_1$ (open) and $D_2$ (private) and compared with $M_1$, claiming more accurate results, but using a different pre-processing $P_2$.
- $M_3$ proposes a new approach evaluated on dataset $D_3$ (private) and provides no comparison against $M_1$ or $M_2$.
I'm trying to work in this area, but there are many inconsistencies. None of the methods are validated on a separate validation set; they use only training and test data. I suspect some parameters are tuned on the test set, even though the authors do not say so. Since this field is not data-science-oriented and datasets are scarce, this can easily happen.
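For reference, the split I would expect is the following three-way one, where hyperparameters are chosen on the validation set and the test set is scored exactly once. Everything here (the toy dataset, the candidate parameters, and the `evaluate` helper) is a hypothetical placeholder, not code from any of the papers.

```python
# Sketch of a train/validation/test protocol: tune on validation,
# report on test once. Dataset and `evaluate` are hypothetical.
import random

random.seed(0)
data = [(x, x % 2) for x in range(100)]  # toy (feature, label) pairs
random.shuffle(data)

n = len(data)
train = data[: int(0.6 * n)]            # 60% for fitting the model
val   = data[int(0.6 * n): int(0.8 * n)]  # 20% for hyperparameter tuning
test  = data[int(0.8 * n):]             # 20% held out until the very end

def evaluate(params, split):
    # Placeholder score; a real project would train a model with
    # `params` on `train` and compute, e.g., F1 on `split`.
    return sum(y for _, y in split) / len(split) * params

candidates = [0.1, 0.5, 1.0]
best = max(candidates, key=lambda p: evaluate(p, val))  # tune on validation only
final_score = evaluate(best, test)  # touch the test set once, with chosen params
```

Tuning on `val` and reporting on `test` is what prevents the test-set leakage I suspect in the papers above; with small datasets, k-fold cross-validation on the training portion is a common substitute for a fixed validation split.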
Which method can we consider state-of-the-art?
How can we assess the generality of each method?
Topic: data preprocessing research
Category: Data Science