Reviewing a paper - common practice

I've been asked to review a paper in which the authors compare their new model (let's call it Model A) to other models (B, C, and D), and conclude theirs is superior on some metric (I know, big surprise!).

Here's the problem: in my research, my supervisors always instructed me to code up the competing models and compare my model that way. The paper I'm reviewing, by contrast, just quotes results from previous literature.

To clarify, here's what I would have had to do if I had been these authors:

  1. Code up model A.
  2. Code up models B, C, and D.
  3. Run all models on the data set, and obtain metrics to compare the models.
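
For concreteness, a rough sketch of that first workflow might look like this (the dataset, models, and metric below are placeholders for illustration, not the actual ones from the paper):

    # Rough sketch of workflow 1: run every model yourself on the same data,
    # with the same split, and compare the same metric.
    # Dataset, models, and metric are placeholders for illustration only.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # One fixed split shared by every model, so the comparison is apples to apples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    models = {
        "A (new model)": GradientBoostingClassifier(random_state=0),
        "B": LogisticRegression(max_iter=5000),
        "C": RandomForestClassifier(random_state=0),
        "D": SVC(),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(X_test))
        print(f"Model {name}: accuracy = {accuracy:.3f}")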

Whereas this is what the authors did:

  1. Code up model A.
  2. Look up the results in published literature for models B, C, and D on the same data set to obtain metrics.
  3. Run the data through model A, and obtain the metric to compare against models B, C, and D.

Is their method incorrect, or somehow unethical? They make no claims regarding training time.


In theory, their method is correct as long as the experiments are exactly equivalent (see the sketch after this list):

  • Exact same dataset, same proportion of training data, and preferably even the exact same training/test data (i.e. the same split, if there is a split).
  • Identical preprocessing, if there is any preprocessing.
  • Identical methodology with respect to:
    • hyper-parameter tuning, any feature selection, etc.
    • any experimental setup such as number of epochs/iterations for training, etc.
  • (anything else that I may have forgotten...)
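
To make this concrete, here is a minimal sketch of what an "equivalent design" means in code: every model sees the same split, the same preprocessing, and the same tuning protocol. The dataset, models, and grids below are placeholders for illustration, not the ones from the paper under review:

    # Minimal sketch: pin down the design so results are comparable across models.
    # Placeholders only; not the models/data from the reviewed paper.
    from sklearn.datasets import load_wine
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_wine(return_X_y=True)

    # Same train/test split for every model (fixed seed, stratified).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)

    # Same cross-validation folds used to tune every model.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    candidates = {
        "logreg": (LogisticRegression(max_iter=5000), {"clf__C": [0.1, 1, 10]}),
        "svm": (SVC(), {"clf__C": [0.1, 1, 10]}),
    }

    for name, (estimator, grid) in candidates.items():
        # Same preprocessing for every model, applied inside the pipeline.
        pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
        search = GridSearchCV(pipe, grid, cv=cv, scoring="f1_macro")
        search.fit(X_train, y_train)
        score = f1_score(y_test, search.predict(X_test), average="macro")
        print(f"{name}: test macro-F1 = {score:.3f}, best params = {search.best_params_}")

If any of these pieces (split, preprocessing, folds, search budget) silently differs between the published numbers and the new experiment, the comparison is no longer apples to apples.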

Since in practice it's often difficult to make sure that the experimental design is equivalent, you're right that redoing all the other experiments is the safest way to guarantee this equivalence. It also has the added advantage of reproducing the original results (normally they should be confirmed, but it can happen that they aren't).

This is not unethical in itself (by the way, the ethics and peer-review part of the question would be more relevant on AcademiaSE); it's just a methodology that is not optimal.

About your review: if the paper gives all the relevant details and does everything possible to make sure the experimental design is equivalent (showing that the authors understand the potential issue here), I would barely mention this point and not really hold it against them in my evaluation. By contrast, if they happily compare results without checking that the designs are equivalent, I would mention it and count it as a significant limitation of the work (though not necessarily reject for this reason alone; that depends on whether the rest of the work is convincing).
