How can I learn and apply the scientific method in machine learning?

Rigor Theory. I want to learn the scientific method and how to apply it in machine learning: specifically, how to verify that a model has captured the pattern in the data, and how to reach conclusions rigorously from well-justified empirical evidence.

Verification in Practice. My colleagues in both academia and industry tell me that measuring the model's accuracy on test data is sufficient, but I am not confident that this criterion is enough.

Data Science Books. I have picked up several data science books, such as Skiena's manual, Dell EMC's book, and Waikato's data mining book. Although each has a section on diagnosing models and measuring results, my instinct is that these are heuristics rather than rigor-based methods.

Scientific Method Books. Searching for books on the scientific method, I found Statistics and Scientific Method: An Introduction for Students and Researchers and Principles of Scientific Methods, which seem to address the crux of my question. I am planning to study both.

My Questions. Here are a couple of questions on which I hope to get guidance from your wonderful community.

  • Is it feasible to rigorously apply the scientific method in machine learning applications such as recommendation engines or the social sciences, or has our scientific and technological progress not yet reached that degree of maturity, so that the best we can hope for is heuristics-based approximations?
  • Is it feasible to do machine learning in industry by applying the scientific method, or do industry leaders prefer cheap heuristics in order to minimize a project's costs?
  • Are the scientific method books I mentioned above useful for improving my own machine learning skills? Are they worth the time and effort?
  • Are you aware of better resources for learning the scientific method? Are there more helpful courses or recorded lectures?
  • Do you have any recommendations or advice, while studying the scientific method, for someone mainly motivated by industrial machine learning applications such as recommendation engines and logistics optimization?

I think it's an excellent idea to acquire and apply a solid scientific basis in general and in ML in particular. Here are a few comments:

  • The scientific method is a general set of good principles for obtaining reliable scientific conclusions. It's not especially precise, and it's not always clear whether it has been applied correctly in a specific case: there is no clear binary way to say whether a study satisfies the scientific method or not.
  • Keep in mind that what is considered scientifically valid evolves over time. For example, the shortcomings of relying on significance tests (such as treating p < 0.05 as a yes/no verdict) have received a lot of attention in recent years.
  • In science, and in statistics in particular, the main point is usually not to prove something with 100% confidence (virtually never possible) but to quantify the confidence in a reliable way. A common mistake is to expect an ML prediction to be 100% reliable: of course this is not possible (it wouldn't be statistical learning otherwise), but it is possible, to some extent, to measure how likely a prediction is to be correct.
  • So the level of maturity of the field doesn't matter: solid scientific principles can be applied to any existing method. This even applies to heuristics: it's just a matter of estimating their performance reliably.
  • Last point: let's not forget that the scientific method is not a rigid set of formal rules to apply systematically; it's also about questioning whether a particular dataset or approach is suited to the goal. Another common mistake I see is applying a specific kind of evaluation to a system without considering whether it truly measures the task the system is supposed to perform.
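The points about quantifying confidence and about checking that an evaluation fits the task can be illustrated with a minimal sketch, assuming a binary classification task and using only the Python standard library. It reports a percentile-bootstrap confidence interval for test-set accuracy (one common way to quantify uncertainty, not the only one) and compares the model against a trivial majority-class baseline. The function names and the toy labels are hypothetical, chosen for illustration only; they do not come from any real study.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for test-set accuracy.

    Test examples are resampled with replacement, accuracy is recomputed on
    each resample, and the alpha/2 and 1 - alpha/2 percentiles are returned.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of a trivial model that always predicts the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return accuracy(y_test, [majority] * len(y_test))


# Hypothetical toy data (not from any real study): 90% of labels are 0.
y_train = [0] * 90 + [1] * 10
y_test = [0] * 90 + [1] * 10
# Hypothetical model output: every negative correct, half of the positives missed.
y_pred = [0] * 90 + [1] * 5 + [0] * 5

lower, upper = bootstrap_accuracy_ci(y_test, y_pred)
print(f"model accuracy:    {accuracy(y_test, y_pred):.2f}")
print(f"95% bootstrap CI:  [{lower:.2f}, {upper:.2f}]")
print(f"majority baseline: {majority_baseline_accuracy(y_train, y_test):.2f}")
```

With these toy numbers, a 95% test accuracy looks impressive in isolation, but a model that always predicts 0 already reaches 90%, and the interval makes explicit how much the estimate could move on a different sample of 100 test examples. Reporting the uncertainty and a sensible baseline alongside the headline metric is a small step beyond "accuracy on the test set is sufficient".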

(I don't have any specific book recommendation)

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.