Suggestions on practices for documenting model and dataset versions

Question

Hing

January 4, 2022, 08:42

I want to steer my question towards the practical side of ML. As a practitioner, I find it difficult to keep track of different versions of models and datasets. From time to time I need to revisit my data and model code to verify that certain assumptions are actually implemented, which becomes difficult as the number of runs/experiments increases exponentially.

Thus, I would like to hear some advice from senior practitioners on how you version your things (data/model code). I know you will probably tell me to use git-like tools, but my concern is not about which tools to use. It is about how to properly document what I have done for later retrieval and quick reference, without going through all the code in my current repository to confirm it.

In short, what are your suggestions for documenting data and model code? In ML projects, changes across experiments are so rapid that I quickly lose track without proper documentation. But this rate of change also makes fresh documentation obsolete unexpectedly fast, which raises the question of whether the documentation should exist at all.

All of this confuses me. Please advise, thank you!

Topic version-control

Category Data Science
