Is it possible to automate generating reproducibility documentation?
First, I think it's worth stating what I mean by replication versus reproducibility:
- Replication of analysis A means that analysis B is supplied with an exact copy of all of A's inputs and processes and produces identical outputs.
- Reproducibility of analysis A means that analysis B arrives at inputs, processes, and outputs that are semantically equivalent to A's, without access to A's exact inputs and processes.
Putting aside how easy it might be to replicate a given build, especially an ad-hoc one, to me replication is always possible if it's planned for and worth doing. What is unclear to me is how to execute a data science workflow that allows for reproducibility.
The closest comparison I can think of is documentation generators that produce software documentation intended for programmers. The main difference I see is that, in theory, if two analyses each ran the "reproducibility documentation generator," the resulting documentation should match.
Another issue is that while I get the concept of reproducibility documentation, I am having a hard time imagining what it would look like in usable form without it just being a guide to replicating the analysis.
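For concreteness, here is a minimal sketch (Python, with hypothetical file names and parameters of my own invention) of the kind of manifest I imagine such a generator emitting as an analysis runs: environment versions, checksums of inputs, the parameters used, and checksums of outputs, rather than a step-by-step replication guide.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """SHA-256 of a file, so a reader can confirm they used equivalent inputs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(inputs, params, outputs):
    """Collect environment, input, parameter, and output metadata into one record."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "environment": {
            "python": sys.version,
            "platform": platform.platform(),
        },
        "inputs": {str(p): file_checksum(p) for p in inputs},
        "parameters": params,
        "outputs": {str(p): file_checksum(p) for p in outputs},
    }


if __name__ == "__main__":
    # Hypothetical file names and parameters, purely for illustration.
    manifest = build_manifest(
        inputs=[Path("raw_data.csv")],
        params={"model": "logistic_regression", "random_seed": 42},
        outputs=[Path("results.csv")],
    )
    Path("reproducibility_manifest.json").write_text(json.dumps(manifest, indent=2))
```

Whether that counts as reproducibility documentation or just a replication aid is exactly the part I'm unsure about.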
Lastly, the whole intent of this is to understand whether it's possible to "bake in" reproducibility documentation as you build out a stack, not after the stack is built.
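To make the "bake in" idea concrete, here is one possible sketch (again Python, with made-up step names and a hypothetical `documented_step` decorator) where each pipeline step registers a description of what it did as it runs, so the documentation accumulates while the stack is being built rather than being reconstructed afterwards.

```python
import functools
import json
from datetime import datetime, timezone

# Records accumulate here as the stack is built, one entry per documented step.
STEP_LOG = []


def documented_step(description):
    """Decorator that logs a step's name, keyword arguments, and description as it runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            STEP_LOG.append({
                "step": func.__name__,
                "description": description,
                "kwargs": {k: repr(v) for k, v in kwargs.items()},
                "ran_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


# Hypothetical steps, purely for illustration.
@documented_step("Drop rows with missing target values")
def clean(rows, target_column="label"):
    return [r for r in rows if r.get(target_column) is not None]


@documented_step("Fit a placeholder model on the cleaned rows")
def fit(rows, random_seed=42):
    return {"n_rows": len(rows), "seed": random_seed}


if __name__ == "__main__":
    data = [{"label": 1}, {"label": None}, {"label": 0}]
    fit(clean(data, target_column="label"), random_seed=42)
    print(json.dumps(STEP_LOG, indent=2))
```

Is this roughly the right direction, or is there an established approach I'm missing?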
So, is it possible to automate generating reproducibility documentation, and if so, how, and what would it look like?
UPDATE: Please note that this is the second draft of this question and that Christopher Louden was kind enough to let me edit the question after I realized the first draft was likely unclear. Thanks!