Sequence-to-sequence learning applied to a list of numbers
I am looking to apply ML methods to genetic data. My goal is to predict which rare (generally de novo) mutations a person has, based on which non-rare (generally inherited) mutations they carry.
I have worked on this mutation data before, and stored the mutation data as one-hot vectors: a person X can have mutation Y zero times, once on chromatid A, once on chromatid B, or once on each chromatid. This is represented as {'0|0', '0|1', '1|0', '1|1'}.
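For concreteness, here is a minimal sketch of what such a one-hot encoding could look like in Python. The names `GENOTYPES`, `one_hot_genotypes`, and `person_x`, the order of the four states, and the example calls are all assumptions made for illustration.

```python
# The four phased genotype states, in an arbitrary but fixed order (assumption).
GENOTYPES = ['0|0', '0|1', '1|0', '1|1']
GT_INDEX = {gt: i for i, gt in enumerate(GENOTYPES)}


def one_hot_genotypes(calls):
    """Encode a list of genotype strings as a list of 4-element one-hot rows."""
    encoded = []
    for gt in calls:
        row = [0] * len(GENOTYPES)
        row[GT_INDEX[gt]] = 1  # raises KeyError on an unknown genotype string
        encoded.append(row)
    return encoded


# Hypothetical person with calls at three common-variant sites (made-up data).
person_x = ['0|0', '1|0', '1|1']
x = one_hot_genotypes(person_x)
print(x)
```

Each person then becomes a matrix with one row per common mutation and four columns, so the source side has the same length for everybody.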
The target data to predict would be a list of positions in the genome, which are large numbers. This list is of variable size, as not everybody has the same number of rare mutations.
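To make the target side concrete, here is one possible way to turn those variable-length lists into something a sequence model could consume. It is only a sketch, and I don't know that it is the right choice. Each distinct position gets a small integer token id, and each list is padded to a common length. The positions, the reserved `PAD` and `END` ids, and the helper `encode_target` are all made up for illustration.

```python
# Hypothetical rare-mutation positions for three people (made-up values).
targets = [
    [1_204_553, 98_771_002],
    [98_771_002],
    [55_310, 1_204_553, 230_118_940],
]

PAD, END = 0, 1  # reserved token ids (an assumption, not a standard)

# Vocabulary: every position seen in the data gets an id starting at 2.
positions = sorted({p for person in targets for p in person})
token_of = {p: i + 2 for i, p in enumerate(positions)}


def encode_target(person, max_len):
    """Map positions to token ids, append END, then right-pad with PAD."""
    tokens = [token_of[p] for p in sorted(person)] + [END]
    return tokens + [PAD] * (max_len - len(tokens))


# Longest list plus one slot for the END token.
max_len = max(len(person) for person in targets) + 1
encoded = [encode_target(person, max_len) for person in targets]
print(encoded)
```

One limitation of this sketch is that it can only represent positions that already appear in the vocabulary, which may be a problem for truly de novo mutations.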
I found this blog post which explains sequence to sequence learning, which looks close to what I would like to do. However, my source data is very different to what they use, and I'm not sure if having a list of numbers as target would work as well as having a list of characters.
Should I try to adapt their code to my problem, or is there a better model architecture that I should use? And if I do adapt it to my case, which major modifications should I start with? (I am fairly new to ML, and so far my applications have all been quite simple.)