Can anyone interpret this Recurrent Network Encoder-Decoder question?

I'm trying to earn some extra credit, so the professor won't elaborate further on what's being asked in this question:

The dataset that we're given is a line-by-line file of protein sequences (something like this: LVPRGSHMASMTGGQQMGRGSMVSSSSSGSDSLLLLSEECLLSASSGSGIQIQICKQIPKDWIYSYQVEEGSDLT)

What on earth is he asking about the encoder-decoder? Aren't these used to encode some information (like an English sentence) and then decode it into some other data (like a Spanish sentence)? What should I be encoding and decoding in this scenario?

Thank you

Topic encoder rnn neural-network machine-learning

Category Data Science


The wording is pretty unclear, but my guess is that he wants you to encode each protein sequence into DNA codons, decode it back into a protein, and look at the similarity between the original and the reconstruction. Admittedly, that's a very weird use case for an autoencoder: the mapping from codons to amino acids is fixed, and I can't think of any real noise to clean. (It would make more sense to me to use it for DNA-to-protein rather than protein-to-DNA-to-protein.) Anyhow, that's my best guess.

In general, it's not unusual to reconstruct the original data with an autoencoder, either to clean noise or to reduce dimensionality. An example of embeddings with proteins can be found here: https://github.com/samsledje/Deep_PPI. Keep in mind that this is not the same task, so treat it as inspiration only.
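To make the protein-to-codons-to-protein round trip concrete, here is a minimal sketch in plain Python. It is a deterministic lookup using the standard genetic code, not a learned encoder-decoder, so it only shows what the input, the intermediate DNA, and the reconstruction would look like. The function names (`encode`, `decode`, `similarity`) are made up for illustration, and picking the first codon for each amino acid is an arbitrary assumption, since most amino acids have several synonymous codons.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Assumption: keep the first codon seen for each amino acid (arbitrary choice).
AA_TO_CODON = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODON.setdefault(aa, codon)


def encode(protein):
    """Back-translate a protein string into one possible DNA string."""
    return "".join(AA_TO_CODON[aa] for aa in protein)


def decode(dna):
    """Translate a DNA string (length divisible by 3) into a protein string."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def similarity(a, b):
    """Fraction of positions that match, relative to the longer sequence."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


seq = "LVPRGSHMASMTGGQQMGRGSMVSSSSSGSDSLLLLSEECLLSASSGSGIQIQICKQIPKDWIYSYQVEEGSDLT"
dna = encode(seq)
reconstructed = decode(dna)
print(similarity(seq, reconstructed))
```

Because the lookup is lossless in this direction, the reconstruction always matches the input, which is exactly why a learned autoencoder has nothing interesting to learn here unless noise or a bottleneck is added.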
