How does the pretraining stage actually work in wav2vec models, and what data qualifies for fine-tuning a speech-to-text model?
I'm asking about pretraining and fine-tuning in wav2vec 2.0, the model Facebook AI uses for speech-to-text on low-resource languages.
I don't really understand how the model does the pretraining stage; I read the paper https://arxiv.org/abs/2006.11477, but I still didn't grasp the notion of pretraining in this context. My question is: how is pretraining actually done?
Note: I'm a beginner in ML. So far I've done some NLP projects, and I have a rough idea of Transformers but no hands-on experience with them. The simpler the answer, the easier it is for me to understand. Thanks!