Computer science corpus for training a language model
I am looking for a domain-specific computer science corpus of at least 20M words (preferably more than 50M words) to train a language model on.
Is there anything available out of the box that I could use? I tried looking for the SciBERT corpus, but I cannot find how to access it.
Thanks!
Topic corpus text text-mining nlp data-mining
Category Data Science