BERT base uncased: required GPU RAM
I'm working on an NLP task using BERT, and I have a question about GPU memory.
I have already trained a model using DistilBERT, since I ran into out-of-memory problems with TensorFlow on an RTX 3090 (24 GB of GPU RAM, but only ~20.5 GB usable) when using the BERT base model.
To make it work, I limited the training set to 1.1 million sentences (truncating each sentence at 128 words) and the validation set to about 300k, while using a high batch size (256).
Now I have the opportunity to retrain the model on an Nvidia A100 (with 40 GB of GPU RAM), so it's time to use BERT base rather than the distilled version.
My question is: if I reduce the batch size (e.g. from 256 to 64), will I be able to increase the size of my training data (e.g. from 1.1 to 2-3 million sentences), increase the sentence length (e.g. from 128 to 256, or 198), and use BERT base (which has many more trainable parameters than the distilled version) within the 40 GB of the A100, or is it likely that I will get an OOM error?
I'm asking because I don't have unlimited attempts on this cluster, since I'm not the only one using it (plus the data are quite large and have to be prepared differently for each configuration), so I would like an estimate of what is likely to happen before committing to a full run.
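For context, the cheapest test I can think of is a dry run: a few training steps on a synthetic batch with the candidate shape, just to read the peak GPU memory before spending a real attempt on the cluster. Below is a minimal sketch of what I mean, assuming TensorFlow with Hugging Face `transformers` and a sequence-classification head (the head, learning rate, and label count are placeholders, not my actual setup):

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

BATCH_SIZE = 64    # candidate batch size
SEQ_LEN = 256      # candidate max sequence length
NUM_LABELS = 2     # placeholder: replace with the real number of classes

model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)

# One synthetic batch at the target shape: GPU memory depends on batch size,
# sequence length and model size, not on how many sentences the dataset has.
input_ids = tf.random.uniform((BATCH_SIZE, SEQ_LEN), maxval=30000, dtype=tf.int32)
attention_mask = tf.ones((BATCH_SIZE, SEQ_LEN), dtype=tf.int32)
labels = tf.random.uniform((BATCH_SIZE,), maxval=NUM_LABELS, dtype=tf.int32)

# A few full training steps (forward + backward + optimizer update) should be
# enough to reach peak memory, including Adam's moment buffers.
for _ in range(3):
    with tf.GradientTape() as tape:
        outputs = model(input_ids, attention_mask=attention_mask,
                        labels=labels, training=True)
        loss = outputs.loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# 'peak' is the number to compare against the 40 GB of the A100.
print(tf.config.experimental.get_memory_info("GPU:0"))
```

My thinking is that if the peak from this dry run fits comfortably under 40 GB, the full dataset should too, since the extra sentences only change how long an epoch takes, not the per-step memory footprint. Is that reasoning correct?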
Topic: bert, transformer, gpu
Category: Data Science