BERT base uncased: required GPU RAM

I'm working on an NLP task using BERT, and I have a question about GPU memory.

I have already built a model using DistilBERT, since I ran into out-of-memory problems with TensorFlow on an RTX 3090 (24 GB of GPU RAM, of which ~20.5 GB is usable) when using the BERT base model.

To make it work, I limited my training set to 1.1 million sentences (truncating each sentence at 128 words) and my validation set to about 300k, while using a high batch size (256).

Now I have the opportunity to retrain the model on an NVIDIA A100 (with 40 GB of GPU RAM), so it's time to use BERT base rather than the distilled version.

My question is: if I reduce the batch size (e.g. from 256 to 64), can I increase the size of my training data (e.g. from 1.1 to 2-3 million sentences) and the sentence length (e.g. from 128 to 198 or 256), and use BERT base (which has far more trainable parameters than the distilled version) within the A100's 40 GB, or is it likely that I will get an OOM error?

I ask because I don't have unlimited attempts on this cluster, since I'm not the only one using it (and I would have to prepare the data differently in each case, and it is quite large), so I would like an estimate of what is likely to happen.

Topic: bert, transformer, gpu

Category: Data Science


As you pointed out in your comments, you pre-tokenized the data and kept it in tensors in GPU memory.

Only the current batch needs to be loaded in GPU RAM, so you should not have to reduce your training data size (assuming your data loading and training routines are implemented properly). To keep your training data tensors in CPU memory, you can use a with tf.device(...): scope.
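As a minimal sketch of that idea (the array shapes, vocabulary size, and contents below are illustrative placeholders, not your actual data): pin the full tensors to the CPU device and let tf.data copy one batch at a time to the GPU.

```python
import numpy as np
import tensorflow as tf

# Illustrative placeholder arrays standing in for your pre-tokenized data;
# in practice these would be your real token IDs, masks, and labels.
num_examples, seq_len = 100_000, 128
token_ids = np.random.randint(0, 30522, size=(num_examples, seq_len), dtype=np.int32)
masks = np.ones((num_examples, seq_len), dtype=np.int32)
y = np.random.randint(0, 2, size=(num_examples,), dtype=np.int32)

# Pin the full tensors to host (CPU) memory so they never occupy GPU RAM.
with tf.device("/CPU:0"):
    input_ids = tf.constant(token_ids)
    attention_mask = tf.constant(masks)
    labels = tf.constant(y)

# tf.data then moves only one batch at a time to the GPU during training.
dataset = (
    tf.data.Dataset.from_tensor_slices(
        ({"input_ids": input_ids, "attention_mask": attention_mask}, labels)
    )
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)

# model.fit(dataset, epochs=3)  # the model's weights still live on the GPU
```

With this setup, GPU memory holds the model, activations, optimizer state, and one batch, so batch size and sequence length drive memory use far more than total dataset size.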

However, take into account that the training data may also be too large for CPU memory. A typical approach in that case is to save the token IDs to disk and stream them from there.
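One common way to do this in TensorFlow is with TFRecord files. The sketch below assumes a fixed sequence length and a hypothetical file name, train.tfrecord; adapt both to your setup.

```python
import tensorflow as tf

SEQ_LEN = 128  # assumed sequence length; match your truncation setting

# Write token IDs and labels to a TFRecord file once, so the full
# dataset never needs to sit in CPU or GPU memory.
def write_examples(path, token_ids, labels):
    with tf.io.TFRecordWriter(path) as writer:
        for ids, label in zip(token_ids, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "input_ids": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=ids)),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())

# Parse one serialized record back into tensors.
def parse(record):
    features = tf.io.parse_single_example(record, {
        "input_ids": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
        "label": tf.io.FixedLenFeature([1], tf.int64),
    })
    return features["input_ids"], features["label"]

# Stream batches from disk; only a small shuffle/prefetch buffer
# ever lives in memory.
dataset = (
    tf.data.TFRecordDataset("train.tfrecord")
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)
```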
