What is the difference between batch_encode_plus() and encode_plus()?

I am doing a project using the T5 Transformer and have read the documentation for the T5 model. While using T5Tokenizer, I am somewhat confused about how to tokenize my sentences.

Can someone please help me understand the difference between batch_encode_plus() and encode_plus(), and when I should use each of them?



See also the Hugging Face documentation, but as the names suggest, batch_encode_plus() tokenizes a batch of (pairs of) sequences, whereas encode_plus() tokenizes a single sequence (or sequence pair). Per the documentation, both methods are deprecated; you should call the tokenizer directly via __call__ instead, which checks by itself whether the input is batched and dispatches to the correct method (see the is_batched variable and the corresponding if statement in the source code).
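As a minimal sketch of this, assuming the t5-small checkpoint (the prompt strings are just placeholders), the same __call__ interface handles both the single-sequence and the batched case:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Single sequence: replaces the deprecated tokenizer.encode_plus(...)
single = tokenizer(
    "translate English to German: Hello world",
    return_tensors="pt",
)

# Batch of sequences: replaces the deprecated tokenizer.batch_encode_plus(...)
# padding=True pads every sequence to the longest one in the batch
batch = tokenizer(
    [
        "translate English to German: Hello world",
        "summarize: The quick brown fox jumps over the lazy dog.",
    ],
    padding=True,
    return_tensors="pt",
)

print(single["input_ids"].shape)  # shape (1, seq_len)
print(batch["input_ids"].shape)   # shape (2, longest seq_len in the batch)
```

In other words, you no longer need to pick between the two deprecated methods yourself: pass a string for a single sequence or a list of strings for a batch, and __call__ routes the input accordingly.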
