BPE vs WordPiece tokenization: when to use which?
What are the general tradeoffs between BPE and WordPiece tokenization? When is one preferable to the other? Are there any differences in downstream model performance between the two? I'm looking for a general answer, backed up with specific examples.
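For concreteness, here's a minimal sketch of the kind of side-by-side comparison I have in mind, using the Hugging Face `tokenizers` library. The toy corpus, vocabulary size, and `[UNK]` token choice below are just illustrative assumptions:

```python
# Side-by-side BPE vs WordPiece training with the Hugging Face `tokenizers`
# library. The tiny corpus and vocab size are placeholders for illustration.
from tokenizers import Tokenizer
from tokenizers.models import BPE, WordPiece
from tokenizers.trainers import BpeTrainer, WordPieceTrainer
from tokenizers.pre_tokenizers import Whitespace

corpus = [
    "low lower lowest newer new widest wide",
    "tokenization models merge subword units differently",
]

# BPE: repeatedly merges the most frequent adjacent symbol pair.
bpe = Tokenizer(BPE(unk_token="[UNK]"))
bpe.pre_tokenizer = Whitespace()
bpe.train_from_iterator(
    corpus, trainer=BpeTrainer(vocab_size=60, special_tokens=["[UNK]"])
)

# WordPiece: picks the merge that maximizes corpus likelihood
# (score = pair frequency / (freq of first part * freq of second part)).
wp = Tokenizer(WordPiece(unk_token="[UNK]"))
wp.pre_tokenizer = Whitespace()
wp.train_from_iterator(
    corpus, trainer=WordPieceTrainer(vocab_size=60, special_tokens=["[UNK]"])
)

word = "lowest"
print("BPE:      ", bpe.encode(word).tokens)
print("WordPiece:", wp.encode(word).tokens)  # continuation pieces get "##"
```

Even on a toy corpus like this, the two models can split the same word into different subword sequences, which is the kind of concrete difference I'd like the tradeoff explained in terms of.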
Topic: transformer, sentiment-analysis, machine-translation, nlp, machine-learning
Category: Data Science