Why does my char-level Keras tokenizer add spaces when converting sequences to texts?
I create a tokenizer with
import tensorflow as tf
tokenizer = tf.keras.preprocessing.text.Tokenizer(split='', char_level=True, ...)
tokenizer.fit_on_texts(...)
But when I convert sequences of tokens to texts, the result contains a space after each character (except for the last one):
test_text = 'this is a test'
seq = tokenizer.texts_to_sequences([test_text])
r = tokenizer.sequences_to_texts(seq)[0]
assert r == ' '.join(test_text)
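(So r comes back as 't h i s   i s   a   t e s t' rather than the original string.)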
Is there a way to avoid these added spaces? Am I missing a configuration parameter?
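For now, the only workaround I have found is to bypass sequences_to_texts (which looks like it always joins tokens with a space) and rebuild the string from tokenizer.index_word myself. A minimal sketch, assuming the default lower/filters settings so every character of the input survives tokenization:

import tensorflow as tf

test_text = 'this is a test'
tokenizer = tf.keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts([test_text])
seq = tokenizer.texts_to_sequences([test_text])

# Rebuild the string from the index -> character mapping instead of
# calling sequences_to_texts, so no separator is inserted between tokens.
decoded = ''.join(tokenizer.index_word[i] for i in seq[0])
assert decoded == test_text

This works, but I would prefer a built-in option if one exists.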
Topic tokenization python-3.x keras
Category Data Science