What is the pooled output when using TensorFlow's implementation of BERT for text classification (multiple sentences)?
I have come across several sources stating that each sentence starts with a [CLS] token when passed to BERT. I'm passing text documents containing multiple sentences to BERT, which would mean that each sentence gets its own [CLS] token.
The pooled output, however, is only a single vector of hidden-state size. Does this mean that all the [CLS] tokens are somehow compressed into one (e.g. by averaging)? Or does my text document only contain a single [CLS] token for the whole input sequence?
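For reference, here is roughly how I'm calling the model (a minimal sketch; the specific TF Hub encoder and preprocessor handles are just examples of the kind of setup I'm using, not necessarily my exact model):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Example TF Hub handles for a BERT preprocessor and encoder pair.
preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

# One document made up of multiple sentences, passed as a single string.
docs = tf.constant(["First sentence. Second sentence. Third sentence."])

inputs = preprocess(docs)
outputs = encoder(inputs)

print(outputs["pooled_output"].shape)    # (1, 768)       -> one vector per document
print(outputs["sequence_output"].shape)  # (1, 128, 768)  -> one vector per token
```

As the shapes show, `pooled_output` is a single hidden-size vector per input document, which is what prompted my question about what happens to the per-sentence [CLS] tokens.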
Topic bert sentiment-analysis
Category Data Science