How to use pre-trained word2vec model generated by Gensim with Convolutional neural networks (CNN)

I have generated a pre-trained word2vec model using the Gensim framework (https://radimrehurek.com/gensim/auto_examples/index.html#documentation). The dataset has 507 sentences labeled as positive or negative. After performing all the text preprocessing, I used Gensim to generate the pre-trained word2vec model. The model has 234 unique words, each represented by a 300-dimensional vector. However, I have a question.

How can I use the generated word2vec embedding vectors as input to CNN?

Topic text-classification word2vec convnet nlp

Category Data Science


Keras provides a good example of how to load pretrained word embeddings and train a model on them. The tricky part is loading the pretrained embeddings, but this is well explained in the code and can be adapted easily. Also note that the embeddings go into an Embedding layer, which is typically "frozen" (not trainable) so the pretrained word2vec vectors are not modified during training. This is achieved by setting trainable=False:

from tensorflow import keras
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(
    num_tokens,          # vocabulary size, including padding/OOV indices
    embedding_dim,       # 300 for the word2vec vectors described above
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False,     # keep the pretrained vectors fixed
)

There are a number of useful resources provided by Keras related to natural language processing (NLP), including examples on semantic similarity (BERT), NER transformers, and sequence-to-sequence learning.
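To connect this back to the original question, the frozen embedding layer simply becomes the first layer of the CNN: padded word-index sequences go in, the layer emits the word2vec vectors, and Conv1D filters slide over them. A minimal sketch for the binary sentiment task described above (max_len and the layer sizes are assumptions; a random matrix stands in for the Gensim-derived embedding_matrix):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_tokens = 236     # 234 unique words + padding/OOV indices, per the question
embedding_dim = 300  # word2vec vector size
max_len = 50         # assumed maximum sentence length after padding

# Random stand-in for the matrix built from the Gensim model.
embedding_matrix = np.random.rand(num_tokens, embedding_dim)

model = keras.Sequential([
    layers.Embedding(
        num_tokens,
        embedding_dim,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False,                        # keep word2vec vectors fixed
    ),
    layers.Conv1D(128, 5, activation="relu"),   # slide 5-word filters over the sentence
    layers.GlobalMaxPooling1D(),                # keep each filter's strongest response
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Training then uses the padded index sequences directly, e.g. model.fit(x_train, y_train, ...), where x_train has shape (num_sentences, max_len).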
