BERT Optimization for Production

I'm using BERT to transform text into a 768-dimensional vector. The model is multilingual:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2') 

Now I want to put the model into production, but the embedding time is too long and I want to optimize the model to reduce it. Which libraries would let me do this?

Topic semantic-similarity bert transformer nlp

Category Data Science


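You can start with TorchScript. It may require changing much of your code and switching from sentence-transformers to transformers (loading the model's backbone and re-creating the final pooling layers yourself). The benefit is that the exported model no longer depends on the Python interpreter, so inference is not constrained by the Global Interpreter Lock (GIL), which prevents Python threads from running in parallel. With TorchScript you can run your model in a C++ environment. There is also ONNX, which I believe improves inference performance as well.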
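As a rough illustration of the TorchScript route, here is a minimal sketch. It does not load paraphrase-multilingual-mpnet-base-v2; `ToyEncoder` is a hypothetical stand-in for the backbone plus mean pooling, used only so the example runs without downloading weights. The tracing, saving and loading calls (`torch.jit.trace`, `torch.jit.save`, `torch.jit.load`) are the real PyTorch ones; the real model would need its own tokenizer and pooling logic.

```python
import io

import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a transformer backbone plus mean pooling."""

    def __init__(self, vocab_size: int = 1000, dim: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.proj(self.embed(input_ids))              # (batch, seq, dim)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)   # (batch, seq, 1)
        summed = (hidden * mask).sum(dim=1)                    # ignore padded tokens
        counts = mask.sum(dim=1).clamp(min=1e-9)
        return summed / counts                                 # (batch, dim)


torch.manual_seed(0)
model = ToyEncoder().eval()

# Example inputs; with a real model these would come from its tokenizer.
input_ids = torch.randint(0, 1000, (2, 16))
attention_mask = torch.ones(2, 16, dtype=torch.long)

with torch.no_grad():
    # Record the forward pass as a TorchScript graph.
    traced = torch.jit.trace(model, (input_ids, attention_mask))
    eager_out = model(input_ids, attention_mask)
    traced_out = traced(input_ids, attention_mask)

# Serialize and reload; a file saved this way can also be opened
# from C++ with torch::jit::load.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

with torch.no_grad():
    reloaded_out = reloaded(input_ids, attention_mask)
```

Whether tracing actually reduces latency depends on the model and hardware, so it is worth measuring the eager and traced versions on your own inputs before committing to the rewrite.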

If your use case is not real-time and you are serving the model through an API, you can put a queue mechanism such as RabbitMQ in front of it.
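The queue idea can be sketched with the standard library alone. This is not RabbitMQ: `queue.Queue` stands in for the broker, and `fake_encode` is a hypothetical placeholder for `model.encode`. The point it tries to show is that a worker can pull several pending requests at once and embed them as one batch, which is usually cheaper than encoding one text at a time. `BATCH_SIZE` is an assumed value.

```python
import queue
import threading

BATCH_SIZE = 4        # assumed batch size, tune for your model and hardware
_STOP = object()      # sentinel telling the worker to shut down


def fake_encode(texts):
    """Hypothetical stand-in for model.encode(texts): one vector per text."""
    return [[float(len(t))] for t in texts]


def worker(jobs, results):
    """Drain the queue in batches so the model is called once per batch."""
    running = True
    while running:
        item = jobs.get()                 # block until at least one job arrives
        if item is _STOP:
            break
        batch = [item]
        while len(batch) < BATCH_SIZE:    # grab whatever else is already waiting
            try:
                nxt = jobs.get_nowait()
            except queue.Empty:
                break
            if nxt is _STOP:
                running = False
                break
            batch.append(nxt)
        ids, texts = zip(*batch)
        for job_id, vec in zip(ids, fake_encode(list(texts))):
            results[job_id] = vec


jobs = queue.Queue()
results = {}
requests = ["hello", "bonjour le monde", "hola", "guten Tag", "ciao"]
for job_id, text in enumerate(requests):
    jobs.put((job_id, text))
jobs.put(_STOP)

thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()
thread.join()
```

With a real broker the producer and the worker would be separate processes, and the results would go to a store or a reply queue instead of a shared dictionary.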
