BERT Optimization for Production

Question

Mohy Mohamed

July 9, 2021 09:34

I'm using BERT to transform text into 768-dimensional vectors. The model is multilingual:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')

Now I want to put the model into production, but the embedding time is too long. I want to optimize the model to reduce the embedding time. Which libraries would let me do this?

Topic semantic-similarity bert transformer nlp

Category Data Science

simone · Accepted Answer · July 9, 2021 09:34

You can start by using TorchScript. It may require changing your whole code and switching to transformers (loading the backbone of the model and the last layers yourself). This basically gets you out of the Python interpreter, whose GIL (global interpreter lock) prevents true multithreading: with TorchScript you can run your model in a C++ environment. There's also ONNX, which I believe improves performance as well.
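A minimal sketch of the TorchScript route, assuming PyTorch and Hugging Face transformers are installed. To keep it runnable offline, a tiny randomly initialised BertModel stands in for the real backbone (the checkpoint behind paraphrase-multilingual-mpnet-base-v2 is a different, larger model), and MeanPooledEncoder is a hypothetical wrapper that re-creates mean pooling on top of it; check the model's own pooling configuration before relying on that choice. torch.jit.trace records the operations executed on the example inputs, so padding requests to the traced shape is the safest way to reuse the graph.

```python
import io

import torch
from transformers import BertConfig, BertModel

# Assumption: a tiny random BERT replaces the real backbone so this runs offline.
# With the real model you would load its Hugging Face checkpoint instead.
config = BertConfig(
    vocab_size=1000, hidden_size=64, num_hidden_layers=2,
    num_attention_heads=4, intermediate_size=128, torchscript=True,
)


class MeanPooledEncoder(torch.nn.Module):
    """Hypothetical wrapper: backbone + mean pooling over non-padding tokens."""

    def __init__(self, backbone: torch.nn.Module) -> None:
        super().__init__()
        self.backbone = backbone

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Element [0] of the backbone output is the last hidden state (batch, seq, hidden).
        token_embeddings = self.backbone(input_ids=input_ids, attention_mask=attention_mask)[0]
        mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
        summed = (token_embeddings * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        return summed / counts  # one vector per input text


torch.manual_seed(0)
encoder = MeanPooledEncoder(BertModel(config)).eval()

# Example inputs for tracing; one row is padded so the masked path is recorded.
example_ids = torch.randint(0, config.vocab_size, (2, 16))
example_mask = torch.ones(2, 16, dtype=torch.long)
example_mask[1, 10:] = 0

with torch.no_grad():
    traced = torch.jit.trace(encoder, (example_ids, example_mask))

# Serialise the traced module, then load it back (no Python class needed to run it).
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)
```

A file saved with torch.jit.save can also be loaded from C++ with LibTorch's torch::jit::load, which is what lets the model run outside the Python interpreter.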

If your use case is not real-time and you are serving the model through an API, you can use a queue mechanism like RabbitMQ.
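RabbitMQ itself is a separate message broker, so as a hedged, in-process stand-in this sketch uses Python's standard queue module to show the same pattern: API handlers enqueue texts, and a single worker drains the queue in batches so the model is called on several texts at once. fake_embed, worker, and submit are hypothetical names; in a real deployment fake_embed would be replaced by model.encode and the in-memory queue by a broker client.

```python
import queue
import threading
from typing import List, Optional, Tuple


# Hypothetical stand-in for model.encode(texts): one tiny "vector" per text.
def fake_embed(texts: List[str]) -> List[List[float]]:
    return [[float(len(t))] for t in texts]


# A job pairs the input text with a one-slot queue that receives its vector.
Job = Tuple[str, queue.Queue]


def worker(jobs: queue.Queue, max_batch: int = 32) -> None:
    """Consume jobs in batches until a None sentinel arrives."""
    stop = False
    while not stop:
        item: Optional[Job] = jobs.get()  # block until work arrives
        if item is None:
            break
        batch = [item]
        # Pull any jobs already waiting so the model sees them in one call.
        while len(batch) < max_batch:
            try:
                item = jobs.get_nowait()
            except queue.Empty:
                break
            if item is None:  # finish this batch, then shut down
                stop = True
                break
            batch.append(item)
        vectors = fake_embed([text for text, _ in batch])
        for (_, reply), vector in zip(batch, vectors):
            reply.put(vector)


def submit(jobs: queue.Queue, text: str) -> queue.Queue:
    """Enqueue one text (e.g. from an API handler) and return its reply slot."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    jobs.put((text, reply))
    return reply


jobs: queue.Queue = queue.Queue()
thread = threading.Thread(target=worker, args=(jobs,), daemon=True)
thread.start()
replies = [submit(jobs, text) for text in ["hello", "bonjour le monde"]]
results = [reply.get(timeout=5) for reply in replies]
jobs.put(None)  # ask the worker to exit
thread.join(timeout=5)
```

Batching is the point of the worker: each call to the model handles whatever requests piled up, instead of one forward pass per request.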
