BERT in production

I've created a BERT model. What are the options for deploying it? Is it possible to use it with Spark, Hadoop, or Docker?

Topic: bert, apache-spark, apache-hadoop

Category: Data Science


You can certainly apply it with Spark. There is no reason you can't use PyTorch in a Spark job; just add it as a dependency when you submit the job. Spark's pandas UDFs are useful for scoring large models, as they let you score in mini-batches. See https://spark.apache.org/docs/3.0.0-preview/sql-pyspark-pandas-with-arrow.html#scalar-iterator
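As a rough illustration, here is what a Scalar Iterator pandas UDF for BERT scoring might look like in Spark 3. The model name, the `text` column, and scoring the positive class are assumptions for the sketch, not part of the question:

```python
from typing import Iterator

import pandas as pd
import torch
from pyspark.sql.functions import pandas_udf
from transformers import AutoModelForSequenceClassification, AutoTokenizer


@pandas_udf("float")
def score_text(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the tokenizer and model once per executor process, not once per row.
    # "bert-base-uncased" is a placeholder; use your own fine-tuned model.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
    model.eval()
    for batch in batches:
        inputs = tokenizer(batch.tolist(), padding=True,
                           truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Emit one score per input row; here, probability of class 1 (an assumption).
        yield pd.Series(torch.softmax(logits, dim=-1)[:, 1].numpy())


# Usage, assuming df has a string column named "text":
# scored = df.withColumn("score", score_text("text"))
```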

One complication is that while you can use GPUs from Spark 2.x, you can't allocate GPUs as schedulable resources. So you may end up with multiple tasks on one GPU and need to tune a bit to reduce contention. Spark 3, however, adds GPU-aware resource scheduling.
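For reference, requesting GPUs with Spark 3's resource scheduling looks roughly like the sketch below; the discovery-script path and the one-GPU-per-executor, one-GPU-per-task layout are assumptions:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bert-scoring")
    # Spark 3 GPU-aware scheduling: one GPU per executor and one per task,
    # so tasks no longer contend for the same device.
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    # Spark ships an example GPU discovery script; the path below assumes a
    # standard Spark install location.
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/examples/src/main/scripts/getGpusResources.sh")
    .getOrCreate()
)
```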

Hadoop itself isn't a thing that runs computations. If you mean MapReduce, that's effectively obsolete for this kind of workload; if you mean Spark running on a Hadoop cluster, see above.

Docker is also an option: just package up your scoring code and run it on a cluster. You don't get the same help with data movement and access that you would in Spark; that's all up to you. But it can certainly work; a minimal image might look like the sketch below.
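For illustration only, a minimal Dockerfile for a scoring container; the file names (`score.py`, `requirements.txt`) and the base image are assumptions, not anything from the question:

```dockerfile
# Minimal sketch: package the scoring code and its dependencies.
FROM python:3.9-slim

WORKDIR /app

# requirements.txt (assumed to exist) would pin torch, transformers, etc.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# score.py is a hypothetical entry point that loads the model and scores input.
COPY score.py .

ENTRYPOINT ["python", "score.py"]
```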
