Does BERT have any advantage over GPT-3?

Question

Bipin

January 14, 2021, 03:39

I have read a couple of articles that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better than GPT-3 in any particular area of NLP.

It's also interesting to note that OpenAI's GPT-3 is not open-sourced, whereas tech behemoth Google's BERT is. I feel that OpenAI's stance and the hefty price tag for the GPT-3 API are in stark contrast to its mission statement ("OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity").

https://analyticsindiamag.com/gpt-3-vs-bert-for-nlp-tasks/

https://thenextweb.com/neural/2020/07/23/openais-new-gpt-3-language-explained-in-under-3-minutes-syndication/

https://medium.com/towards-artificial-intelligence/gpt-3-from-openai-is-here-and-its-a-monster-f0ab164ea2f8

Topic openai-gpt bert nlp

Category Data Science

MWB · Accepted Answer · January 14, 2021, 03:39

BERT needs to be fine-tuned to do what you want.

GPT-3 cannot be fine-tuned: the weights are not available, and even if you had access to them, fine-tuning a model of that size would be very expensive.

If you have enough data for fine-tuning, then per unit of compute (i.e. inference cost), you'll probably get much better performance out of BERT.
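To put "per unit of compute" in rough numbers, here is a back-of-the-envelope sketch. It assumes the commonly quoted parameter counts (about 110M for BERT-base, 340M for BERT-large, 175B for GPT-3) and the rule of thumb that a transformer forward pass costs about 2 FLOPs per parameter per token. The function names are made up for illustration, and the estimate ignores sequence length, batching, and hardware details, so treat it as an order-of-magnitude guide only.

```python
# Approximate published parameter counts (assumed here for illustration).
PARAMS = {
    "BERT-base": 110e6,
    "BERT-large": 340e6,
    "GPT-3": 175e9,
}


def forward_flops_per_token(n_params: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per parameter per processed token."""
    return 2.0 * n_params


def cost_ratio(big: str, small: str) -> float:
    """How many times more forward-pass compute `big` needs per token than `small`."""
    return forward_flops_per_token(PARAMS[big]) / forward_flops_per_token(PARAMS[small])


if __name__ == "__main__":
    for name in ("BERT-base", "BERT-large"):
        ratio = cost_ratio("GPT-3", name)
        print(f"GPT-3 vs {name}: ~{ratio:.0f}x more compute per token")
```

Under these assumptions GPT-3 costs on the order of hundreds to over a thousand times more per token than a BERT model, which is why a fine-tuned BERT can be the better deal when you have task-specific training data.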

Langley · Accepted Answer · December 23, 2020, 13:48

This article on Medium introduces GPT-3 and makes some comparisons with BERT.

Specifically, section 4 examines how GPT-3 and BERT differ and mentions that: "On the Architecture dimension, BERT still holds the edge. It’s trained-on challenges which are better able to capture the latent relationship between text in different problem contexts."

Also, in section 6 of the article, the author lists areas where GPT-3 struggles. BERT and other bidirectional encoder transformers may do better in those areas, although I have no data or references to support this yet.
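For readers unsure what "bidirectional" means here, this is a minimal sketch of the attention-mask difference. A BERT-style encoder lets every token attend to every other token, while a GPT-style decoder uses a causal mask so each token only sees itself and earlier positions. The helper names are hypothetical and this is plain Python for illustration, not code from either model.

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: position i may attend to every position j."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tokens = ["the", "bank", "of", "the", "river"]
    n = len(tokens)
    print("bidirectional (BERT-style):")
    for row in bidirectional_mask(n):
        print(row)
    print("causal (GPT-style):")
    for row in causal_mask(n):
        print(row)
```

In the causal case the representation of "bank" cannot use "river", which appears later. That is one plausible reason an encoder could do better on tasks that need context from both sides, though it is not evidence that BERT actually does.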
