What exactly are the parameters in GPT-3's 175 billion parameters?
What exactly are the parameters in GPT-3's 175 billion parameters? Are these the words in the text on which the model is trained?
Topic openai-gpt nlp
Category Data Science
The parameters in GPT-3, as in any neural network, are the weights and biases of the model's layers.
The GPT-3 paper (Brown et al., 2020) lists the different versions of GPT-3 at various sizes; the following is reproduced from its Table 2.1:

Model         Parameters  Layers  d_model  Heads
GPT-3 Small   125M        12      768      12
GPT-3 Medium  350M        24      1024     16
GPT-3 Large   760M        24      1536     16
GPT-3 XL      1.3B        24      2048     24
GPT-3 2.7B    2.7B        32      2560     32
GPT-3 6.7B    6.7B        32      4096     32
GPT-3 13B     13.0B       40      5140     40
GPT-3 175B    175.0B      96      12288    96

The more layers a version has, and the wider those layers are, the more weights and biases it has, and hence the more parameters. Regardless of the model version, the words it was trained on are a separate quantity: the roughly 300 billion tokens of training data the paper references, drawn from what appears to be around 45TB of text scraped from the internet (filtered down before training).
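To make the relationship between depth, width, and parameter count concrete, here is a minimal Python sketch. It is not from the paper; it uses the common rule of thumb that a GPT-style transformer has roughly 12 * n_layers * d_model^2 parameters, which counts the attention and feed-forward weight matrices but ignores embeddings, biases, and layer norms (so it undercounts the smallest models, where embeddings are a large fraction of the total):

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count of a GPT-style transformer."""
    return 12 * n_layers * d_model ** 2

# Layer/width values taken from the table above.
for name, n_layers, d_model in [
    ("GPT-3 Small", 12, 768),
    ("GPT-3 13B", 40, 5140),
    ("GPT-3 175B", 96, 12288),
]:
    print(f"{name}: ~{approx_params(n_layers, d_model) / 1e9:.1f}B parameters")
```

For the largest model this gives ~173.9B, close to the 175B the paper reports; the remainder is mostly the token and position embeddings the approximation leaves out.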
The same question has been answered on ai.stackexchange.com:
"Parameters" is a synonym for "weights", which is the term most people use for a neural network's parameters (in my experience, "weights" is the term machine learning practitioners use in general, whereas "parameters" is more often found in the statistics literature). Batch size, learning rate, etc. are hyperparameters, which basically means they are user-specified, whereas weights are what the learning algorithm learns through training.
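To see the distinction in code, here is a small illustrative sketch using PyTorch (the layer size and hyperparameter values are placeholders, not GPT-3's actual configuration): the learnable parameters are the weight and bias tensors, while batch size and learning rate are chosen by the user and never learned.

```python
import torch.nn as nn
import torch.optim as optim

# Hyperparameters: specified by the user, not learned.
d_model = 768        # illustrative width, matching GPT-3 Small's d_model
learning_rate = 6e-4
batch_size = 32

# A single fully connected layer; its weight matrix and bias vector
# are the learnable parameters.
layer = nn.Linear(d_model, d_model)

# weight: 768 * 768 = 589,824 values; bias: 768 values.
n_params = sum(p.numel() for p in layer.parameters())
print(f"Learnable parameters in one Linear layer: {n_params:,}")  # 590,592

# The optimizer updates the parameters during training;
# the learning rate itself stays fixed unless the user changes it.
optimizer = optim.SGD(layer.parameters(), lr=learning_rate)
```

GPT-3's 175 billion parameters are exactly this kind of quantity, summed over all 96 layers of a much wider network.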