What exactly are the parameters in GPT-3's 175 billion parameters?
What exactly are the parameters in GPT-3's 175 billion parameters? Are these the words in the text on which the model is trained?
Topic openai-gpt nlp
Category Data Science
The parameters in GPT-3, as in any neural network, are the weights and biases of the model's layers.
The GPT-3 paper (Brown et al., 2020) lists the different versions of GPT-3 at various sizes; the following is reproduced from its Table 2.1:

Model         Parameters  Layers  d_model  Heads
GPT-3 Small   125M        12      768      12
GPT-3 Medium  350M        24      1024     16
GPT-3 Large   760M        24      1536     16
GPT-3 XL      1.3B        24      2048     24
GPT-3 2.7B    2.7B        32      2560     32
GPT-3 6.7B    6.7B        32      4096     32
GPT-3 13B     13.0B       40      5140     40
GPT-3 175B    175.0B      96      12288    96

The more layers a version has, and the wider those layers are, the more weights and biases it has, and hence the more parameters. Regardless of the model version, the words it was trained on are a separate quantity: the roughly 300 billion tokens of training data the paper references, drawn from what appears to be around 45TB of text scraped from the internet (filtered down before training).
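To make the relationship between depth, width, and parameter count concrete, here is a minimal Python sketch. It is not from the paper; it uses the common rule of thumb that a GPT-style transformer has roughly 12 * n_layers * d_model^2 parameters, which counts the attention and feed-forward weight matrices but ignores embeddings, biases, and layer norms (so it undercounts the smallest models, where embeddings are a large fraction of the total):

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count of a GPT-style transformer."""
    return 12 * n_layers * d_model ** 2

# Layer/width values taken from the table above.
for name, n_layers, d_model in [
    ("GPT-3 Small", 12, 768),
    ("GPT-3 13B", 40, 5140),
    ("GPT-3 175B", 96, 12288),
]:
    print(f"{name}: ~{approx_params(n_layers, d_model) / 1e9:.1f}B parameters")
```

For the largest model this gives ~173.9B, close to the 175B the paper reports; the remainder is mostly the token and position embeddings the approximation leaves out.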
The same question has been answered on ai.stackexchange.com:
"Parameters" is a synonym for "weights", which is the term most people use for a neural network's parameters (in my experience, "weights" is the term machine learning practitioners use in general, whereas "parameters" is more often found in the statistics literature). Batch size, learning rate, etc. are hyperparameters, which basically means they are user-specified, whereas weights are what the learning algorithm learns through training.
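To see the distinction in code, here is a small illustrative sketch using PyTorch (the layer size and hyperparameter values are placeholders, not GPT-3's actual configuration): the learnable parameters are the weight and bias tensors, while batch size and learning rate are chosen by the user and never learned.

```python
import torch.nn as nn
import torch.optim as optim

# Hyperparameters: specified by the user, not learned.
d_model = 768        # illustrative width, matching GPT-3 Small's d_model
learning_rate = 6e-4
batch_size = 32

# A single fully connected layer; its weight matrix and bias vector
# are the learnable parameters.
layer = nn.Linear(d_model, d_model)

# weight: 768 * 768 = 589,824 values; bias: 768 values.
n_params = sum(p.numel() for p in layer.parameters())
print(f"Learnable parameters in one Linear layer: {n_params:,}")  # 590,592

# The optimizer updates the parameters during training;
# the learning rate itself stays fixed unless the user changes it.
optimizer = optim.SGD(layer.parameters(), lr=learning_rate)
```

GPT-3's 175 billion parameters are exactly this kind of quantity, summed over all 96 layers of a much wider network.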