Optimisation of neural networks
Do neural networks get optimized by trial and error, by data scientists, or is there some way of optimizing values through accurate mathematical equations?
Neural networks are trained with a mix of mathematical optimization and trial-and-error exploration:
Neural networks are composed of trainable parameters. These parameters are trained with some variant of stochastic gradient descent (SGD), as sketched below. Trainable parameters include the weights in dense layers, convolutional layers, attention layers, LSTMs, etc.
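A minimal sketch of what "training parameters with SGD" means, using a toy dense layer y = xW + b instead of a full network (the data, shapes, and learning rate here are illustrative assumptions, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))                      # inputs
true_W = np.array([[1.5], [-2.0], [0.5]])
y = X @ true_W + 0.1 * rng.normal(size=(256, 1))   # targets

W = np.zeros((3, 1))                               # trainable parameters
b = np.zeros((1,))
lr = 0.1                                           # learning rate (a hyperparameter)

for epoch in range(20):
    # Stochastic gradient descent: shuffle, then update on small mini-batches.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), 32):
        batch = idx[start:start + 32]
        xb, yb = X[batch], y[batch]
        pred = xb @ W + b
        err = pred - yb                            # prediction error
        grad_W = 2 * xb.T @ err / len(xb)          # gradient of mean squared error w.r.t. W
        grad_b = 2 * err.mean(axis=0)              # gradient w.r.t. b
        W -= lr * grad_W                           # parameter update step
        b -= lr * grad_b

print("learned W:", W.ravel())                     # approaches [1.5, -2.0, 0.5]
```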
There are other aspects of neural networks that cannot be trained but are just as decisive for the performance of the network. They are known as hyperparameters. Examples include the number and size of filters in convolutional layers, the number of layers, the dimensionality of embeddings, etc. To decide which hyperparameter values are optimal, you either choose them "by intuition", explore different combinations of values and check which one performs best (e.g. random search, grid search; see the sketch below), or apply some form of black-box optimization (Bayesian optimization, genetic algorithms, etc.).
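A hedged sketch of grid search over two hyperparameters. The function `train_and_evaluate` is a hypothetical stand-in for your actual training routine (here it just simulates a noisy validation loss); the hyperparameter names and value grids are assumptions for illustration:

```python
import itertools
import numpy as np

def train_and_evaluate(hidden_size, lr, seed=0):
    # Placeholder: a real implementation would train a network with these
    # hyperparameters and return its validation loss.
    rng = np.random.default_rng(seed + hidden_size)
    return abs(np.log2(hidden_size) - 5) + 10 * abs(lr - 0.01) + 0.1 * rng.random()

grid = {
    "hidden_size": [16, 32, 64, 128],
    "lr": [0.001, 0.01, 0.1],
}

best = None
# Try every combination of hyperparameter values and keep the best one.
for hidden_size, lr in itertools.product(grid["hidden_size"], grid["lr"]):
    val_loss = train_and_evaluate(hidden_size, lr)
    if best is None or val_loss < best[0]:
        best = (val_loss, {"hidden_size": hidden_size, "lr": lr})

print("best hyperparameters:", best[1], "validation loss:", round(best[0], 3))
```

Random search works the same way, except the combinations are sampled at random from the grid (or from continuous ranges) instead of enumerated exhaustively.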
The most commonly used method of optimising neural networks is a process called (stochastic) gradient descent. You provide the network with inputs and the expected outputs; during training, the model's outputs are compared to the expected outputs. The difference between the two is called the error, or loss. Based on how wrong or right the network is, you can calculate how you should adjust the parameters/internal weights of the model to lower its error. A more in-depth explanation of gradient descent (and of neural networks in general) can be found on this site.
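A minimal sketch of that loop on the simplest possible "model", y = w * x (the input, target, and learning rate are illustrative assumptions): compute the output, measure the loss against the expected output, and nudge the weight in the direction that reduces the loss.

```python
x, target = 3.0, 6.0          # input and expected output (so the ideal w is 2.0)
w = 0.0                       # initial weight
lr = 0.05                     # learning rate

for step in range(10):
    output = w * x
    loss = (output - target) ** 2          # squared error between output and target
    grad = 2 * (output - target) * x       # d(loss)/d(w)
    w -= lr * grad                         # gradient descent update
    print(f"step {step}: w={w:.3f} loss={loss:.3f}")
```

Running this, the printed loss shrinks toward zero as w approaches 2.0; backpropagation is the machinery that computes the same kind of gradient for every weight in a deep network.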