Dropout does not actually remove neurons; it's just that those particular neurons don't play any role (don't get activated) for the given batch of data.
Example: suppose there is a road with 8 lanes. When trucks come, they pass through lanes 1, 2, 4, 6, 7; when cars come, they pass through lanes 2, 3, 4, 7, 8; and when bikes come, they pass through lanes 1, 2, 5, 8. So regardless of the vehicle, all lanes are there, but only some of them are used at any given time.
Similarly, all neurons remain in the model, but only a subset of them is activated for a particular batch of data. The model is not cut down afterwards; its complexity stays the same.
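To make the lane analogy concrete, here is a minimal sketch of the usual inverted-dropout formulation (using NumPy; the helper name dropout_forward is just illustrative): all weights stay in place, and only a random mask silences some outputs for the current batch.

```python
import numpy as np

def dropout_forward(activations, p_drop=0.5, training=True):
    """Inverted dropout: randomly zero a fraction of activations during
    training and rescale the survivors, so nothing changes at test time."""
    if not training or p_drop == 0.0:
        return activations  # all "lanes" stay open at inference
    # Bernoulli mask: 1 keeps a neuron's output, 0 silences it for this batch
    mask = (np.random.rand(*activations.shape) > p_drop).astype(activations.dtype)
    return activations * mask / (1.0 - p_drop)

# The same 8 "lanes" (neuron outputs); a different subset is used each pass
layer_out = np.ones(8)
print(dropout_forward(layer_out, p_drop=0.5))                   # training: some zeros
print(dropout_forward(layer_out, p_drop=0.5, training=False))   # inference: untouched
```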
Why use dropout?
As given in the Deep Learning book by Ian Goodfellow,
dropout is more effective than other
standard computationally inexpensive regularizers, such as weight decay, filter norm constraints and sparse activity regularization.
He also says-
One advantage of dropout is that it is very computationally cheap.
Another significant advantage of dropout is that it does not significantly limit the type of model or training procedure that can be used. It works well with nearly any model that uses a distributed representation and can be trained with stochastic gradient descent. This includes feedforward neural networks, probabilistic models such as restricted Boltzmann machines (Srivastava et al., 2014), and recurrent neural networks (Bayer and Osendorfer, 2014; Pascanu et al., 2014a).
This book says-
The core idea is that introducing noise in the output values of a layer can
break up happenstance patterns that aren’t significant, which the network will start memorizing if no noise is present.
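This idea of injecting noise into a layer's outputs during training only is exactly what dropout layers do in common frameworks. A minimal sketch, assuming PyTorch (layer sizes here are arbitrary placeholders), of where such a layer typically sits:

```python
import torch.nn as nn

# A small feedforward classifier with dropout after each hidden activation.
# nn.Dropout adds noise only in model.train() mode; model.eval() disables it,
# so the full network is used at inference, as described above.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly silence half of the 256 outputs per forward pass
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)
```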