Normal vs Uniform Distribution for machine learning
I have a dataset whose target values follow Zipf's law: most of the values are concentrated at one end, and the remaining values account for only a very small percentage of the data. Training on the dataset as is would bias the model toward the common values, so I was thinking of restructuring the target into buckets. My model would then be a multi-class classification model rather than a regression model (I am training a neural network).
My question is whether I should draw the bucket boundaries so that the number of items per bucket is uniform, or so that it follows a normal distribution. A uniform distribution would ensure that the NN sees the same number of examples for each bucket; however, some say that normal distributions work better for machine learning. Which one should I use?
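To make the uniform option concrete, here is a minimal sketch of what I mean by equal-count buckets, compared with plain equal-width buckets. It assumes NumPy, and everything in it is illustrative rather than taken from my real data: the sample is synthetic (a continuous Pareto draw standing in for a Zipf-like heavy tail), the helper name `equal_frequency_buckets` is hypothetical, and the choice of 5 buckets is arbitrary.

```python
import numpy as np


def equal_frequency_buckets(values, n_buckets):
    """Label each value with a bucket index so buckets hold roughly equal counts.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    # Interior quantile cut points, e.g. 20%, 40%, 60%, 80% for 5 buckets.
    interior_quantiles = np.linspace(0.0, 1.0, n_buckets + 1)[1:-1]
    edges = np.quantile(values, interior_quantiles)
    # With many tied values, several cut points can coincide; dropping the
    # duplicates then yields fewer (and less balanced) buckets.
    edges = np.unique(edges)
    # Bucket index = number of edges that are <= the value.
    labels = np.searchsorted(edges, values, side="right")
    return labels, edges


# Synthetic heavy-tailed sample (assumption: stand-in for the real target).
rng = np.random.default_rng(0)
values = rng.pareto(a=1.5, size=10_000)

n_buckets = 5
labels, edges = equal_frequency_buckets(values, n_buckets)
counts = np.bincount(labels, minlength=n_buckets)

# For comparison: equal-width buckets over the same range.
width_counts, _ = np.histogram(values, bins=n_buckets)

print("equal-frequency counts:", counts)
print("equal-width counts:    ", width_counts)
```

On continuous data the equal-count labels come out balanced by construction, while the equal-width buckets put nearly everything in the first bucket. If the real values are discrete with many ties (as a literal Zipf distribution would be), perfectly equal counts may not be achievable, which the comment on duplicate edges is meant to flag.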
Thanks
Topic distribution data machine-learning
Category Data Science