Difference between a log scale and linear scale : np.random.rand()

Why does sampling on a linear scale with np.random.rand() (to generate values between 0.0001 and 1) not give a well-distributed result, while sampling on a log scale with np.power(10, -4 * np.random.rand()) does?
I am currently just accepting the explanation below as it is, but I can't understand the reasoning behind it. (It is simple math, I believe, which I am missing. Kindly help with an example.)


Reference to my query:
Using an Appropriate Scale to Pick Hyperparameters

To understand this, consider the number of hidden units hyperparameter. The range we are interested in is from 50 to 100. We can use a grid which contains values between 50 and 100 and use that to find the best value:

Now consider the learning rate with a range between 0.0001 and 1. If we draw a number line with these extreme values and sample the values uniformly at random, around 90% of the values will fall between 0.1 to 1. In other words, we are using 90% resources to search between 0.1 to 1, and only 10% to search between 0.0001 to 0.1. This does not look correct! Instead, we can use a log scale to choose the values:

This query is from Andrew Ng's Specialization Course on Deep Learning - Coursera
Course 2 Week 3



Let us assume you want to do hyperparameter optimization with a hyperparameter $h\in[0,10^4]$. Let us additionally assume that we want to test a hundred possible values for the parameter.

If you choose a linear scale, your parameter values will be selected uniformly from $[0,10^4]$. It is then very unlikely that you will ever test a value smaller than $10^{-4}$, because the interval $[0,10^{-4}]$ is tiny compared to the whole range. The probability of randomly choosing a number from this interval is given by the following fraction:

$$\dfrac{10^{-4}}{10^4}=10^{-8}.$$

Now suppose we instead draw a random number $x$ uniformly from the interval $[-5,5]$ and use it as the exponent of $10$, so that $h=10^x$. This stretches the interval onto an exponential scale covering $[10^{-5},10^5]$ and makes small values far more likely: $h\le 10^{-4}$ happens exactly when $x\in[-5,-4]$, and half of all draws have a negative exponent, which gives $h<1$. The probability of $h\le 10^{-4}$ is now $1/10$, because the wanted interval $[-5,-4]$ has a width of $1$ and the whole interval $[-5,5]$ has a width of $10$.
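
The same effect can be checked numerically for the range in the question, 0.0001 to 1. The snippet below is only a rough sketch: the sample size, the seed, and the helper name `decade_fractions` are my own choices for illustration, not something from the course. It draws samples both ways and counts what fraction falls into each decade.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only so the run is repeatable
n = 100_000         # arbitrary sample size

# Linear scale: uniform on [0.0001, 1]
linear = 0.0001 + (1 - 0.0001) * np.random.rand(n)

# Log scale: exponent uniform on [-4, 0], so the values lie in [0.0001, 1]
log = np.power(10.0, -4 * np.random.rand(n))

# Decade boundaries: 0.0001, 0.001, 0.01, 0.1, 1
edges = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]

def decade_fractions(samples):
    """Fraction of samples that falls into each decade (hypothetical helper)."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / len(samples)

linear_frac = decade_fractions(linear)
log_frac = decade_fractions(log)

for lo, hi, f_lin, f_log in zip(edges[:-1], edges[1:], linear_frac, log_frac):
    print(f"[{lo:g}, {hi:g}]  linear: {f_lin:.4f}  log: {f_log:.4f}")
```

On the linear scale roughly 90% of the samples should end up in the last decade (0.1 to 1), because that decade makes up about 90% of the length of the interval. On the log scale each of the four decades corresponds to an exponent interval of width 1 out of 4, so each should receive roughly a quarter of the samples.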
