Histogram plot with plt.hist()

I am new to Python and want to plot a histogram of a list of values between -0.2 and 0.2. The list looks like this:

[...-0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01489985147131656,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088...and so on].

In statistics I learned to group data into classes (bins) to get a useful histogram, especially for a data set this large.

How can I add classes to my plot in Python?

My code is

plt.hist(data)

and the histogram looks like this:

[image: current histogram]

But it should look like this:

[image: desired histogram]

Topic matplotlib histogram python

Category Data Science


Your histogram is valid, but it has too many bins to be useful.

If you want a number of equally spaced bins, you can simply pass that number through the bins argument of plt.hist, e.g.:

plt.hist(data, bins=10)
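To see which classes bins=10 actually produces, the edges and counts can be printed with np.histogram, which takes the same bins argument as plt.hist. This is only a sketch: the data array below is a hypothetical stand-in for the list in the question (random values inside [-0.2, 0.2]), not the asker's real data.

```python
import numpy as np

# Hypothetical stand-in for the question's list: 1000 values within [-0.2, 0.2]
rng = np.random.default_rng(0)
data = np.clip(rng.normal(0.0, 0.05, size=1000), -0.2, 0.2)

# np.histogram accepts the same `bins` argument as plt.hist
counts, edges = np.histogram(data, bins=10)

# Print each class with its count; every class is half-open [left, right)
# except the last one, which also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:+.3f} to {right:+.3f}: {count}")
```

Ten bins give eleven edges, running from the smallest to the largest value in the data.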

If you want your bins to have specific edges, you can pass these as a list to bins:

plt.hist(data, bins=[0, 5, 10, 15, 20, 25, 30, 35, 40, 60, 100])
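For values between -0.2 and 0.2, as in the question, the explicit edges would have to cover that range rather than 0 to 100. A minimal sketch, assuming classes of width 0.05 are wanted (that width and the random stand-in data are my assumptions, not part of the question):

```python
import numpy as np

# Hypothetical stand-in for the question's list: 1000 values within [-0.2, 0.2]
rng = np.random.default_rng(0)
data = np.clip(rng.normal(0.0, 0.05, size=1000), -0.2, 0.2)

# 9 evenly spaced edges from -0.2 to 0.2 give 8 classes of width 0.05
edges = np.linspace(-0.2, 0.2, 9)

# Count the values falling into each class
counts, _ = np.histogram(data, bins=edges)
print(edges)
print(counts)
```

Passing the same array as plt.hist(data, bins=edges) should draw these classes.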

Finally, you can also pass the name of a method that calculates the bin edges automatically, such as 'auto' (the available methods are listed in the documentation of numpy.histogram_bin_edges):

plt.hist(data, bins='auto')
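To get a feel for how the automatic methods differ, numpy.histogram_bin_edges can be called directly and the resulting number of bins compared. The sketch below tries a few of the method names from that documentation on hypothetical stand-in data; the counts you get will depend on your own data.

```python
import numpy as np

# Hypothetical stand-in for the question's list: 1000 values within [-0.2, 0.2]
rng = np.random.default_rng(0)
data = np.clip(rng.normal(0.0, 0.05, size=1000), -0.2, 0.2)

# Compute the bin edges each method would choose and record the bin count
n_bins = {}
for method in ("auto", "fd", "sturges", "sqrt"):
    edges = np.histogram_bin_edges(data, bins=method)
    n_bins[method] = len(edges) - 1
    print(f"{method}: {n_bins[method]} bins")
```

Whichever method looks best here can then be passed by name, e.g. plt.hist(data, bins='fd').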

Complete code sample

import matplotlib.pyplot as plt
import numpy as np

# fix the random state for reproducibility
np.random.seed(19680801)

# sum of two normally distributed samples, shifted by 20
n = 500
data = 10 * np.random.randn(n) + 20 * np.random.randn(n) + 20

# plot histograms with various bins
fig, axs = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(9, 3))
axs[0].hist(data, bins=10)
axs[1].hist(data, bins=[0, 5, 10, 15, 20, 25, 30, 35, 40, 60, 100])
axs[2].hist(data, bins='auto')

# display the figure when run as a script
plt.show()



If I've understood the question correctly, you have to specify the bins.

You can give a list with the bin boundaries.

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])

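If you just want bins of equal width, you can build the edges with np.arange, where binwidth is the width you choose (the built-in range only accepts integers, so it fails on float data like the values in the question):

plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))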
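Since the built-in range accepts only integers, float data such as the values in the question need a float-capable alternative. A sketch of the fixed-width idea using np.arange instead; the binwidth of 0.02 and the random stand-in data are assumptions chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the question's list: 1000 values within [-0.2, 0.2]
rng = np.random.default_rng(0)
data = np.clip(rng.normal(0.0, 0.05, size=1000), -0.2, 0.2)

binwidth = 0.02  # assumed class width; pick one that suits your data

# Edges start at the minimum and advance in steps of binwidth past the maximum
edges = np.arange(data.min(), data.max() + binwidth, binwidth)

fig, ax = plt.subplots()
counts, _, patches = ax.hist(data, bins=edges)
ax.set_xlabel("value")
ax.set_ylabel("count")
# Use fig.savefig(...) to write the figure to a file, or drop the Agg line
# above and call plt.show() to view it interactively
```

One caveat: with a floating-point step, np.arange can occasionally yield one edge more or fewer than expected, so np.linspace with an explicit number of edges may be the safer choice when the exact bin count matters.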

