How to reshape or clean data to be able to visualize it with violin plots?

Question

zmike

June 26, 2021 02:10

My end goal is to visualize some data using a violin plot or something similar using Python.

I have the following data in a file (test.csv). The first column lists the species. The remaining columns give the abundance of each species at a given altitude (e.g. how abundant is species A at altitudes 1000 and 2000?), ignoring units for now. How can I plot this as a violin plot (or something similar)?

test.csv

species,1000,2000,3000,4000,5000,6000,7000
species_A,0.5,0.5,,,2,1,2
species_B,0.5,1,0.5,0.5,1,1,10
species_C,1,1,10,3,15,4,5
species_D,15,3,2,1,0.5,1,3

The Python code I tried so far is below. This does not work because it only plots the distribution of altitudes, which is the same for all species (because they were all sampled from the same set of altitudes).

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

file = "test.csv"
df = pd.read_csv(file)

# convert columns to list
colnames = list(df.columns)
colnames.remove("species")

# Transform the data so that I have a dataframe with only three columns: species, Altitude, and Count
df = pd.melt(df, id_vars=['species'], value_vars=colnames, value_name="Count", var_name="Altitude")
df.species = df.species.astype('category')
df.Altitude = df.Altitude.astype('int')

# Plot the data
sns.violinplot(x="species", y="Altitude", data=df)
plt.title("Abundance of Species at Various Altitudes")
plt.grid(alpha=0.5, ls="--")
plt.xticks(rotation=90)

# show graph
plt.show()
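To see why this attempt draws the same violin for every species, a quick check can list the altitudes each species contributes after the melt. This is only a sketch: the data from test.csv is inlined through a hypothetical `CSV_TEXT` constant so the snippet runs on its own, and `per_species` is an illustrative name.

```python
import io

import pandas as pd

# Inline copy of test.csv from the question (assumption: same content as the file).
CSV_TEXT = """species,1000,2000,3000,4000,5000,6000,7000
species_A,0.5,0.5,,,2,1,2
species_B,0.5,1,0.5,0.5,1,1,10
species_C,1,1,10,3,15,4,5
species_D,15,3,2,1,0.5,1,3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT)).melt(
    id_vars="species", var_name="Altitude", value_name="Count"
)
df["Altitude"] = df["Altitude"].astype(int)

# Collect the Altitude values that violinplot would see for each species.
per_species = df.groupby("species")["Altitude"].apply(list)
print(per_species)
```

Every species ends up with the same seven altitudes, and the `Count` column never enters the plot, so the abundances have to be turned into repeated rows (or weights) before the violins can differ.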

Topic transformation visualization python data-cleaning

Category Data Science

Ben Reiniger · Accepted Answer · June 26, 2021 02:10

You can make the "ungrouped" dataframe by reindexing on a repeated index:

df_2d = df.loc[df.index.repeat(
    df["Count"].fillna(0).astype(int)
)]
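As a fuller sketch of the repeated-index idea, the snippet below goes from the wide table to one row per unit of abundance. The helper name `expand_by_count`, its `scale` parameter, and the inlined `CSV_TEXT` are hypothetical, not part of the answer. One assumption differs from the snippet above: counts are rounded up with `np.ceil` rather than truncated with `astype(int)`, because truncation would turn the 0.5 abundances into zero rows.

```python
import io

import numpy as np
import pandas as pd

# Inline copy of test.csv from the question, so the sketch runs on its own.
CSV_TEXT = """species,1000,2000,3000,4000,5000,6000,7000
species_A,0.5,0.5,,,2,1,2
species_B,0.5,1,0.5,0.5,1,1,10
species_C,1,1,10,3,15,4,5
species_D,15,3,2,1,0.5,1,3
"""


def expand_by_count(wide: pd.DataFrame, scale: int = 1) -> pd.DataFrame:
    """Return one (species, Altitude) row per rounded-up unit of abundance."""
    long = wide.melt(id_vars="species", var_name="Altitude", value_name="Count")
    long["Altitude"] = long["Altitude"].astype(int)
    # Missing abundances become 0 repeats; fractional ones are rounded up.
    repeats = np.ceil(long["Count"].fillna(0) * scale).astype(int)
    expanded = long.loc[long.index.repeat(repeats), ["species", "Altitude"]]
    return expanded.reset_index(drop=True)


wide = pd.read_csv(io.StringIO(CSV_TEXT))
df_2d = expand_by_count(wide)
print(df_2d.groupby("species").size())
```

The result can be passed to the plotting call from the question, `sns.violinplot(x="species", y="Altitude", data=df_2d)`. Raising `scale` (for example to 2) keeps the 0.5 abundances in proportion to the others instead of rounding them up to a full row.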

There should be a more direct way to generate the plot, but I don't know of one. The fact that your altitudes are discretized might not help.
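One candidate for a more direct route, offered only as a sketch and not as something the answer proposes, is to skip the row repetition and treat the abundances as weights in a kernel density estimate. The function `weighted_gaussian_kde`, the bandwidth of 500, and the evaluation grid are all assumptions chosen for illustration; the values are those of species_A with its two missing altitudes dropped.

```python
import numpy as np


def weighted_gaussian_kde(points, weights, grid, bandwidth):
    """Evaluate a Gaussian kernel density on `grid`, weighting each point."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    weights = weights / weights.sum()  # normalise so the density integrates to 1
    # One Gaussian bump per data point, evaluated at every grid position.
    z = (grid[:, None] - points[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights


# species_A from test.csv, without the altitudes that have no value.
altitudes = np.array([1000, 2000, 5000, 6000, 7000])
abundance = np.array([0.5, 0.5, 2, 1, 2])

grid = np.linspace(-3000, 11000, 1401)  # 10-unit steps, wide enough for the tails
density = weighted_gaussian_kde(altitudes, abundance, grid, bandwidth=500.0)
```

Mirroring `density` around each species' x position, for instance with matplotlib's `fill_betweenx`, gives a violin-like shape without rounding fractional abundances. The bandwidth controls how much the seven discrete altitudes blur together and would need tuning for real data.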

zmike · Accepted Answer · June 15, 2021 23:33

I ended up creating a new Pandas DataFrame using the code below. I was hoping for something simpler or more elegant.

import numpy as np

# Create a new dataframe with one row per (rounded-up) unit of Count
rows = []
for _, sp in df.iterrows():
    count = 0 if np.isnan(sp['Count']) else int(np.ceil(sp['Count']))
    rows.extend([{"species": sp["species"], "Altitude": sp["Altitude"]}] * count)
df_2d = pd.DataFrame(rows)
