How to reshape or clean data to be able to visualize it with violin plots?
My end goal is to visualize some data in Python using a violin plot or something similar.
I have the following data in a file (test.csv). The first column is a list of species. The other columns give the abundance of each species at a certain altitude (e.g. how abundant is species A at altitude 1000, 2000?). (Ignoring units for now.) How can I plot this as a violin plot (or something similar)?
test.csv:
```
species,1000,2000,3000,4000,5000,6000,7000
species_A,0.5,0.5,,,2,1,2
species_B,0.5,1,0.5,0.5,1,1,10
species_C,1,1,10,3,15,4,5
species_D,15,3,2,1,0.5,1,3
```
The Python code I tried so far is below. This does not work because it only plots the distribution of altitudes, which is the same for all species (because they were all sampled from the same set of altitudes).
```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

file = "test.csv"
df = pd.read_csv(file)
# convert columns to list
colnames = list(df.columns)
colnames.remove("species")
# Transform the data so that I have a dataframe with only three columns: species, Altitude, and Count
df = pd.melt(df, id_vars=["species"], value_vars=colnames, value_name="Count", var_name="Altitude")
df["species"] = df["species"].astype("category")
df["Altitude"] = df["Altitude"].astype("int")
# Plot the data (Count is never used here, which is the problem described above)
sns.violinplot(x="species", y="Altitude", data=df)
plt.title("Abundance of Species at Various Altitudes")
plt.grid(alpha=0.5, ls="--")
plt.xticks(rotation=90)
# show graph
plt.show()
```
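
To make the problem concrete, here is a minimal sketch that inlines the same data as test.csv and checks that, after melting, every species ends up with the identical list of altitudes. It then tries one possible workaround, which is only an assumption on my part and not a confirmed solution: repeat each altitude in proportion to its count so that the distribution of `Altitude` reflects abundance. The names `CSV_TEXT`, `SCALE`, `altitudes_per_species` and `expanded` are hypothetical, and `SCALE = 2` only works because the counts in this sample happen to be multiples of 0.5.

```python
import io

import pandas as pd

# Same contents as test.csv, inlined so the sketch is self-contained.
CSV_TEXT = """species,1000,2000,3000,4000,5000,6000,7000
species_A,0.5,0.5,,,2,1,2
species_B,0.5,1,0.5,0.5,1,1,10
species_C,1,1,10,3,15,4,5
species_D,15,3,2,1,0.5,1,3
"""

wide = pd.read_csv(io.StringIO(CSV_TEXT))

# Wide -> long: one row per (species, Altitude) pair with its Count.
long = wide.melt(id_vars="species", var_name="Altitude", value_name="Count")
long["Altitude"] = long["Altitude"].astype(int)

# The problem: each species has exactly the same altitude values,
# so a violin of Altitude per species looks the same for all of them.
altitudes_per_species = long.groupby("species")["Altitude"].apply(list)

# Possible workaround (assumption): repeat each altitude Count * SCALE times.
SCALE = 2  # hypothetical factor that turns the 0.5-step counts into integers
observed = long.dropna(subset=["Count"])  # skip altitudes with no measurement
repeats = (observed["Count"] * SCALE).round().astype(int).to_numpy()
expanded = observed.loc[observed.index.repeat(repeats), ["species", "Altitude"]]
expanded = expanded.reset_index(drop=True)
```

Passing `expanded` instead of `df` to `sns.violinplot(x="species", y="Altitude", data=expanded)` should then give violins whose width follows abundance. I am not sure this is the right way to do it, since the altitudes are only seven discrete values and the violin's smoothing may be misleading.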
Topic transformation visualization python data-cleaning
Category Data Science