The function you are looking for is gather from the tidyr package. It takes a wide data.frame and turns it into a long one, and it is easy to use:
library(dplyr)
library(tidyr)
# Building a sample data.frame like your example data.
df <- data.frame(Boardgame  = c("Game1", "Game2", "Game3", "Game4", "Game5"),
                 categorie1 = c("Deduction", "Deduction", "Card Game", "Horror", "Horror"),
                 categorie2 = c("Medieval", "Medieval", "Medieval", NA, "Medieval"),
                 categorie3 = c("Word Game", NA, "Zombies", NA, "Zombies"),
                 stringsAsFactors = FALSE)
# Using gather from tidyr and some dplyr functions:
df %>%
  gather(key = "Cat_Label",   # Key is the name of the column that will hold the old column names
         value = "Categorie", # Value is the name of the column that will hold the data
         -Boardgame,          # Ignore the Boardgame column, use every other column
         na.rm = TRUE) %>%    # Remove NA values
  arrange(Boardgame) %>%      # Sort by Boardgame
  select(-Cat_Label)          # Remove the unneeded Cat_Label column (if you want)
# Results:
   Boardgame Categorie
1      Game1 Deduction
2      Game1  Medieval
3      Game1 Word Game
4      Game2 Deduction
5      Game2  Medieval
6      Game3 Card Game
7      Game3  Medieval
8      Game3   Zombies
9      Game4    Horror
10     Game5    Horror
11     Game5  Medieval
12     Game5   Zombies
The -Boardgame notation in the gather call means that every column except Boardgame will be gathered, even if you have 200 categorie columns. Once you have the long data.frame, you can use ggplot2 to visualize the Categorie column as you see fit.
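As a minimal sketch of that visualization step, assuming a bar chart of how often each category occurs is what you are after (the names long_df and p are only illustrative, and the data is the sample data.frame from above):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same sample data as above.
df <- data.frame(Boardgame  = c("Game1", "Game2", "Game3", "Game4", "Game5"),
                 categorie1 = c("Deduction", "Deduction", "Card Game", "Horror", "Horror"),
                 categorie2 = c("Medieval", "Medieval", "Medieval", NA, "Medieval"),
                 categorie3 = c("Word Game", NA, "Zombies", NA, "Zombies"),
                 stringsAsFactors = FALSE)

# Reshape to long format: one row per Boardgame/Categorie pair.
long_df <- df %>%
  gather(key = "Cat_Label", value = "Categorie", -Boardgame, na.rm = TRUE) %>%
  select(-Cat_Label)

# geom_bar() counts the rows for each distinct Categorie value.
p <- ggplot(long_df, aes(x = Categorie)) +
  geom_bar() +
  labs(x = "Category", y = "Number of board games")

print(p)
```

Because each game now contributes one row per category, a game with three categories is counted in three bars, which is usually what you want for a "how common is each category" plot.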
Note that development of gather by the Tidyverse team is complete. It still works, but a newer, more general function, pivot_longer, has been implemented to replace it. Usage is similar in a simple case, but the arguments are a little different:
df %>%
  pivot_longer(cols = -Boardgame,        # You now explicitly declare the columns
               names_to = "Cat_Label",   # New column containing old column names
               values_to = "Categorie",  # New column containing old column values
               values_drop_na = TRUE)    # Remove NA values
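The pivot_longer call above keeps the Cat_Label column and returns a tibble. If you want the same two-column result as the gather pipeline, a sketch along these lines should work. It assumes your category columns all share the "categorie" prefix, so starts_with can select them instead of negating Boardgame; long_df is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same sample data as above.
df <- data.frame(Boardgame  = c("Game1", "Game2", "Game3", "Game4", "Game5"),
                 categorie1 = c("Deduction", "Deduction", "Card Game", "Horror", "Horror"),
                 categorie2 = c("Medieval", "Medieval", "Medieval", NA, "Medieval"),
                 categorie3 = c("Word Game", NA, "Zombies", NA, "Zombies"),
                 stringsAsFactors = FALSE)

long_df <- df %>%
  pivot_longer(cols = starts_with("categorie"), # Select columns by shared prefix (assumption)
               names_to = "Cat_Label",
               values_to = "Categorie",
               values_drop_na = TRUE) %>%
  select(-Cat_Label)                            # Drop the old column names

print(long_df)
```

pivot_longer works through the input row by row, so the rows for each game already sit together and the arrange step is only needed if the original data.frame is not sorted by Boardgame.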