R: Counting the number of observations per category

I'm just starting out in R and wondering how to count the number of observations per day, per node, and per replicate in the dataset below, and store the result in a separate dataset. The original dataset looks like this:

I would like the resulting dataset to look like this:

Can someone help me find out how I could do this in R? Thanks


You can also do this with the dplyr package. It provides group_by to group your data by one or more variables and summarise to apply an aggregation function to each group. It also supports the 'pipe' notation %>%, which passes the output of the previous function as the first argument of the next function. Another convenience is that dplyr functions do not require column names to be quoted or passed as character vectors. Here is what it might look like for one of your variables:

library(dplyr)

# n() counts the observations (rows) in each group;
# summarise() stores the result in a new column named Count.
my_summary_data <- mydata %>%
    group_by(Replicate) %>%
    summarise(Count = n())

The output looks something like:

my_summary_data
Replicate  Count
1          8
2          7
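
To make the pipe notation concrete, here is a minimal sketch that writes the same count twice, once piped and once as nested calls. The data frame toy and its values are made up for illustration; only the column name Replicate comes from the example above.

```r
library(dplyr)

# Hypothetical toy data: the values are invented for this sketch.
toy <- data.frame(Replicate = c(1, 1, 1, 2, 2))

# Piped form: each result becomes the first argument of the next call.
piped <- toy %>%
    group_by(Replicate) %>%
    summarise(Count = n())

# Equivalent nested form without the pipe.
nested <- summarise(group_by(toy, Replicate), Count = n())

# Both forms give the same counts per group.
identical(piped$Count, nested$Count)
```

The nested form reads inside out, which is why the piped form is usually easier to follow once more steps are added.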

The group_by function can group by multiple columns, so

my_summary_data <- mydata %>%
    group_by(Replicate, Node) %>%
    summarise(Count = n())

will produce:

Replicate  Node  Count
1          1     5
1          2     3
2          1     7
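
The question asks for counts per day, per node, and per replicate, so the same pattern can be extended to three grouping columns. This is only a sketch: the data frame toy and its values are invented, and the column name Day is assumed to match the question. The .groups argument assumes dplyr 1.0.0 or later; count() with its name argument is a shorter way to get the same result.

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are invented.
toy <- data.frame(
    Replicate = c(1, 1, 1, 2, 2),
    Node      = c(1, 1, 2, 1, 1),
    Day       = c(1, 1, 1, 1, 2)
)

# One row per Replicate/Node/Day combination, stored in a new data frame.
# .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0).
counts <- toy %>%
    group_by(Replicate, Node, Day) %>%
    summarise(Count = n(), .groups = "drop")

# Shorter alternative: count() groups, counts, and ungroups in one step.
counts_short <- toy %>%
    count(Replicate, Node, Day, name = "Count")

counts
```

Either result is an ordinary data frame, so it can be kept alongside the original data and used in later steps.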

I like to use the plyr package, but there are other ways:

library(plyr)
ddply(mydata, c('Replicate','Node','Day'), nrow)
  • the dd in ddply means that the input is a data frame and the output is also a data frame
  • the rows are grouped by the values of the columns given as the second argument
  • the last argument is the function to apply to every group, in this case nrow, which simply counts the number of rows in the group

If you want to name the count column at the same time, you can do:

library(plyr)
ddply(mydata, c('Replicate','Node','Day'), function(groupDF) {
  data.frame(countObservations=nrow(groupDF))
})
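
To see the difference between the two calls, here is a small sketch on made-up data. The data frame toy and its values are hypothetical; the column names are assumed to match the question. With a bare function such as nrow, the count column ends up with the default name V1, while the data.frame version lets you pick the name.

```r
library(plyr)

# Hypothetical toy data: column names follow the question, values are invented.
toy <- data.frame(
    Replicate = c(1, 1, 1, 2, 2),
    Node      = c(1, 1, 2, 1, 1),
    Day       = c(1, 1, 1, 1, 2)
)

# With a bare function, the count column gets the default name V1.
default_name <- ddply(toy, c('Replicate', 'Node', 'Day'), nrow)

# Returning a one-row data frame lets you choose the column name.
custom_name <- ddply(toy, c('Replicate', 'Node', 'Day'), function(groupDF) {
    data.frame(countObservations = nrow(groupDF))
})

names(default_name)
names(custom_name)
```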
