How to program conditional statements for this problem in R

Situation: I'm trying to program the following in R.

Task: I am trying to select words that appear as nouns in my dataset more often than they appear as adjectives, verbs, adverbs, etc. I have all these counts, and below is one instance of what I am trying to do. Imagine the information below is in a data frame. I do not want to select this lemma (ability), because it appears most often as a VERB; i.e., its count as a NOUN is not greater than its count for every other part of speech (here, VERB is far higher):

id <- c(4, 4, 4)
lemma <- c("ability", "ability", "ability")
count_lemma_pos <- c(21, 66, 89332)  # the count_lemma+pos column
pos <- c("ADJ", "NOUN", "VERB")
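As a minimal sketch, the example above can be entered as an R data frame and the condition checked directly for this one lemma. The column is named count_lemma_pos here (an assumption) because count_lemma+pos is not a valid bare R name; the object names ability_df, noun_n, other_max and keep_ability are hypothetical.

```r
# Hypothetical example data: one lemma, three parts of speech
ability_df <- data.frame(
  id = c(4, 4, 4),
  lemma = c("ability", "ability", "ability"),
  count_lemma_pos = c(21, 66, 89332),  # renamed from count_lemma+pos
  pos = c("ADJ", "NOUN", "VERB")
)

# NOUN count for this lemma
noun_n <- ability_df$count_lemma_pos[ability_df$pos == "NOUN"]
# Largest count among all other parts of speech
other_max <- max(ability_df$count_lemma_pos[ability_df$pos != "NOUN"])

# Keep the lemma only if NOUN is strictly greater than every other pos
keep_ability <- noun_n > other_max
print(keep_ability)
```

Because the comparison is a strict `>`, a NOUN count that only ties with another part of speech would not count as "more than".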

Action: I started writing the attempt below, aiming for the following logic:

  1. group the data by id
  2. for every row in an id, check if pos == "NOUN"
  3. if not, delete that row from the id
  4. check the id for its max value
  5. return the pos of that max value
  6. if that pos != "NOUN", delete the id
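The steps above can be sketched with dplyr, assuming the data frame is called noun_count and the count column is named count_lemma_pos. Rather than deleting the non-noun rows first (after which the noun row would always be the maximum), this sketch finds the part of speech with the highest count in each id group and then keeps only the ids where that part of speech is NOUN. The rows for "house" and the column name top_pos are hypothetical.

```r
library(dplyr)

# Hypothetical example data: ability (VERB-dominant), house (NOUN-dominant)
noun_count <- data.frame(
  id = c(4, 4, 4, 7, 7),
  lemma = c("ability", "ability", "ability", "house", "house"),
  count_lemma_pos = c(21, 66, 89332, 500, 40),
  pos = c("ADJ", "NOUN", "VERB", "NOUN", "VERB")
)

noun_ids <- noun_count %>%
  # Step 1: one group per id (lemma kept for readability)
  group_by(id, lemma) %>%
  # Steps 4-5: the pos with the largest count in each group
  # (which.max returns the first such row if there is a tie)
  summarise(top_pos = pos[which.max(count_lemma_pos)], .groups = "drop") %>%
  # Step 6: drop ids whose top pos is not NOUN
  filter(top_pos == "NOUN")

print(noun_ids)
```

This returns one row per kept lemma (here only house), which can then be joined back to the full data if the individual counts are needed.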

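# This is my failed attempt at the first step in R:

library(plyr)   # load plyr before dplyr to avoid masking conflicts
library(dplyr)

noun_count_all <- ddply(noun_count, .(lemma), function(d) {
  # "NOUN" must be quoted: it is a value in pos, not an object
  filter1 <- filter(d, pos == "NOUN")
  #filter2 <-
  return(filter1)
})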
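If installing plyr or dplyr is a hurdle, the same selection can be sketched in base R, swapping in `ave()` to compute each id's maximum count. This assumes the columns are named id, pos and count_lemma_pos; the example rows (including "house") are hypothetical.

```r
# Hypothetical example data
noun_count <- data.frame(
  id = c(4, 4, 4, 7, 7),
  lemma = c("ability", "ability", "ability", "house", "house"),
  count_lemma_pos = c(21, 66, 89332, 500, 40),
  pos = c("ADJ", "NOUN", "VERB", "NOUN", "VERB")
)

# For each row, the largest count within its id group
group_max <- ave(noun_count$count_lemma_pos, noun_count$id, FUN = max)

# Keep NOUN rows that hold their group's maximum count
noun_count_all <- noun_count[noun_count$pos == "NOUN" &
                               noun_count$count_lemma_pos == group_max, ]
print(noun_count_all)
```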

Result: I'm not getting anywhere. If I've written this question incorrectly, sorry about that. I'm not a programmer or data scientist; I'm just trying to use R to do something I can't do in Excel.



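Using dplyr, the following code keeps only the rows where pos is "NOUN" and where count_lemma_pos (the count_lemma+pos column under a valid R name) is the highest count within its id group. Any lemma whose highest count belongs to another part of speech, such as ability, therefore drops out.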

library(dplyr)

df %>%
    # group by id
    group_by(id) %>%
    # filter on rows where pos == "NOUN" and count_lemma_pos is the max value within the group
    filter(pos == "NOUN" & count_lemma_pos == max(count_lemma_pos))
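A quick usage check, assuming the answer's code is applied to a hypothetical df with one VERB-dominant lemma (ability) and one NOUN-dominant lemma (house): only the house row should survive. Because the filter uses `==`, a NOUN count that ties with another part of speech for the maximum would also be kept.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  id = c(4, 4, 4, 7, 7),
  lemma = c("ability", "ability", "ability", "house", "house"),
  count_lemma_pos = c(21, 66, 89332, 500, 40),
  pos = c("ADJ", "NOUN", "VERB", "NOUN", "VERB")
)

result <- df %>%
  group_by(id) %>%
  # max() is evaluated per id group because of group_by()
  filter(pos == "NOUN" & count_lemma_pos == max(count_lemma_pos)) %>%
  ungroup()

print(result)
```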

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.