How to program conditional statements for this problem in R
Situation: I'm trying to program the following in R.
Task: I want to select words that appear in my dataset as nouns more often than as adjectives, verbs, adverbs, etc. I have all of these counts, and below is one example of what I am trying to do. Imagine the information below is in a data frame. I do not want to select this lemma (ability), because it appears most often as a VERB; i.e., NOUN is not its most frequent part of speech:
id <- c(4, 4, 4)
lemma <- c("ability", "ability", "ability")
`count_lemma+pos` <- c(21, 66, 89332)
pos <- c("ADJ", "NOUN", "VERB")
Action: I started writing the code below to implement the following logic:
- group the data by id
- for every row in id, check if pos == NOUN
- If not, then delete the row in id
- check id for max value
- return pos
- if pos != NOUN, then delete id
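The goal behind these steps (keep a lemma only when its NOUN count beats every other part of speech) could be sketched with dplyr roughly as follows. This is only a sketch under assumptions: `count_lemma_pos` is a stand-in name for the `count_lemma+pos` column (a `+` is not allowed in a syntactic R name), `noun_dominant` is a made-up result name, and the "table" rows are invented purely so that one lemma passes the check.

```r
library(dplyr)

# Toy data: the "ability" rows come from the example in the question;
# the "table" rows are hypothetical, added only for illustration.
noun_count <- data.frame(
  id = c(4, 4, 4, 5, 5),
  lemma = c("ability", "ability", "ability", "table", "table"),
  pos = c("ADJ", "NOUN", "VERB", "NOUN", "VERB"),
  count_lemma_pos = c(21, 66, 89332, 500, 40),
  stringsAsFactors = FALSE
)

noun_dominant <- noun_count %>%
  group_by(id) %>%
  # Keep a group only if its NOUN count is strictly greater than the
  # largest count of any other part of speech. The extra -Inf argument
  # keeps max() well defined when a group has no NOUN (or only NOUN) rows.
  filter(
    max(count_lemma_pos[pos == "NOUN"], -Inf) >
      max(count_lemma_pos[pos != "NOUN"], -Inf)
  ) %>%
  # Of the surviving groups, keep just the NOUN row.
  filter(pos == "NOUN") %>%
  ungroup()

print(noun_dominant)
```

With this toy data, "ability" is dropped because its VERB count is the largest, and only the NOUN row for the hypothetical "table" lemma remains. A tie between NOUN and another tag counts as "not greater", so tied lemmas are dropped as well.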
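# This is my failed attempt at the first step in R:
library(plyr)   # provides ddply() and .()
library(dplyr)  # provides filter(); loaded after plyr so dplyr's versions of shared names take precedence
noun_count_all <- ddply(noun_count, .(lemma), function(noun_count) {
  filter1 <- filter(noun_count, pos == "NOUN")
  # filter2 <-
  return(filter1)
})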
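One possible way to finish the ddply attempt is to have the per-group function look at which tag has the highest count and return the NOUN row only when that tag is NOUN. The sketch below assumes a syntactic column name `count_lemma_pos` in place of `count_lemma+pos`, uses a hypothetical helper called `keep_if_noun_is_top`, and again adds invented "table" rows for illustration. It uses only plyr and base subsetting, so dplyr is not needed here.

```r
library(plyr)

# Toy data: "ability" rows are from the question; "table" rows are hypothetical.
noun_count <- data.frame(
  id = c(4, 4, 4, 5, 5),
  lemma = c("ability", "ability", "ability", "table", "table"),
  pos = c("ADJ", "NOUN", "VERB", "NOUN", "VERB"),
  count_lemma_pos = c(21, 66, 89332, 500, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: receives the rows for one lemma.
keep_if_noun_is_top <- function(grp) {
  # Tag of the row with the highest count (which.max takes the first on ties).
  top_pos <- grp$pos[which.max(grp$count_lemma_pos)]
  if (top_pos == "NOUN") {
    grp[grp$pos == "NOUN", , drop = FALSE]  # keep only the NOUN row
  } else {
    grp[0, , drop = FALSE]                  # zero rows: drop this lemma
  }
}

# ddply() also accepts the grouping variable as a character string.
noun_count_all <- ddply(noun_count, "lemma", keep_if_noun_is_top)

print(noun_count_all)
```

Because `which.max()` returns the first maximum, a lemma whose NOUN count ties with another tag is kept only if the NOUN row happens to come first; a stricter comparison would be needed if ties should always be dropped.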
Result: Not getting anywhere. If I've written this question incorrectly, sorry about that. I'm not a programmer or data scientist; I'm just trying to use R to do something I can't do in Excel.