How to program conditional statements for this problem in R

Question

n.baes

August 3, 2021, 14:42

Situation: I'm trying to program the following in R.

Task: I want to select words that appear as nouns in my dataset more often than they appear as adjectives, verbs, adverbs, etc. I have all the counts, and below is one example of what I am working with. Imagine the information below is in a data frame. I do not want to select this lemma (ability), because it appears most often as a VERB; that is, its NOUN count is not the highest among its parts of speech:

id <- c(4, 4, 4)
lemma <- c("ability", "ability", "ability")
count_lemma_pos <- c(21, 66, 89332)
pos <- c("ADJ", "NOUN", "VERB")

Action: I started writing the failed attempt below, aiming for the following logic:

group the data by id
for every row in id, check if pos == NOUN
if not, then delete the row in id
check id for the max count value
return the pos of that max value
if that pos != NOUN, then delete id


# This is my failed attempt at the first step in R:

noun_count_all <- ddply(noun_count, .(lemma), function(noun_count) {
  filter1 <- filter(noun_count, pos == "NOUN")
  # filter2 <-
  return(filter1)
})

Result: I'm not getting anywhere. If I've written this question incorrectly, sorry about that. I'm not a programmer or data scientist; I'm just trying to use R to do something I can't do in Excel.

Topic corpus r

Category Data Science

Oxbowerce · Accepted Answer · August 3, 2021, 14:42

Using dplyr, the following code keeps only the rows where the pos column has the value "NOUN" and where count_lemma_pos is the highest value within the group. Any id whose most frequent part of speech is not NOUN therefore drops out entirely.

library(dplyr)

df %>%
    # group by id
    group_by(id) %>%
    # filter on rows where pos == "NOUN" and count_lemma_pos is the max value within the group
    filter(pos == "NOUN" & count_lemma_pos == max(count_lemma_pos))
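
To see the filter in action, here is a minimal sketch on toy data. The "ability" counts are the ones from the question; the second lemma ("table", id 7) and its counts are invented purely for illustration, and the column name count_lemma_pos is assumed to match the code above.

```r
library(dplyr)

# Toy data: "ability" rows come from the question;
# the "table" rows are hypothetical, added only to show a lemma that is kept.
df <- data.frame(
  id = c(4, 4, 4, 7, 7),
  lemma = c("ability", "ability", "ability", "table", "table"),
  pos = c("ADJ", "NOUN", "VERB", "NOUN", "VERB"),
  count_lemma_pos = c(21, 66, 89332, 500, 12)
)

noun_dominant <- df %>%
  # evaluate max() separately for each id
  group_by(id) %>%
  # keep the NOUN row only when its count is the largest in its group
  filter(pos == "NOUN" & count_lemma_pos == max(count_lemma_pos)) %>%
  # drop the grouping so later steps work on the whole table
  ungroup()

print(noun_dominant)
```

With this data, "ability" is dropped because VERB has the largest count, and only the NOUN row for "table" remains. Note that a NOUN count that ties with another part of speech for the maximum is still kept; if ties should be excluded, the condition would need to be stricter.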
