Finding a pattern in responses in R programming

I ran a study in which around 1000 participants took a 100-question test. For each question, participants chose between two texts (text 1 and text 2) and decided which one was easier for them. Now, in R, I want to check whether any participants followed a pattern, for example by choosing only text 1 or only text 2. I also want to screen the response strings for participants who alternated left/right/left/right etc. 20, 30, 40, 50, 60, 70, 80, 90, or 100 times in a row. Can anyone help me with how to do this in R?

Topic pattern-recognition r

Category Data Science


Generally, I would suggest looking for deviations from the general pattern, the general pattern being the answers given by most participants. A very basic way to do that:

  • For every question, calculate the proportion of participants who chose text 1 and store these proportions as a vector. This vector is the "mean vector".
  • For every participant, build the vector of their answers: for every question the value is 1 if they chose text 1, 0 if they chose text 2.
  • For every participant, compare their answer vector with the "mean vector", for instance with cosine similarity (or any other distance/similarity measure).

Then look at the distribution of the similarity scores: normally most participants have a high similarity score, so if some of them have a very low similarity, it's likely that they didn't take the test seriously.
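
The steps above can be sketched in base R roughly as follows. The `answers` matrix is simulated purely for illustration, and the names `mean_vector`, `cosine_similarity` and `similarity` are hypothetical. One deviation from the list, labelled here as an assumption: answers are recoded as -1/+1 before the comparison, because with raw 0/1 coding a participant who always picks text 1 can still get a high cosine similarity with the mean vector.

```r
# Simulated data for illustration only: rows = participants, columns = questions.
# 1 = chose text 1, 0 = chose text 2.
set.seed(1)
n_participants <- 200
n_questions <- 100
p_text1 <- runif(n_questions, 0.1, 0.9)  # hypothetical share of "text 1" per question
answers <- matrix(
  rbinom(n_participants * n_questions, 1, rep(p_text1, each = n_participants)),
  nrow = n_participants, ncol = n_questions
)
answers[1, ] <- 1                                        # always text 1
answers[2, ] <- rep(c(1, 0), length.out = n_questions)   # strict alternation

# Assumption: recode 0/1 as -1/+1 so that a constant responder is not
# automatically aligned with the mean vector.
signed <- 2 * answers - 1
mean_vector <- colMeans(signed)

cosine_similarity <- function(x, y) {
  denom <- sqrt(sum(x^2)) * sqrt(sum(y^2))
  if (denom == 0) return(NA_real_)   # undefined when either vector is all zeros
  sum(x * y) / denom
}

# One similarity score per participant (row)
similarity <- apply(signed, 1, cosine_similarity, y = mean_vector)

summary(similarity)           # overall distribution of the scores
head(order(similarity), 10)   # row indices of the ten least typical participants
```

Plotting `hist(similarity)` and inspecting the lower tail is a reasonable first look; where to put a cut-off is a judgment call that depends on the data.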

If you want a more advanced method, you could cluster the participants' answer vectors or apply anomaly detection.
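
As a rough illustration of that idea, here is a minimal base R sketch using hierarchical clustering and a simple distance-based anomaly score. The data are simulated, the choice of Manhattan distance, average linkage and `k = 3` are arbitrary assumptions, and the names `cluster` and `anomaly_score` are hypothetical.

```r
# Simulated data for illustration only: rows = participants, columns = questions.
set.seed(2)
n_participants <- 50
n_questions <- 100
p_text1 <- runif(n_questions, 0.1, 0.9)  # hypothetical share of "text 1" per question
answers <- matrix(
  rbinom(n_participants * n_questions, 1, rep(p_text1, each = n_participants)),
  nrow = n_participants, ncol = n_questions
)
answers[1, ] <- 1                                        # always text 1
answers[2, ] <- rep(c(1, 0), length.out = n_questions)   # strict alternation

# On 0/1 data, Manhattan distance counts the questions on which two participants differ.
d <- dist(answers, method = "manhattan")

# Hierarchical clustering; very small clusters may be worth a closer look.
tree <- hclust(d, method = "average")
cluster <- cutree(tree, k = 3)
table(cluster)

# Simple anomaly score: mean distance from each participant to all others.
dist_matrix <- as.matrix(d)
anomaly_score <- rowSums(dist_matrix) / (nrow(dist_matrix) - 1)
head(order(anomaly_score, decreasing = TRUE))   # most unusual participants first
```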


[edit based on comment]

  • Take the proportion of "text 1" answers for every participant and compare it to the mean proportion across all participants. This would catch the most obvious outliers, but probably not any subtle pattern.
  • Measure the correlation between the question number and the answer: normally the two are independent, so the correlation should be very low; if it is not, there is a pattern. Note that there is probably a better statistical test than Pearson correlation, but I don't remember which one would suit this case (Spearman correlation might be relevant, I'm not sure).
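
Both checks can be sketched in base R as below, on a tiny hand-made `answers` matrix (hypothetical data, one row per participant). The helper names are made up for this example. The two run-length helpers built on `rle` are not part of the suggestions above; they are added as one possible way to screen directly for the "same answer" and "left/right/left/right" streaks mentioned in the question.

```r
set.seed(3)
n_questions <- 100
# Hypothetical example data: 1 = chose text 1, 0 = chose text 2.
answers <- rbind(
  all_text1   = rep(1, n_questions),
  alternating = rep(c(1, 0), length.out = n_questions),
  drifting    = rep(c(0, 1), each = n_questions / 2),
  random      = rbinom(n_questions, 1, 0.5)
)

# Spearman correlation between question number and answer.
# It is undefined for a constant responder, so NA is returned in that case.
position_cor <- function(x) {
  if (sd(x) == 0) return(NA_real_)
  cor(seq_along(x), x, method = "spearman")
}

# Longest streak of identical consecutive answers.
longest_same <- function(x) max(rle(x)$lengths)

# Longest streak of strictly alternating answers (e.g. 1,0,1,0 has length 4).
longest_alternation <- function(x) {
  if (length(x) < 2) return(length(x))
  runs <- rle(diff(x) != 0)             # TRUE where the answer changed
  if (!any(runs$values)) return(1L)     # the answer never changed
  max(runs$lengths[runs$values]) + 1L
}

screen <- data.frame(
  prop_text1          = rowMeans(answers),
  position_cor        = apply(answers, 1, position_cor),
  longest_same        = apply(answers, 1, longest_same),
  longest_alternation = apply(answers, 1, longest_alternation)
)
screen

# Participants who alternated at least 20 times in a row
rownames(screen)[screen$longest_alternation >= 20]
```

Note that strict alternation gives a correlation with the question number close to zero, so the correlation check alone would miss it; the run-length columns are what pick up that case. The thresholds from the question (20, 30, ... 100) can be applied directly to `longest_alternation`.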
