Real-Time Outlier/Anomaly Detection?
My data consists of usage/playing statistics for players of a specific game. Each data point is one user's aggregated statistics for one week. The goal is to detect when a player's account has been stolen, hacked, or has otherwise gone wrong. My idea is to keep, for each player, a set of data points that each represent one week, and then check whether the latest week is an outlier relative to that player's cluster. If it is, something is wrong with the account.
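A minimal sketch of the data layout described above. It assumes a raw per-session log with hypothetical columns `user_id`, `timestamp` and `minutes_played`; the real statistics will differ. Sessions are aggregated into one row per user and calendar week, and `split_history` (a hypothetical helper) separates the most recent week from up to 25 earlier weeks.

```python
import pandas as pd

# Synthetic stand-in for a raw session log (hypothetical columns):
# user "a" plays twice a week for 27 weeks, user "b" once a week for 3 weeks.
rows = []
for w in range(27):
    start = pd.Timestamp("2024-01-01") + pd.Timedelta(weeks=w)
    rows += [("a", start, 30.0), ("a", start + pd.Timedelta(days=2), 45.0)]
for w in range(3):
    rows.append(("b", pd.Timestamp("2024-01-01") + pd.Timedelta(weeks=w), 60.0))
events = pd.DataFrame(rows, columns=["user_id", "timestamp", "minutes_played"])

# One row per (user, calendar week): the weekly "data point".
# Weeks with no sessions produce no row here; reindexing them in with zeros
# may be worthwhile, since a silent week can itself be informative.
events["week"] = events["timestamp"].dt.to_period("W")
weekly = events.groupby(["user_id", "week"]).agg(
    minutes=("minutes_played", "sum"),
    sessions=("minutes_played", "count"),
)


def split_history(weekly, user, n_history=25):
    """Return (past weeks, latest week) for one user."""
    user_weeks = weekly.loc[user].sort_index().tail(n_history + 1)
    return user_weeks.iloc[:-1], user_weeks.iloc[-1]


history, latest = split_history(weekly, "a")
print(len(history), latest["minutes"])  # 25 75.0
```

If the check runs mid-week, the latest row covers only part of a week, so scoring only completed weeks may be safer.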
My question is: what algorithm or method would suit this situation? I am quite familiar with clustering and with things like autoencoders, but they don't feel well suited to my problem, because:
- I have few samples per user: we can go back at most 25 weeks, so there are only 25 samples of what is 'right'.
- I don't need outlier detection on the whole dataset; I only need to tell whether the latest sample is an outlier with respect to that user's other data points.
Currently I have two ideas:
- Dixon's Q-test.
- Simply checking whether the latest sample is farther from the cluster center than all the other samples.
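Both ideas are quick to prototype; the sketch below illustrates them under stated assumptions and is not a recommendation. Dixon's Q is a univariate test, so `dixon_q_latest` (a hypothetical helper) applies it to a single statistic such as weekly hours played. Its critical values are the commonly tabulated 95% values for Dixon's r10 ratio and only cover 3 to 10 points; a full 25-week history would need one of Dixon's other ratios (e.g. r22, usually recommended for larger samples) with its own table, so verify values against a published table before relying on them. `centroid_distance_check` (also hypothetical) implements the second idea: features are standardized against the reference weeks, each past week is scored leave-one-out against the other past weeks, and an empirical p-value is returned.

```python
import numpy as np

# --- Idea 1: Dixon's Q-test on one weekly statistic --------------------------
# Commonly tabulated 95% critical values for Dixon's r10 ratio (n = 3..10).
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q_latest(values):
    """Q-test for the LAST value of a 1-D series. Returns (q, is_outlier)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    if n not in Q95:
        raise ValueError("r10 critical values are only listed for 3 <= n <= 10")
    latest, others = x[-1], x[:-1]
    spread = x.max() - x.min()  # range of the whole sample, latest included
    if latest > others.max():
        gap = latest - others.max()   # distance to nearest value below it
    elif latest < others.min():
        gap = others.min() - latest   # distance to nearest value above it
    else:
        return 0.0, False             # not an extreme value, so Q cannot flag it
    q = gap / spread
    return float(q), bool(q > Q95[n])


# --- Idea 2: distance from the center of the user's past weeks ---------------
def centroid_distance_check(history, latest):
    """history: (n_weeks, n_features); latest: (n_features,).
    Returns (latest_distance, max_past_distance, empirical_p)."""
    history = np.asarray(history, dtype=float)
    latest = np.asarray(latest, dtype=float)

    def scaled_distance(point, reference):
        # Standardize each feature with the reference weeks so that features
        # on large scales do not dominate the Euclidean distance.
        mu = reference.mean(axis=0)
        sd = reference.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # constant feature: use unscaled differences
        return float(np.linalg.norm((point - mu) / sd))

    # Leave-one-out: each past week is scored against the other past weeks,
    # mirroring how the latest week is scored against the full history.
    past_d = np.array([
        scaled_distance(history[i], np.delete(history, i, axis=0))
        for i in range(len(history))
    ])
    latest_d = scaled_distance(latest, history)

    # Fraction of all weeks (latest included) at least as far out as the latest.
    p = (1 + np.sum(past_d >= latest_d)) / (len(past_d) + 1)
    return latest_d, float(past_d.max()), float(p)


print(dixon_q_latest([10, 11, 12, 10.5, 11.5, 30]))  # (0.9, True): gap 18 / range 20
```

"Farther than every past week" corresponds to the smallest possible p, which with 25 past weeks is 1/26 ≈ 0.038. If normal weeks were roughly exchangeable, a perfectly ordinary latest week would still be the most extreme about once in 26 weeks, which is worth keeping in mind when choosing an alarm threshold.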
They could work, but they both sound a little 'hacky'. I feel like there should be a more elegant solution for such a relatively simple problem, but my mind is just blanking. What approach would you recommend?
Topic unsupervised-learning anomaly-detection outlier statistics clustering
Category Data Science