Find recurrent dates in a small set (and get rid of non-recurrent ones)

I need help analysing a categorization problem.

Given a set of dates (a small set: 20 elements maximum), I would like to group the dates that are evenly spaced (within a tolerance), for instance dates that are one month or one week apart.

Here is an example. Given this distribution:

I would like to categorize into these two groups:

The problem is that I am a developer, not a data scientist. My intuition is that some kind of regression should be possible.

I have no clue how to approach this problem. Can you help me with it, please?

Cheers

PS: I have already seen this thread (Recurring events - finding in a time series), but it has not helped me.

Topics: regression, classification, time-series

Category: Data Science


I don't think this is a problem to which machine learning is the answer. I can't think of any kind of clustering that would work here. My instinctive approach would be to remove the trend from the data and then use a Fourier transform to identify the recurring frequencies. It should then be reasonably straightforward to classify each point as part of one of the patterns identified there, and everything else can be dropped into an "other" bucket.
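A minimal sketch of this Fourier idea in Python, under stated assumptions: the dates are turned into a daily 0/1 series, "removing the trend" is reduced to subtracting the mean, and a plain DFT (written with the standard library only; `numpy.fft.rfft` would normally do this step) picks the strongest period. Restricting candidate periods to `min_period`..`max_period` days (hypothetical defaults of 5 and 40) stops harmonics such as 3.5 days from beating a weekly period. A second helper then keeps the dates whose phase at that period lies within `tolerance_days` of the common phase and sends the rest to an "other" bucket. The function names and all three parameters are illustrative, not from any library.

```python
import cmath
import math
from datetime import date, timedelta


def dominant_period(dates, min_period=5.0, max_period=40.0):
    """Strongest recurrence period in days, via a plain DFT (None if none fits)."""
    start = min(dates)
    offsets = [(d - start).days for d in dates]
    n = max(offsets) + 1  # length of the daily series
    series = [0.0] * n
    for t in offsets:
        series[t] += 1.0  # one spike per date
    mean = sum(series) / n
    detrended = [x - mean for x in series]  # simplest form of trend removal

    best_period, best_power = None, -1.0
    for k in range(1, n // 2 + 1):  # k = 0 is the mean we just removed
        period = n / k
        if not min_period <= period <= max_period:
            continue  # skip harmonics and overly long periods
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(detrended))
        if abs(coeff) > best_power:
            best_period, best_power = period, abs(coeff)
    return best_period


def split_by_period(dates, period, tolerance_days=1.5):
    """Dates whose phase at `period` matches the common phase, and the rest."""
    start = min(dates)
    offsets = [(d - start).days for d in dates]
    # Reference phase: circular mean of every date's phase at this period.
    ref = cmath.phase(sum(cmath.exp(2j * math.pi * t / period) for t in offsets))
    pattern, other = [], []
    for d, t in zip(dates, offsets):
        # Wrap the phase difference into (-pi, pi], then convert it to days.
        diff = cmath.phase(cmath.exp(1j * (2 * math.pi * t / period - ref)))
        if abs(diff) * period / (2 * math.pi) <= tolerance_days:
            pattern.append(d)
        else:
            other.append(d)
    return pattern, other


weekly = [date(2023, 1, 2) + timedelta(weeks=m) for m in range(10)]
noise = [date(2023, 1, 20), date(2023, 2, 9)]
period = dominant_period(weekly + noise)
pattern, other = split_by_period(weekly + noise, period)
print(round(period, 2), len(pattern), other)
# → 7.11 10 [datetime.date(2023, 1, 20), datetime.date(2023, 2, 9)]
```

The estimated period comes out as 64/9 ≈ 7.11 days rather than exactly 7, because a DFT over n days only samples periods of the form n/k; with at most 20 dates the spectrum is coarse, so the tolerance has to absorb that drift. Running `split_by_period` again on the `other` bucket is one way to look for a second pattern.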


You can use a clustering algorithm to group nearby dates together. But since you've mentioned that there will be no more than 20 dates to cluster, it seems you can just write some simple logic to group them.

Pick a base date (it can be anything) and compute the number of days/weeks/months from the base date to each date in your dataset. You now have a set of numbers, which you can bucket together according to whatever threshold you like.
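A minimal sketch of that bucketing in Python, assuming offsets in days and a gap-based rule: sort the offsets and open a new bucket whenever the gap to the previous date exceeds `threshold_days`. The function name `bucket_by_offset` and the default threshold of 3 days are hypothetical choices for illustration.

```python
from datetime import date


def bucket_by_offset(dates, base, threshold_days=3):
    """Group dates by their day offset from `base` (hypothetical helper).

    A new bucket starts whenever the gap to the previous offset is larger
    than `threshold_days`.
    """
    pairs = sorted(((d - base).days, d) for d in dates)
    buckets = []
    prev = None
    for offset, d in pairs:
        if prev is None or offset - prev > threshold_days:
            buckets.append([])  # gap too large: open a new bucket
        buckets[-1].append(d)
        prev = offset
    return buckets


base = date(2023, 1, 1)  # any base date works
dates = [date(2023, 1, 2), date(2023, 1, 1), date(2023, 1, 10),
         date(2023, 1, 11), date(2023, 2, 1)]
for bucket in bucket_by_offset(dates, base):
    print([d.isoformat() for d in bucket])
# → ['2023-01-01', '2023-01-02']
# → ['2023-01-10', '2023-01-11']
# → ['2023-02-01']
```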

A clustering algorithm would do the same thing; the difference is that the thresholding is taken care of automatically, based on an optimal cutoff. Try the simplest (read: easiest to understand) clustering algorithm: K-Means.
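The K-Means version of the same idea can be sketched on the day offsets. In Python the usual choice would be `sklearn.cluster.KMeans`; to stay dependency-free, the sketch below hand-rolls Lloyd's algorithm in one dimension. Note that K-Means needs the number of clusters `k` up front: here it is simply assumed to be 3, and the evenly spread initial centres are an illustrative choice, not part of the original answer.

```python
from datetime import date


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on 1-D numbers; returns one cluster label per value."""
    ordered = sorted(values)
    # Illustrative init: spread the k starting centres across the sorted values.
    centres = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centre.
        new_labels = [min(range(k), key=lambda c: abs(v - centres[c]))
                      for v in values]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


base = date(2023, 1, 1)
dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3),
         date(2023, 1, 31), date(2023, 2, 1),
         date(2023, 3, 2), date(2023, 3, 3), date(2023, 3, 4)]
offsets = [(d - base).days for d in dates]
print(kmeans_1d(offsets, k=3))  # → [0, 0, 0, 1, 1, 2, 2, 2]
```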


If it is a categorization problem, then you should look for a classification algorithm, not a regression technique. The simplest classification algorithm is logistic regression.

But by the looks of it, you do not have a labelled dataset, and if that's the case you should look at clustering techniques. Clustering is an unsupervised learning technique in ML that creates clusters, or groups, of similar data points.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.