Suggestions for studying *Clickstream* data
I've essentially been handed a dataset of website access history and I'm trying to draw some conclusions from it.
The data supplied gives me the web URL, the datetime when it was accessed, and the unique ID of the user accessing it. This means that for a given user ID, I can see a timeline of how they went through the website and which pages they looked at.
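To make the shape of the data concrete, here is a minimal sketch of how I turn the raw rows into one timeline per user. The rows, URLs, timestamp format, and the `build_timelines` name are made up for illustration; my real data just has the same three fields.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows in the same shape as my data: (user_id, datetime, url)
rows = [
    ("u1", "2021-03-01 10:05:00", "/products/a"),
    ("u2", "2021-03-01 10:01:00", "/blog/post-1"),
    ("u1", "2021-03-01 10:00:00", "/home"),
    ("u2", "2021-03-01 10:03:00", "/blog/post-2"),
    ("u1", "2021-03-01 10:07:00", "/checkout"),
]

def build_timelines(rows):
    """Group page views by user and sort each user's views by access time."""
    visits = defaultdict(list)
    for user_id, timestamp, url in rows:
        # Assumed timestamp format; adjust to whatever the dataset uses
        accessed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        visits[user_id].append((accessed, url))
    # Sort each user's (time, url) pairs, then keep only the URLs in order
    return {
        user_id: [url for _, url in sorted(views)]
        for user_id, views in visits.items()
    }

timelines = build_timelines(rows)
```

Each value in `timelines` is the ordered list of pages for one user, which is the clickstream I feed into the tools below.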
I'd quite like to try clustering these users into different categories (some users clearly stick to a specific portion of the website while others don't), but I really don't know how to do this.
Things I've looked at:
- Markovclick - This lets me supply a clickstream of pages and get back a Markov transition probability matrix. I've binned the pages down to around 60 groups, but the library doesn't allow comparing users who accessed mutually exclusive sets of pages.
- Predicting website exits with machine learning - I quite like the approach here of calculating various metrics from a user's history, but the metrics I've tried so far haven't shown anything particularly interesting.
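For reference, this is the kind of matrix I mean by a Markov probability matrix. It is a plain-Python sketch of a first-order transition matrix, not Markovclick's actual API; the `transition_matrix` function, the page names, and the example clickstream are hypothetical.

```python
def transition_matrix(clickstream, pages):
    """Estimate first-order Markov transition probabilities from one clickstream.

    Entry [i][j] is the fraction of moves out of pages[i] that went to pages[j].
    Rows for pages with no outgoing moves are left as all zeros.
    """
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    counts = [[0] * n for _ in range(n)]
    # Count each consecutive (current page, next page) pair
    for current, following in zip(clickstream, clickstream[1:]):
        counts[index[current]][index[following]] += 1
    # Normalise each row so its non-zero entries sum to 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical binned page groups and one user's clickstream
pages = ["home", "blog", "products", "checkout"]
clicks = ["home", "products", "home", "products", "checkout"]
matrix = transition_matrix(clicks, pages)
```

Passing the same `pages` list for every user keeps all the matrices the same shape, even when a user never visited some of the pages; their rows are simply all zeros.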
Are there any suggestions for approaches? I'm quite surprised at how little I've managed to find on this kind of work, because I naively assumed this was a very popular topic.
Many thanks
Topic web-scraping markov-hidden-model markov-process
Category Data Science