Using Spark to find users similar to a given user?
I read about https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
but couldn't find a Spark library that implements it.
I have a columnar string dataset.
It covers around 15-20 million users, with show_watched, times_watched, genre, channel and some more columns. I need to calculate lookalikes for a single user (or for 100k users).
How do I find lookalikes for them quickly?
I have tried indexing the data in Solr and then using Solr MLT to find similar users, but that takes a lot of time. MLT also scores with TF-IDF, whereas I need users whose times_show_watched is close to that user's times_show_watched.
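To make the requirement concrete, here is a minimal plain-Python sketch (not Spark, and not what Solr MLT does) of the kind of similarity I am after. It assumes each user can be represented as a sparse vector of show to times_watched and compares users by cosine similarity on the raw counts. The user IDs, show names and the `cosine` and `top_lookalikes` helpers are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict: show -> times_watched)."""
    dot = sum(count * b[show] for show, count in a.items() if show in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_lookalikes(target, users, k=2):
    """Brute-force search: score every other user against the target, keep the k best."""
    scores = [(other, cosine(users[target], vec))
              for other, vec in users.items() if other != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Hypothetical toy data: user -> {show: times_watched}
users = {
    "u1": {"show_a": 10, "show_b": 1},
    "u2": {"show_a": 9, "show_b": 2},
    "u3": {"show_a": 1, "show_b": 10},
}

best = top_lookalikes("u1", users, k=1)
```

Note that cosine similarity only compares the proportions of the counts, so a distance on the raw counts (for example Euclidean) may match "close times watched" better. Either way, this brute-force comparison is exactly what does not scale to 15-20 million users.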
Can anyone recommend a better approach for this, maybe using any other framework for faster processing?
I also tried clustering the users with Spark MLlib, so that a lookup would first find the cluster a user belongs to and then search only within it, which shrinks the search space. However, I couldn't get this approach finished.
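The idea behind the clustering attempt, as a rough plain-Python sketch rather than working Spark code: assuming centroids are already available from a k-means run (hard-coded here), assign every user to the nearest centroid once, then search for lookalikes only among users in the same cluster. The vectors, the centroids and the helper names are hypothetical.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def build_index(users, centroids):
    """Group user IDs by the cluster their vector falls into."""
    index = defaultdict(list)
    for uid, vec in users.items():
        index[nearest(vec, centroids)].append(uid)
    return index

def lookalikes(target, users, centroids, index, k=1):
    """Search only inside the target's own cluster instead of over all users."""
    cluster = nearest(users[target], centroids)
    candidates = [u for u in index[cluster] if u != target]
    candidates.sort(key=lambda u: math.dist(users[target], users[u]))
    return candidates[:k]

# Hypothetical toy data: user -> (times watched show_a, times watched show_b)
users = {"u1": (10.0, 1.0), "u2": (9.0, 2.0), "u3": (1.0, 10.0), "u4": (2.0, 9.0)}
# Assumed to come from a previous k-means run; hard-coded for illustration.
centroids = [(9.5, 1.5), (1.5, 9.5)]

index = build_index(users, centroids)
result = lookalikes("u1", users, centroids, index, k=1)
```

The trade-off is that a true nearest neighbour sitting just across a cluster boundary is missed, so the result is approximate.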
I am open to any approaches which would be efficient.
Thanks!
Topic similar-documents apache-mahout apache-spark
Category Data Science