Are Spark ALS item features comparable across several runs?

I am using Spark's ALS.train() to build my user-item recommendation system.

The problem is that I want to cover more item features, so I need to feed in 7 days of user action data. But ALS training becomes slower than when I feed in just 1 day of data.

So, is it possible to train on just 1 day of data at a time and compare item similarities across the separate runs (each run using only 1 day of input)?

Topic apache-spark machine-learning

Category Data Science


You can train and evaluate the model day by day.

Something like this:

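from pyspark import SparkContext
from pyspark.mllib.evaluation import RegressionMetrics
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext()

# Hyperparameters for Alternating Least Squares
rank = 10
numIterations = 10

for day in range(1, 8):  # days 1 through 7

    # Load and parse the data
    # (assumption: each line holds one "user,item,rating" triple)
    lines = sc.textFile(f"data/day_{day}.data")
    ratings = lines.map(lambda l: l.split(",")).map(
        lambda p: Rating(int(p[0]), int(p[1]), float(p[2])))

    # Build the recommendation model using Alternating Least Squares
    model = ALS.train(ratings, rank, numIterations)

    # Evaluate the model on the same day's data:
    # pair each prediction with the observed rating, keyed by (user, item)
    predictions = model.predictAll(ratings.map(lambda r: (r.user, r.product))) \
        .map(lambda r: ((r.user, r.product), r.rating))
    observed = ratings.map(lambda r: ((r.user, r.product), r.rating))
    predictionAndObservations = predictions.join(observed).values()
    metrics = RegressionMetrics(predictionAndObservations)
    print(f"RMSE = {metrics.rootMeanSquaredError}")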
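
On the comparability question itself, one caveat is worth keeping in mind: the latent factors that ALS learns are only determined up to a transformation of the latent space (for example a rotation), and training starts from a random initialization, so raw item vectors from independently trained models generally do not share a coordinate system. What usually can be compared is something computed inside a single run, such as item-item cosine similarities. The following is a minimal NumPy sketch of that idea, not Spark code: the factor matrices are made-up random numbers, and the second "run" is simulated by rotating the first, which is an assumption for illustration only. With a real model, the item vectors would come from model.productFeatures().

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factors from one run: 5 users, 4 items, rank 3
user_factors = rng.normal(size=(5, 3))
item_factors = rng.normal(size=(4, 3))

# A random orthogonal matrix (rotation/reflection) from a QR decomposition
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

# Simulated "second run": the same model expressed in a rotated latent space
user_factors_b = user_factors @ q
item_factors_b = item_factors @ q


def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T


# Predicted ratings are unchanged by the rotation
same_predictions = np.allclose(user_factors @ item_factors.T,
                               user_factors_b @ item_factors_b.T)
# Item-item similarities within each run are unchanged as well
same_within_run = np.allclose(cosine_matrix(item_factors),
                              cosine_matrix(item_factors_b))
# The raw item vectors, however, differ between the two runs
same_raw_vectors = np.allclose(item_factors, item_factors_b)

print(same_predictions, same_within_run, same_raw_vectors)
```

In practice this suggests comparing per-day similarity matrices (or top-N neighbour lists) for the items that appear in both days, rather than subtracting or dotting item vectors taken from different runs. Models trained on different days also see different data, so even the within-run similarities will only agree approximately.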
