Faster alternatives to sparse.model.matrix?
I have a large dataset that is entirely categorical. I'm trying to train on it with xgboost, so I first have to convert the categorical data to numeric form. So far I've been using sparse.model.matrix() from the Matrix package, but it is far too slow. I found a great solution here; however, the sparse matrix it returns is not the same as the one sparse.model.matrix returns. I know I can force sparse.model.matrix to return output identical to the linked solution (by providing contrasts), but that is not a practical fix: it is still too slow, and it produces a different representation from the default one (and hence a different trained model).
Is there a way to do what sparse.model.matrix does anywhere near as fast as the solution I linked? On my data, that solution takes about 15% of the time.
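For reference, here is a minimal sketch of the two representations being compared, on a hypothetical four-row data frame. The first call is sparse.model.matrix's default (intercept plus treatment contrasts, so one level of each factor is dropped); the second uses the common `contrasts.arg = lapply(df, contrasts, contrasts = FALSE)` idiom to keep every level. The helper `one_hot_sparse` is hypothetical and not necessarily what the linked solution does: it builds the same full one-hot matrix directly with `Matrix::sparseMatrix()` from each factor's integer codes, skipping the formula machinery. It assumes there are no missing values, and its speed has not been measured here.

```r
library(Matrix)

# Hypothetical toy data standing in for the real, entirely categorical dataset
df <- data.frame(
  color = factor(c("red", "green", "blue", "green")),
  size  = factor(c("S", "M", "L", "S"))
)

# Default: intercept + treatment contrasts (first level of each factor dropped)
m_default <- sparse.model.matrix(~ ., data = df)

# Full one-hot: no intercept, and contrasts = FALSE keeps every level
m_onehot <- sparse.model.matrix(
  ~ . - 1, data = df,
  contrasts.arg = lapply(df, contrasts, contrasts = FALSE)
)

# Hypothetical helper: place a 1 at (row, offset + level code) for each factor.
# Assumes no NA values (an NA code would make sparseMatrix() fail).
one_hot_sparse <- function(df) {
  fs <- lapply(df, as.factor)
  nlev <- vapply(fs, nlevels, integer(1))
  offsets <- c(0L, cumsum(nlev)[-length(nlev)])  # first column of each factor's block, minus 1
  n <- nrow(df)
  i <- rep(seq_len(n), times = length(fs))
  j <- unlist(Map(function(f, off) off + as.integer(f), fs, offsets),
              use.names = FALSE)
  col_names <- unlist(Map(function(f, nm) paste0(nm, levels(f)), fs, names(fs)),
                      use.names = FALSE)
  sparseMatrix(i = i, j = j, x = 1, dims = c(n, sum(nlev)),
               dimnames = list(NULL, col_names))
}

m_direct <- one_hot_sparse(df)

dim(m_default)  # 4 5: intercept + 2 color + 2 size columns
dim(m_onehot)   # 4 6: 3 color + 3 size columns
```

The direct version produces the same values as the `contrasts = FALSE` call; matching the *default* sparse.model.matrix output would additionally require adding an intercept column and dropping each factor's first-level column.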