Word2vec to encode medical procedures when using isolation forests
I am planning to use Isolation Forests in R (solitude package) to identify outlier medical claims in my data.
Each row of my data represents the group of drugs that each provider has administered in the last 12 months.
There are more than 700 unique drugs in my dataset, and one-hot encoding them alongside a variety of numerical features will blow out the number of columns in my data.
As an alternative to one-hot encoding, I've been reading about using word2vec to convert words, or in my case the collection of drugs per provider, into numerical vectors.
My question is: can these numerical vectors per provider be used as input features in my isolation forest model?
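To make the idea concrete, here is a minimal sketch of what I have in mind, written in plain Python rather than R purely for illustration. The drug names, the 3-dimensional vectors, and the `provider_vector` helper are all made up; real vectors would come from a trained word2vec model. Averaging the per-drug vectors is only one possible way to get a single fixed-length vector per provider, and it is an assumption on my part, not something I know to be the right choice.

```python
from statistics import fmean

# Hypothetical 3-dimensional drug embeddings standing in for vectors
# learned by word2vec; the values are invented for illustration.
drug_vectors = {
    "drug_a": [0.2, -0.1, 0.5],
    "drug_b": [0.4, 0.3, -0.1],
    "drug_c": [-0.6, 0.2, 0.1],
}

def provider_vector(drugs, vectors):
    """Average the embeddings of the drugs a provider administered.

    Drugs without an embedding are skipped; mean pooling is an
    assumed choice, not the only way to combine the vectors.
    """
    known = [vectors[d] for d in drugs if d in vectors]
    if not known:
        raise ValueError("provider has no drugs with a known embedding")
    # zip(*known) groups the i-th component of every drug vector together.
    return [fmean(component) for component in zip(*known)]

# Hypothetical providers and the drugs each one administered.
providers = {
    "provider_1": ["drug_a", "drug_b"],
    "provider_2": ["drug_c"],
}

# One fixed-length feature row per provider.
features = {p: provider_vector(d, drug_vectors) for p, d in providers.items()}
```

Each provider would then contribute one row with a fixed number of columns (the embedding dimension), and those columns would go into the isolation forest together with my other numerical features.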
Topic isolation-forest unsupervised-learning anomaly-detection outlier r
Category Data Science