Clustering using both text and numerical features
I have a dataset that contains two types of features: a vector generated by doc2vec and a single numerical feature. I would like to perform clustering analysis on both. However, because the doc2vec vector has many more dimensions, simply concatenating the two into one array means the clustering algorithm puts most of the weight on the doc2vec features. How do I overcome this problem?
For example, for a given label, say the doc2vec features look like [1,2,3,4,5] and the numerical feature is [2]. I don't want to simply concatenate them into [1,2,3,4,5,2] and cluster on that. Ideally, I would like my clustering algorithm to give the numerical feature the same importance as the doc2vec features.
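
To make the goal concrete, here is a minimal sketch of one possible weighting scheme, not a definitive answer. It assumes a Euclidean-distance algorithm such as k-means, and it reads "equal importance" as "each feature block contributes the same total variance". Each block is standardized and then divided by the square root of its column count. The data is random stand-in data, and the names `doc_vecs`, `numeric`, and `weight_blocks` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 200 samples, a 5-dimensional doc2vec vector
# and one numerical feature per sample.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(200, 5))
numeric = rng.normal(loc=10.0, scale=3.0, size=(200, 1))


def weight_blocks(blocks):
    """Standardize each block column-wise, then divide by sqrt(n_columns).

    After this, the column variances of every block sum to 1, so each block
    contributes equally (on average) to squared Euclidean distances.
    """
    scaled = []
    for block in blocks:
        z = StandardScaler().fit_transform(block)
        scaled.append(z / np.sqrt(block.shape[1]))
    return np.hstack(scaled)


X = weight_blocks([doc_vecs, numeric])

# Any Euclidean-distance clustering algorithm could be used here;
# k-means with 3 clusters is only an illustrative choice.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

With this scaling the five doc2vec columns together carry the same total variance as the single numerical column. If a different balance is wanted, the per-block divisor can be replaced by any other weight.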