Currency Normalization for Salary Prediction

I have a dataset (350k data points) with data of employees across different regions over the last 10 years. The dataset consists of their skills, the region they are in, the industry, their current role, their salary in the respective currency.
After doing some analysis, I have found 60% of the salaries are in SGD, 30% in INR, and the rest are divided across 15 other currencies. Is it recommended that I have a model for each currency or is there a way I can convert all the currencies to a universal value so I can use all my data points to train?
Currently, I have used the 40% of the points available in SGD to train a random forest model and I have found that the results on the test set are reasonably accurate. For this model, I have considered skills, role, and industry as features and nothing else. Is there any better model I can explore? Thank you

Topic deep-learning dataset nlp clustering machine-learning

Category Data Science


I would suggest convert all currency into one currency using a mapping. (Say we will convert all currency to SGD). Use the following python code to convert.

conversion = dict{
    'SGD' : 1.0,
    'INR' : --- , #Conversion Rate
    ..........

}
map(lambda x : conversion[x], mylist) # Your currency list.

#Then apply standard normalization
mylist = np.array(mylist)
mylist = (mylist - min(mylist))/(max(mylist) - min(mylist))

Within each currency (say SGD or INR), you can divide the salary by quartiles, for e.g the top 25 percentiles of salaries within each currency can be assigned a value of 4, the next 25th percentile a 3 and so and so forth. This way you are able to normalize salaries across currencies.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.