Equitable selection of users through ranking

Question

ShoDaKhan

March 29, 2021, 02:52

I have a dataset derived largely from categorical user input. The sign-up sheet asks for many data points, such as age group, race, and sign-up date, as well as a few others. My goal is to create a weighted system that chooses users equitably based on their responses. I've tried a frequency approach, but it has pitfalls: if 65% of the sign-ups are White/Caucasian, a disproportionate number of those users will be selected. I looked into packages such as survey and pewmethods in R to try to standardize the approach, but those are built for survey data, and this is not a survey problem. I have tried various Python packages for OLS and Random Tree models, but I do not think this is a regression problem (I could be wrong here). I've also tried one-hot encoding and the category encoder library in Python, to little effect. Below is an example with some data to show what I am seeing. I am also not sure how to weight the sign-up date, since users who signed up earlier should receive a higher score. Any help or direction would be greatly appreciated. Apologies for the poor formatting; this is my first time using Stack Exchange.

Edited for clarity: Please review the image below. For Race, I counted the users in each category, computed each category's percent of the whole, and then took (1 - percentOfWhole) to create an index column. I repeated this process for the other categorical variables, such as over/under XX age, Ethnicity, and Gender. I then applied the index value from each variable to the original row-level data, so JaneDoe received .35 from race, .5 from age, .5 from ethnicity, and .45 from sex, for a total of 1.8. This was done for thousands of records, which were then sorted high to low to create an index, or Elo of sorts, for contacting people from most underrepresented to most overrepresented. This feels too simple to me, and I suspect there is a more elegant approach that removes all bias and is purely math driven. An obvious pitfall of the current approach is that it penalizes certain group types, which could itself introduce bias.
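For reference, here is a minimal Python sketch of the index calculation described above, using only the standard library. The records, the column names, and the helper names (`rarity_index`, `rank_by_underrepresentation`) are made up for illustration and are not the real sign-up data; the sign-up date is not handled here.

```python
from collections import Counter

# Hypothetical sign-up records (illustrative only, not the real data).
signups = [
    {"name": "A", "race": "White", "sex": "F"},
    {"name": "B", "race": "White", "sex": "M"},
    {"name": "C", "race": "White", "sex": "M"},
    {"name": "D", "race": "Black", "sex": "F"},
]


def rarity_index(records, column):
    """Map each category in `column` to 1 - (its share of all records)."""
    counts = Counter(record[column] for record in records)
    total = len(records)
    return {category: 1 - n / total for category, n in counts.items()}


def rank_by_underrepresentation(records, columns):
    """Sum the per-column index values for each record and sort high to low."""
    indexes = {col: rarity_index(records, col) for col in columns}
    scored = [
        (sum(indexes[col][record[col]] for col in columns), record["name"])
        for record in records
    ]
    # Highest total first, i.e. most underrepresented first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranked = rank_by_underrepresentation(signups, ["race", "sex"])
for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy data, three of four users are White, so White gets an index of 0.25 and the single Black user gets 0.75. Every category score depends only on how common the category is in the sign-up pool, which is the source of the penalty effect mentioned above.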

Topic one-hot-encoding indexing categorical-data

Category Data Science
