Basic Machine Learning Question, Looking at where to start

Was recommended to post here instead of StackOverflow

I am looking to do some ML, and I just need to know the right terms to search for and which library or approach to pursue.

I have two data sets that look something like this:

| UserName  | Location | Department |
|-----------|----------|------------|
| test.user | Chicago  | IT         |
| asd.smith | LA       | Marketing  |
| qwe.smith | Chicago  | IT         |
| dfg.smith | Chicago  | Marketing  |

and

| UserName  | Permission |
|-----------|------------|
| test.user | 1          |
| asd.smith | 2          |
| asd.smith | 4          |
| qwe.smith | 1          |
| dfg.smith | 1          |
| dfg.smith | 2          |
| dfg.smith | 3          |

The problem I am trying to solve is: if a new person is hired into Chicago/Marketing, what is the % chance they would have permission X?

So with the above datasets I would expect it to say: there is a 100% chance that they would have Permission 1, a 100% chance they have Permission 2, and a 50% chance they have Permission 3.

I am really just looking for a pointer in the right direction: where to start, what models exist for a problem like this, and the right words to google.

Topic multiclass-classification

Category Data Science


As Brian Spiering mentioned, this is a probability-based problem and is best tackled with a Bayesian approach, i.e., estimating the probability of each permission given the location and department.
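Before reaching for a model, it may help to see that the quantity asked for is a conditional probability that can be estimated by plain counting. The snippet below is only a minimal sketch using the toy rows from the question and the Python standard library; the names `users`, `grants`, and `permission_rates` are made up for illustration.

```python
from collections import Counter

# Toy rows copied from the question: (user, location, department).
users = [
    ("test.user", "Chicago", "IT"),
    ("asd.smith", "LA", "Marketing"),
    ("qwe.smith", "Chicago", "IT"),
    ("dfg.smith", "Chicago", "Marketing"),
]
# Toy rows copied from the question: (user, permission).
grants = [
    ("test.user", 1), ("asd.smith", 2), ("asd.smith", 4),
    ("qwe.smith", 1), ("dfg.smith", 1), ("dfg.smith", 2), ("dfg.smith", 3),
]

def permission_rates(location, department):
    """Estimate P(permission | location, department) as a simple fraction."""
    # Users who match both the location and the department.
    group = {u for u, loc, dep in users if loc == location and dep == department}
    if not group:
        return {}  # no matching users, so nothing to estimate from
    # Count how many users in the group hold each permission
    # (set() guards against duplicate grant rows).
    counts = Counter(p for u, p in set(grants) if u in group)
    return {p: n / len(group) for p, n in sorted(counts.items())}

chicago_it = permission_rates("Chicago", "IT")
chicago_marketing = permission_rates("Chicago", "Marketing")
print(chicago_it)
print(chicago_marketing)
```

With so few rows the estimates are extreme: the sample contains a single Chicago/Marketing user, so every permission that user holds comes out at 100%. A group with no existing members gives no estimate at all, which is one reason to consider a model that shares information across groups.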

When considering machine learning, which is something of a buzzword these days, you need to think carefully about the problem first. Are you looking to study features of the data (feature analysis), or are you looking to make a prediction for the next person to be added to the list (prediction)? Machine learning can achieve both, but it can be a lot of work if you only want to study the features.

Software like Stata, SPSS, etc. can handle data of the size you mentioned and can be a lot more straightforward when you just want to understand the data and gain insights.

To provide a more meaningful response, it would be great to get a detailed picture of the other variables mentioned, as these variables could also have a large impact on the model and use case selected.


One option is to build a naive Bayes classifier. Given your two categorical features (i.e., location and department), it predicts a single category out of a set of categories (i.e., permission).
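As a rough illustration of the idea, here is a hand-rolled naive Bayes sketch in plain Python on the toy rows from the question. Because a user can hold several permissions, this sketch makes the assumption of fitting one yes/no model per permission rather than a single multiclass model; the function name `naive_bayes_rate` and the smoothing parameter `alpha` (Laplace smoothing) are illustrative choices, not something from the question. In practice a library implementation would be the more usual route.

```python
# Toy rows from the question: user -> (location, department).
users = {
    "test.user": ("Chicago", "IT"),
    "asd.smith": ("LA", "Marketing"),
    "qwe.smith": ("Chicago", "IT"),
    "dfg.smith": ("Chicago", "Marketing"),
}
# Toy rows from the question: (user, permission).
grants = {
    ("test.user", 1), ("asd.smith", 2), ("asd.smith", 4),
    ("qwe.smith", 1), ("dfg.smith", 1), ("dfg.smith", 2), ("dfg.smith", 3),
}

def naive_bayes_rate(permission, location, department, alpha=1.0):
    """Estimate P(user holds permission | location, department).

    Naive Bayes treats location and department as independent given the
    label, and alpha adds Laplace smoothing so unseen values are not zero.
    """
    n_locations = len({loc for loc, _ in users.values()})
    n_departments = len({dep for _, dep in users.values()})
    holders = {u for u, p in grants if p == permission}
    scores = {}
    for label in (True, False):  # True = holds the permission
        members = [u for u in users if (u in holders) == label]
        n = len(members)
        prior = (n + alpha) / (len(users) + 2 * alpha)
        loc_hits = sum(1 for u in members if users[u][0] == location)
        dep_hits = sum(1 for u in members if users[u][1] == department)
        p_loc = (loc_hits + alpha) / (n + alpha * n_locations)
        p_dep = (dep_hits + alpha) / (n + alpha * n_departments)
        scores[label] = prior * p_loc * p_dep
    # Normalise the two joint scores into a probability.
    return scores[True] / (scores[True] + scores[False])

p1 = naive_bayes_rate(1, "Chicago", "Marketing")
p4 = naive_bayes_rate(4, "Chicago", "Marketing")
print(f"P(permission 1) = {p1:.2f}, P(permission 4) = {p4:.2f}")
```

Unlike raw counting, the smoothed model never returns exactly 0% or 100% and can still score a location/department combination that has no existing users, at the cost of the independence assumption between the two features.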
