How can I use a class variable with many possible values in logistic regression?

I am attempting to build a logistic regression model that determines the probability of an outcome based on a set of independent variables.

For context, the data is based on a project in which sales representatives and branch managers for a builders merchant were given price recommendations for their customers' deals, and were given the option of saying 'Yes' or 'No' to these price recommendations. The Yes or No answer is my dependent variable; I need to determine which variables can predict with the highest probability whether the respondent will say 'no' to a price recommendation.

Most of the independent variables work fine in this model, except for one which I am currently unable to test: individual differences between the respondents.

My hypothesis is that the propensity to say 'no' will be stronger in some individual respondents than others, be it for psychological or geographical reasons, and that these individual differences will be a stronger determiner for the dependent variable than any other class variable.

There are about 800 respondents, so simply shoving them in as an independent variable does not produce the desired results.

Is there a method of doing this in logistic regression? Should I use another analysis technique for this?

I am using SAS to carry out the logistic regression.



This may not be the answer you are looking for, but I think this is a telling part of your challenge:

> My hypothesis is that the propensity to say 'no' will be stronger in some individual respondents than others, be it for psychological or geographical reasons, and that these individual differences will be a stronger determiner for the dependent variable than any other class variable.

Unless I misunderstood you, it seems you want an input parameter that indicates how reflexively a respondent will simply answer 'No'. The problem, as I see it, is that you don't know the answer to this. You have no way to measure it directly. No?

If you have more data about the respondents, maybe you can back into this. It is a huge assumption, but if you could get a count of the number of recommendations made to each respondent and the number of times they said 'No', you could come up with a factor: $$f_{\text{No}} = \frac{\text{Number of "No" responses}}{\text{Number of recommendations}}$$

This may get you what you are looking for, but it assumes you can tie the data back to the respondent and that the respondent is, in fact, the Branch Manager or Sales Representative, not the customer. The problem with this approach is that it assumes all the recommendations were equally competitive.
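To make the ratio concrete, here is a minimal sketch in Python (not SAS, purely for illustration). The record layout, the respondent IDs and the function name `no_rate_by_respondent` are all made up for the example; it only assumes each row can be tied back to the respondent who answered.

```python
from collections import defaultdict


def no_rate_by_respondent(records):
    """Return {respondent_id: fraction of recommendations answered 'No'}.

    `records` is an iterable of (respondent_id, answer) pairs, where
    answer is 'Yes' or 'No'. This layout is an assumption for the sketch.
    """
    totals = defaultdict(int)  # recommendations seen per respondent
    noes = defaultdict(int)    # 'No' answers per respondent
    for respondent_id, answer in records:
        totals[respondent_id] += 1
        if answer.strip().lower() == "no":
            noes[respondent_id] += 1
    return {rid: noes[rid] / totals[rid] for rid in totals}


# Hypothetical toy data: two respondents, four recommendations each.
records = [
    ("rep_a", "No"), ("rep_a", "Yes"), ("rep_a", "No"), ("rep_a", "No"),
    ("rep_b", "Yes"), ("rep_b", "Yes"), ("rep_b", "No"), ("rep_b", "Yes"),
]

rates = no_rate_by_respondent(records)
print(rates)  # {'rep_a': 0.75, 'rep_b': 0.25}
```

The resulting rate is one continuous value per respondent, so it could stand in for the 800-level class variable, with the same caveat as above: it treats every recommendation as equally competitive.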

HTH
