Best way to represent a version feature based on percentiles
We're training a binary classifier in AutoML, and one of the features consists of browser versions. Currently these versions are passed to the model in normalized form, encoded according to which percentiles of that browser's version distribution the current observation reaches. For example, if the percentiles for one specific browser's versions are:
percentile | version |
---|---|
p25 | 34 |
p50 | 45 |
p75 | 53 |
p99 | 70 |
then an observation with that browser and version=54 would be represented as:
p25 | p50 | p75 | p99 |
---|---|---|---|
1 | 1 | 1 | 0 |
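
For concreteness, here is a minimal Python sketch of how I understand the current encoding. The names `THRESHOLDS` and `thermometer_encode` are made up for illustration, and the strict `>` comparison is an assumption (a version exactly equal to a percentile value gets 0 here).

```python
# Percentile cutoffs for one browser, taken from the example table above.
THRESHOLDS = {"p25": 34, "p50": 45, "p75": 53, "p99": 70}


def thermometer_encode(version, thresholds=THRESHOLDS):
    """Return one 0/1 indicator per percentile: 1 if the version exceeds it."""
    # Assumption: strict comparison, so a tie with a cutoff maps to 0.
    return {name: int(version > cutoff) for name, cutoff in thresholds.items()}


print(thermometer_encode(54))
# {'p25': 1, 'p50': 1, 'p75': 1, 'p99': 0}
```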
My question is: wouldn't it be better to provide a single integer feature called percentile_version that indicates the highest percentile reached? For the previous example it would be represented as:
percentile_version |
---|
3 |
This is because the observation's version is greater than the first 3 percentiles. The number of percentiles to check would be fixed, of course.
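
The proposed single-integer feature could be computed roughly like this. It is only a sketch: `CUTOFFS` and the function name are hypothetical, and it again assumes a strict "greater than" comparison against cutoffs sorted in ascending order.

```python
from bisect import bisect_left

# Cutoffs for p25, p50, p75, p99 of one browser, sorted ascending.
CUTOFFS = [34, 45, 53, 70]


def percentile_version(version, cutoffs=CUTOFFS):
    """Count how many percentile cutoffs the version strictly exceeds."""
    # bisect_left returns the number of cutoffs strictly below `version`,
    # so a version equal to a cutoff does not count as exceeding it.
    return bisect_left(cutoffs, version)


print(percentile_version(54))
# 3
```

With monotonically increasing cutoffs, this integer is just the sum of the 0/1 indicator columns, so both representations carry the same information about the version.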