How to represent a time duration feature for cases where time is still counting

I have a problem where I am trying to classify the outcome of customer complaint cases. I already have several features, such as the type of item bought, the reason for the complaint, etc.

I am trying to add a feature that represents how long a case has been 'open' (meaning waiting for resolution). The logic is that a case that has been 'open' for a long time is unlikely to have a positive outcome.

The issue is that I am training my model on 'closed' cases, which have a set closing date. When I apply this model in production, it will be for 'open' cases, which have no set closing date.

The most logical thing to do is to treat the current time as the closing date and calculate the duration as: duration = OPENING_DATE - Now()

But this seems like it will lead the model to assume the case closes at the present moment, which is most likely not true.

Is there a better way to engineer this feature?

Thank you

Topic feature-engineering feature-construction classification feature-selection machine-learning

Category Data Science


In my opinion, OPENING_DATE - Now() is the right variable, as it gives you an idea about the outcome of a case: a case that has been open for a long time may not end with a positive result. I think you need to look at this from a production perspective rather than a feature engineering one.

If you score a case only once, you will face the problem you described. But if you score each case every day or every week and recalculate this variable every time, you can build a whole system that monitors tickets and highlights the ones that may lead to unhappy customers.

So the solution may lie in how you score tickets in production rather than in feature engineering in this case.
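As a rough illustration of the periodic re-scoring idea, here is a minimal Python sketch. The names days_open, rescore_open_cases and toy_score are hypothetical, and toy_score is only a stand-in for whatever trained model you have. The elapsed time is computed as the scoring date minus the opening date, so that it is non-negative.

```python
from datetime import datetime


def days_open(opening_date, as_of):
    """Elapsed days between the opening date and the scoring moment (never negative)."""
    return max((as_of - opening_date).total_seconds() / 86400.0, 0.0)


def rescore_open_cases(open_cases, as_of, score_fn):
    """Recompute the duration feature for every open case and score it again."""
    results = {}
    for case_id, opening_date in open_cases.items():
        results[case_id] = score_fn(days_open(opening_date, as_of))
    return results


def toy_score(duration_days):
    """Hypothetical placeholder for a trained model: risk grows with time open, capped at 1."""
    return min(duration_days / 30.0, 1.0)


# Hypothetical open cases: case id -> opening date
open_cases = {"A": datetime(2024, 1, 1), "B": datetime(2024, 1, 20)}

# Score the same open cases on two different dates, one week apart
week1 = rescore_open_cases(open_cases, datetime(2024, 1, 22), toy_score)
week2 = rescore_open_cases(open_cases, datetime(2024, 1, 29), toy_score)
```

Each run uses the duration as of the scoring date, so the score of a case that stays open is refreshed on every pass instead of being fixed at a single assumed closing moment.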
