SMOTE and oversampling with constraints

I'm trying to apply SMOTE to a dataset with time constraints. The data describes users visiting a website. Some features are constrained relative to one another: for example, given the timestamps of a user's first and last visit, the first visit is always less than or equal to the last visit. If I apply SMOTE (or SMOTENC for categorical features), I end up with synthetic samples in which the last visit occurs before the first visit. Such samples cannot exist in the real world and can therefore hurt the model's performance. Is there a way to apply SMOTE while imposing certain rules? Alternatively, are there oversampling techniques that can deal with this problem?

Topic smotenc imbalanced-learn smote class-imbalance

Category Data Science


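One option is to do something closer to bootstrapping: resample the existing minority-class rows with replacement. Since every added row is a copy of a real observation, the constraints are satisfied automatically.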
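As a rough illustration of the bootstrapping idea, here is a minimal sketch using only the standard library. The function name `bootstrap_minority`, the toy rows, and the `(first_visit, last_visit)` layout are assumptions made up for this example, not part of the question. In imbalanced-learn, `RandomOverSampler` should give a similar effect on a real feature matrix.

```python
import random

def bootstrap_minority(rows, labels, minority_label, n_extra, seed=0):
    """Append n_extra minority rows drawn with replacement from the real ones."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")
    # Each extra row is an exact copy of an observed row, so any
    # constraint that holds in the data also holds for the copies.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority_label] * n_extra

# Hypothetical toy data: (first_visit, last_visit) timestamps.
rows = [(100, 250), (120, 120), (300, 900), (50, 60), (400, 450)]
labels = [0, 0, 0, 1, 1]

new_rows, new_labels = bootstrap_minority(rows, labels, minority_label=1, n_extra=1)
```

The trade-off is that no new points are created, only duplicates, so the model sees the same minority examples more often.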

Another option is to generate extra synthetic samples and then prune those that violate the constraints.
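The generate-then-prune idea could look roughly like the sketch below. Everything here is hypothetical: `toy_generator` is only a stand-in for whatever oversampler you use (it mixes two real rows with an independent random weight per feature, which is not how SMOTE draws samples, purely so that some candidates break the rule), and `is_valid` encodes the single constraint from the question.

```python
import random

def is_valid(row):
    """Constraint from the question: first visit must not come after last visit."""
    first_visit, last_visit = row
    return first_visit <= last_visit

def toy_generator(minority, rng):
    """Stand-in generator (an assumption, not SMOTE): blends two real rows
    feature by feature, so the result may violate the constraint."""
    a, b = rng.sample(minority, 2)
    return tuple(x + rng.random() * (y - x) for x, y in zip(a, b))

def oversample_with_pruning(minority, n_needed, is_valid, generate,
                            seed=0, max_tries=10_000):
    """Keep generating candidates and discard invalid ones until
    n_needed valid samples are collected or max_tries is reached."""
    rng = random.Random(seed)
    kept, tries = [], 0
    while len(kept) < n_needed and tries < max_tries:
        candidate = generate(minority, rng)
        tries += 1
        if is_valid(candidate):
            kept.append(candidate)
    return kept

# Hypothetical minority rows: (first_visit, last_visit) timestamps.
minority = [(50, 60), (400, 450), (10, 500)]
synthetic = oversample_with_pruning(minority, 20, is_valid, toy_generator)
```

Because pruning removes some candidates, it helps to over-generate or loop as above; otherwise the final class ratio ends up lower than the one requested from the oversampler.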
