How should feature engineering be applied to test data after deployment?
I am somewhat confused about feature engineering. I am building a web app where people can upload test data as a CSV file, and I am unsure how to do feature engineering once the app is deployed, especially how to handle outliers and missing values.
- Suppose I want to replace all outliers in the test data with the value Q3 + 1.5 * IQR. Should I use the Q3 + 1.5 * IQR value computed on the training dataset to cap the test data's outliers, or should I compute Q3 + 1.5 * IQR separately on the test data and cap its outliers with that?
- I have the same confusion about missing values. Should I impute them with the training dataset's mean/median/mode or with the test data's mean/median/mode?
- I know that any transformer that is fit and transformed on the training data must only be used to transform the test data. But suppose I apply an ordinary transformation that has no `fit` and `transform`, to make a feature more Gaussian-like; for example, I used `np.cos()` on one of the features. Should I also apply `np.cos()` to that feature in the test data?
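To make the question concrete, here is a minimal sketch of the first option in each bullet (reusing statistics computed on the training data), assuming pandas and NumPy. The column name `x`, the helper names `fit_stats` and `transform`, and the toy values are hypothetical and only for illustration; the order of imputing before capping is also just an assumption.

```python
import numpy as np
import pandas as pd


def fit_stats(train: pd.DataFrame, col: str) -> dict:
    """Compute the upper cap and the median from the training data only."""
    q1, q3 = train[col].quantile([0.25, 0.75])
    return {"upper_cap": q3 + 1.5 * (q3 - q1), "median": train[col].median()}


def transform(df: pd.DataFrame, col: str, stats: dict) -> pd.DataFrame:
    """Apply the stored training statistics to new rows (one row or many)."""
    out = df.copy()
    # Impute missing values with the training median, then cap at the training threshold.
    out[col] = out[col].fillna(stats["median"]).clip(upper=stats["upper_cap"])
    # np.cos has nothing to fit, so it is applied the same way to any data.
    out[col + "_cos"] = np.cos(out[col])
    return out


# Hypothetical training data and a hypothetical uploaded CSV with a gap and an outlier.
train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
stats = fit_stats(train, "x")  # would be saved alongside the model at training time

uploaded = pd.DataFrame({"x": [np.nan, 100.0]})
result = transform(uploaded, "x", stats)
print(result)
```

With this setup, a single uploaded row is handled the same way as many rows, because nothing is recomputed from the uploaded data.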
My overall confusion is how to do feature engineering after the model is deployed, when the test data can be anything: a single row or multiple rows, with missing values, outliers, and so on.
Topic transformation feature-engineering
Category Data Science