Do I need to encode numerical variables like "year"?
I have a simple time-series dataset with a date-time feature column:
user,amount,date,job
chris, 9500, 05/19/2022, clean
chris, 14600, 05/12/2021, clean
chris, 67900, 03/27/2021, cooking
chris, 495900, 04/25/2021, fixing
Using Pandas, I split this column into multiple features: year, month, and day.
import pandas as pd

## Convert the date column to datetime type (unparseable values become NaT)
data['date'] = pd.to_datetime(data['date'], errors='coerce')
## Order by user and date
data = data.sort_values(by=['user', 'date'])
## Split date into year, month, day
data['year'] = data['date'].dt.year
data['month'] = data['date'].dt.month
data['day'] = data['date'].dt.day
I applied feature_engine's CyclicalTransformer to the month and day features, leaving the year feature alone.
data = CyclicalTransformer(variables=['month', 'day'], drop_original=True).fit_transform(data)
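For reference, here is a rough sketch of what cyclical encoding does, written with plain NumPy rather than feature_engine. The helper name `cyclical_encode` and the explicit `period` argument are my own assumptions for illustration; the library may choose the period differently (for example, from the maximum value seen in the data).

```python
import numpy as np
import pandas as pd


def cyclical_encode(series: pd.Series, period: int) -> pd.DataFrame:
    """Map each value onto a circle so the last value sits next to the first.

    Hypothetical helper for illustration; not part of feature_engine.
    """
    angle = 2 * np.pi * series / period
    return pd.DataFrame(
        {
            f"{series.name}_sin": np.sin(angle),
            f"{series.name}_cos": np.cos(angle),
        }
    )


# Months taken from the sample rows above (May, May, March, April).
months = pd.Series([5, 5, 3, 4], name="month")
encoded = cyclical_encode(months, period=12)
print(encoded)
```

The point of the sine/cosine pair is that December and January end up close together, which a plain integer month does not capture. Year has no such wrap-around, which is why I left it out of this step.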
Now I'm unsure what to do with the year feature. I was thinking of applying MinMaxScaler to it, but I wonder whether I could leave it as is, since it is already numerical.
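To make the two options concrete, here is a minimal sketch of the year column left as is versus min-max scaled. It uses plain pandas arithmetic, which I assume matches what MinMaxScaler computes with its default (0, 1) range; the column name `year_scaled` is just for illustration.

```python
import pandas as pd

# Years taken from the sample rows above.
data = pd.DataFrame({"year": [2022, 2021, 2021, 2021]})

# Option 1: leave the year untouched (values stay around 2021-2022).
raw = data["year"]

# Option 2: min-max scale to [0, 1]: (x - min) / (max - min).
year_min, year_max = raw.min(), raw.max()
data["year_scaled"] = (raw - year_min) / (year_max - year_min)

print(data)
```

With only two distinct years the scaled column collapses to 0 and 1, while the raw column is on a very different scale from the sine/cosine features, which lie in [-1, 1].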
Topic normalization feature-scaling encoding dataset
Category Data Science