Clustering algorithm for time series data with categorical dtypes
I have a large dataset with around 200 features, consisting mostly of time series and categorical data, with some continuous variables. The dataset comes from a postal service. Small example:
Random (scrambled) entries:
  shipment        delivery          cost        location                weight_kg
 2020-04-22      2020-04-23         77.31       UK:66c54f531....           0.5
 2020-04-23      2020-04-25         44.14       DK:22c54f531....           2.23
 2020-04-24      2020-04-27         53.84       UK:66c54f531....           1.57 
 2020-04-25      2020-04-26         22.09       UK:66c54f531....            
My first inclination was to build a demand-forecast model on shipment/count_monthly(shipment), but given the number of features, a multivariate approach seemed more relevant. I am just not sure which additional features to include without the project becoming too generic (linear regression). My initial EDA showed variables with low correlation; those that were highly correlated were removed to avoid multicollinearity.
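To make the count_monthly(shipment) target concrete, here is a minimal sketch of what I mean, using only the four example rows above. The function body is my own assumption of how such a count could be computed (plain Python, no particular library); the real data would of course span many months.

```python
from collections import Counter
from datetime import date

# Shipment dates copied from the example rows above.
shipments = ["2020-04-22", "2020-04-23", "2020-04-24", "2020-04-25"]

def count_monthly(dates):
    """Count shipments per (year, month), sorted chronologically."""
    counts = Counter()
    for text in dates:
        d = date.fromisoformat(text)  # parse "YYYY-MM-DD"
        counts[(d.year, d.month)] += 1
    return dict(sorted(counts.items()))

monthly = count_monthly(shipments)
print(monthly)
```

The resulting monthly counts are what the univariate demand forecast would be fitted on.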
Instead, I then considered a clustering approach, to explore the relations between the features in more detail. I am just not sure how to approach it with a dataset of this size and with time series; I have never really worked with that dtype, especially in combination with categorical dtypes. Any advice would be appreciated.
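To illustrate the mixed-dtype problem, one candidate I am aware of is a Gower-style dissimilarity, which averages a range-scaled absolute difference for numeric columns and a simple match/mismatch for categorical ones. The sketch below is an assumption on my part, not something I have run on the real data: the helper names (numeric_ranges, gower) are hypothetical, "country" is only the visible prefix of the location value, and missing values are simply skipped.

```python
from itertools import combinations

# Three columns from the example rows; "country" is only the visible prefix
# of the location value, and None marks the missing weight in the last row.
records = [
    {"cost": 77.31, "country": "UK", "weight_kg": 0.5},
    {"cost": 44.14, "country": "DK", "weight_kg": 2.23},
    {"cost": 53.84, "country": "UK", "weight_kg": 1.57},
    {"cost": 22.09, "country": "UK", "weight_kg": None},
]
NUMERIC = ("cost", "weight_kg")

def numeric_ranges(rows, columns):
    """Range (max - min) of each numeric column, ignoring missing values."""
    ranges = {}
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        ranges[col] = max(values) - min(values)
    return ranges

def gower(a, b, ranges):
    """Gower-style dissimilarity in [0, 1] between two records."""
    total, used = 0.0, 0
    for key in a:
        x, y = a[key], b[key]
        if x is None or y is None:
            continue  # skip columns where either value is missing
        if key in ranges:
            # numeric: absolute difference scaled by the column range
            total += abs(x - y) / ranges[key] if ranges[key] else 0.0
        else:
            # categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else 0.0

ranges = numeric_ranges(records, NUMERIC)
distances = {
    (i, j): gower(records[i], records[j], ranges)
    for i, j in combinations(range(len(records)), 2)
}
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

A pairwise matrix like this could in principle be passed to a distance-based method such as hierarchical clustering, but it grows quadratically with the number of rows, which is part of my concern about the data size.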
Edit: the various date columns (like shipment and delivery) are not in chronological order, and their values appear numerous times, so they cannot be treated as time series either. This raises another question: does it even make sense to convert the columns in question to a datetime object?
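For context, the only use of a datetime conversion I can think of is deriving ordinary numeric or categorical features from the dates, for example the delivery time in days. The sketch below shows that idea on the example rows; the feature names (delivery_days, shipment_weekday, shipment_month) are my own and purely illustrative.

```python
from datetime import date

# (shipment, delivery) pairs copied from the example rows above.
rows = [
    ("2020-04-22", "2020-04-23"),
    ("2020-04-23", "2020-04-25"),
    ("2020-04-24", "2020-04-27"),
    ("2020-04-25", "2020-04-26"),
]

def date_features(shipment, delivery):
    """Turn two date strings into plain numeric features."""
    s = date.fromisoformat(shipment)
    d = date.fromisoformat(delivery)
    return {
        "delivery_days": (d - s).days,    # elapsed days between the two dates
        "shipment_weekday": s.weekday(),  # Monday = 0 ... Sunday = 6
        "shipment_month": s.month,
    }

features = [date_features(s, d) for s, d in rows]
for f in features:
    print(f)
```

Features like these do not require the rows to be chronological or unique, so they would work even though the columns are not true time series.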
Topic time-series categorical-data clustering
Category Data Science