Data preprocessing framework/library alternatives
I am currently working on some Python machine learning projects that will soon be deployed to production. Our team wants to do this properly, following MLOps principles.
Specifically, I am researching the data preprocessing step and how to implement it so that it is robust against training-serving skew. I've considered TensorFlow Transform, which, after a single run of the defined preprocessing steps, generates a graph artifact that can be reused after training. A downside of using it, however, is the need to stick with TensorFlow data formats. Is there a good alternative?
The only similar frameworks/libraries I've found so far are Keras preprocessing layers and scikit-learn preprocessing pipelines. I have searched many sites and blogs but haven't found a discussion of this kind.
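To make the scikit-learn option concrete, here is a minimal sketch of what I understand the "fit once, reuse the artifact" idea to look like there. It assumes scikit-learn, NumPy and joblib are installed; the toy data, the step names and the `preprocess.joblib` file name are made up for illustration, and this is not meant as an equivalent of what TensorFlow Transform does internally.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data: two numeric features, one missing value.
X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])

# Training time: fit all preprocessing statistics (means, scales) once.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
preprocess.fit(X_train)

# Persist the fitted pipeline as a single artifact (hypothetical file name).
artifact_path = os.path.join(tempfile.mkdtemp(), "preprocess.joblib")
joblib.dump(preprocess, artifact_path)

# Serving time: load the same artifact instead of re-implementing the steps,
# so the statistics learned on the training data are applied unchanged.
serving_preprocess = joblib.load(artifact_path)
X_request = np.array([[2.0, 20.0]])
X_served = serving_preprocess.transform(X_request)
print(X_served)
```

As far as I can tell, the artifact here is a pickled Python object rather than a portable graph, so the serving side would need to run Python with compatible library versions.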
Topic mlops tensorflow preprocessing python machine-learning
Category Data Science