Storage of N-dimensional matrices (tensors) as part of machine learning pipelines
I'm an infra person working on a storage product. I've googled quite a bit for an answer to the following question but haven't been able to find one, so I'm attempting to ask it here.
I am aware that relational or otherwise structured data can often be represented in 2-dimensional tables such as DataFrames, and that these can be used as ML training input. If we want to store the DataFrames, they can easily be persisted as tables in a data store.
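To make concrete what I mean by "persisted as tables", here is a minimal sketch using pandas and SQLite (the table name and data are just placeholders for illustration):

```python
import sqlite3

import pandas as pd

# A toy 2-D training table.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "label": [0, 1, 0]})

with sqlite3.connect(":memory:") as conn:
    # Store the DataFrame as a relational table...
    df.to_sql("training_data", conn, index=False)
    # ...and read it back later as ML training input.
    restored = pd.read_sql("SELECT * FROM training_data", conn)

print(restored.equals(df))  # True
```

This round-trip works naturally because the data is 2-dimensional; my question below is about what the analogous story is for N-dimensional tensors.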
I have also come to know that Tensors (N-dimensional matrices) are used these days for deep learning tasks. My questions are:
- Is there a need to store these Tensors back to disk as part of the overall ML pipeline, or do people usually read the 2-D DataFrames back in and recompute everything from there?
- What format is used these days for persisting Tensors to disk?
My understanding is that there is no dedicated Tensor storage format. Instead, people just load the source data from disk (be it a 2-D DataFrame or images/videos, etc.) and recompute the Tensors as needed rather than persisting them. Is that correct?
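For concreteness, here is the pattern I have in mind, sketched with pandas/NumPy. The `.npy` file written by `np.save` is the closest thing to a tensor-on-disk format I've found so far, but I don't know whether production pipelines actually persist tensors this way or simply recompute them:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Source data on disk: a 2-D table (could equally be images/videos).
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Option 1: recompute the tensor from the source data on every run.
tensor = df.to_numpy().reshape(2, 2, 1)

# Option 2: persist the tensor itself (np.save writes NumPy's .npy format).
path = os.path.join(tempfile.mkdtemp(), "tensor.npy")
np.save(path, tensor)
loaded = np.load(path)

print(np.array_equal(tensor, loaded))  # True
```

Both options round-trip the same values; my question is which of the two is the usual practice in real pipelines.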
Tags: dataframe, tensorflow, apache-spark, apache-hadoop
Category: Data Science