Organizing datasets, dataset version control, MLOps and other questions
I am currently looking into structuring data and workflows for my end-to-end ML pipeline.
I have several requirements, and ideally I am looking for one platform that covers all of them:
- Visualize and organize multiple datasets, ideally with something like the Kaggle dataset web interface
- Do dataset exploration to quickly visualize errors in data, biases in annotations etc.
- Annotate images and potentially point clouds
- Commenting functionality for all of these features
- Keep track of who annotated what on what date
- Dataset version control to keep track of changes to annotations, newly added images, etc., with options for tags like production or release
- Be able to log and tag specific trained models (production, etc.)
- Be able to organize training and prediction experiments
- Have traceability on what training runs or models used which dataset version
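To make the versioning and traceability requirements concrete, here is a minimal sketch in plain Python (standard library only) of the behaviour I am after. The names `dataset_version_id` and `Registry` are hypothetical and do not come from any of the platforms mentioned here; the sketch also assumes annotations fit in a simple path-to-label mapping, which is a simplification.

```python
import hashlib
import json
from dataclasses import dataclass, field


def dataset_version_id(annotations: dict[str, str]) -> str:
    """Return a short content hash of a {image_path: label} mapping.

    Any added image or changed annotation yields a different id.
    """
    canonical = json.dumps(annotations, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


@dataclass
class Registry:
    """Hypothetical in-memory store linking tags and runs to dataset versions."""

    dataset_tags: dict[str, str] = field(default_factory=dict)  # tag -> version id
    runs: dict[str, str] = field(default_factory=dict)  # run name -> version id

    def tag_dataset(self, tag: str, version_id: str) -> None:
        self.dataset_tags[tag] = version_id

    def log_run(self, run_name: str, version_id: str) -> None:
        self.runs[run_name] = version_id

    def runs_using(self, version_id: str) -> list[str]:
        """List every run that was trained on the given dataset version."""
        return sorted(name for name, v in self.runs.items() if v == version_id)


# Two dataset versions: the second one has a new image added.
v1 = dataset_version_id({"img_001.png": "cat", "img_002.png": "dog"})
v2 = dataset_version_id(
    {"img_001.png": "cat", "img_002.png": "dog", "img_003.png": "cat"}
)

registry = Registry()
registry.tag_dataset("production", v1)
registry.log_run("run_a", v1)
registry.log_run("run_b", v2)

print(registry.runs_using(registry.dataset_tags["production"]))  # ['run_a']
```

A real platform would of course persist this and handle large files, but the core question "which runs used the dataset tagged production?" is what I need answered.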
I have been able to find individual platforms that solve parts of these problems, but not a single end-to-end platform:
Visualizing datasets and annotating: Remo
Pros:
- Can visualize multiple datasets
- It is possible to annotate in the web interface
Cons:
- Data has to be uploaded to the platform, instead of just linking it to a stored location
- No commenting or discussions about annotations
- Not possible to version control data
Image annotation and data versioning: V7Labs
Pros:
- Possible to annotate images
- Possible to comment
- Possible to track and version datasets
Cons:
- Pricey
Log Experiments, etc: Weights and Biases
Pros:
- Easy way of keeping track of experiments
- Tracking of datasets and matching them with experiments and trained models
Cons:
- Pricey
It is going to be very expensive if I have to subscribe to multiple platforms, and time-consuming to keep the tracking of data and annotations connected between the annotation tool and an experiment/MLOps tool like WandB.
Is there a magical tool that I have missed?
Topic image annotation version-control dataset
Category Data Science