What are common problems around Hadoop storage?
I've been asked to lead a program to understand why our Hadoop storage is constantly near capacity. What questions should I ask?
- How old is the data (data age)?
- How large is the data (data size)? A rough way to pull age and size per path is sketched after this list.
- What is the housekeeping/retention schedule?
- How do we identify the different compression codecs used by different applications?
- How can we identify duplicate data sources?
- Are jobs designated for edge nodes actually running only on edge nodes?
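For the data age and size questions, a minimal audit sketch is below. It assumes the `hdfs` CLI is on the PATH and that you have read access to the paths you scan; the path `/data` and the age buckets are only placeholders.

```python
#!/usr/bin/env python3
"""Rough HDFS audit: total bytes and age distribution under a given path.

Assumes the `hdfs` CLI is available; `/data` below is a hypothetical path.
"""
import subprocess
from collections import defaultdict
from datetime import datetime


def list_files(path):
    """Yield (size_bytes, mtime, file_path) for every file under `path`."""
    out = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        parts = line.split(None, 7)
        # Skip the "Found N items" header and directory entries.
        if len(parts) != 8 or parts[0].startswith("d"):
            continue
        _perms, _repl, _owner, _group, size, date, time, fpath = parts
        mtime = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")
        yield int(size), mtime, fpath


def audit(path, age_days=(30, 90, 365)):
    """Bucket bytes by file age so stale data stands out."""
    now = datetime.now()
    buckets = defaultdict(int)
    total = 0
    for size, mtime, _ in list_files(path):
        total += size
        age = (now - mtime).days
        label = next((f"<= {d}d" for d in age_days if age <= d),
                     f"> {age_days[-1]}d")
        buckets[label] += size
    return total, dict(buckets)


if __name__ == "__main__":
    total, buckets = audit("/data")  # hypothetical path
    print(f"total bytes: {total}")
    for label, size in sorted(buckets.items()):
        print(f"  {label}: {size}")
```

The same pattern of scripting over `hdfs dfs` output could be extended toward the duplicate-data question, for example by comparing `hdfs dfs -checksum` results across suspect directories.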
Topic apache-hadoop
Category Data Science