Loading files into and out of HDFS via a system call / command line vs. using libhdfs
I am trying to implement a simple C/C++ program (such as word count) for the HDFS file system. It takes a file from an input path, puts it into HDFS (where it gets split), has it processed by my map-reduce function, and produces an output file that I then place back into the local file system.
My question is: which is the better design choice for loading the files into HDFS, calling bin/hdfs dfs -put ../inputFile /someDirectory from the C program, or making use of libhdfs?
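
To make the comparison concrete, here is a rough sketch of what I mean by each option. The function names are just placeholders I made up, the file and directory paths are the same placeholders as above, and I am assuming the hdfs binary is on the PATH for the first variant and that libhdfs (hdfs.h, -lhdfs, and the Hadoop CLASSPATH) is set up for the second:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include "hdfs.h"   /* libhdfs header */

    /* Option 1: shell out to the HDFS CLI (starts a JVM for every call). */
    static int put_via_cli(void)
    {
        /* File and directory names are placeholders. */
        return system("hdfs dfs -put ../inputFile /someDirectory");
    }

    /* Option 2: use libhdfs (in-process via JNI, one connection can be reused). */
    static int put_via_libhdfs(void)
    {
        hdfsFS fs = hdfsConnect("default", 0);   /* namenode taken from the config */
        if (!fs) return -1;

        hdfsFile out = hdfsOpenFile(fs, "/someDirectory/inputFile",
                                    O_WRONLY | O_CREAT, 0, 0, 0);
        FILE *in = fopen("../inputFile", "rb");
        if (!out || !in) {
            if (out) hdfsCloseFile(fs, out);
            if (in) fclose(in);
            hdfsDisconnect(fs);
            return -1;
        }

        /* Copy the local file into HDFS in chunks. */
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            if (hdfsWrite(fs, out, buf, (tSize)n) < 0) break;

        fclose(in);
        hdfsFlush(fs, out);
        hdfsCloseFile(fs, out);
        hdfsDisconnect(fs);
        return 0;
    }

The second variant would need to be compiled and linked against libhdfs with a working JVM/CLASSPATH, whereas the first only needs the Hadoop client installed.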
Topic c map-reduce apache-hadoop bigdata
Category Data Science