How to read all text files from a UNIX directory in Cloudera's Python API
I'm still pretty new to Cloudera and to working in a UNIX environment. I have written a mapper that reads .txt files from a directory on my Windows system, and it works fine. I read the files in like this:
import glob
files = glob.glob("*.txt")
Is there an equivalent way to do this in the UNIX environment? I know I can read a single file with
import sys
infile = sys.stdin
but I'm not sure how to read all the files in one directory.
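To make the Windows side concrete, here is a minimal sketch of the kind of reading I mean, assuming the mapper only needs each line of every .txt file in a directory. The function name `read_text_files` and its `directory` argument are just illustrative names, not part of any Cloudera API.

```python
import glob
import os


def read_text_files(directory="."):
    """Yield (path, line) pairs for every .txt file in `directory`.

    `read_text_files` and `directory` are hypothetical names used
    only for this illustration.
    """
    # Build the pattern with os.path.join so the path separator
    # matches whatever platform the script runs on.
    pattern = os.path.join(directory, "*.txt")
    # Sort the matches so files are processed in a predictable order.
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            for line in handle:
                # Strip only the trailing newline, keep other whitespace.
                yield path, line.rstrip("\n")
```

I would then loop over it with something like `for path, line in read_text_files("my_dir"):`, where `my_dir` is a placeholder for the real directory.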
Thanks!