Reading and processing Twitter network data
I have a large dataset that I downloaded from here. My questions are:
- How do I perform file I/O on this file? The download page said the data is in .tsv format, but after unpacking I got a .twitter file, which is unfamiliar to me, and so far I have not found any reliable documentation on reading this file type.
- The file is huge (23 GB), so even if I manage to read it, it is too large to load entirely on a single machine. Which tools are well suited to this, specifically for graph processing? Is PySpark the right tool?
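
For reference, this is roughly the kind of streaming read I have in mind. It is only a sketch, and it assumes (I have not confirmed this) that the .twitter file is really a plain-text, tab-separated edge list with one `source<TAB>target` pair per line. The function names `peek`, `iter_edges`, and `out_degrees` are my own and do not come from any library.

```python
from collections import Counter
from itertools import islice


def peek(path, n=5):
    """Return the first n raw lines so the actual format can be checked by eye."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\r\n") for line in islice(f, n)]


def iter_edges(path, delimiter="\t"):
    """Yield (source, target) pairs one line at a time, never loading the whole file.

    ASSUMPTION: each line holds at least two delimiter-separated fields,
    and the first two are the source and target node ids.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\r\n").split(delimiter)
            if len(parts) >= 2:  # skip blank or malformed lines
                yield parts[0], parts[1]


def out_degrees(path):
    """Count outgoing edges per source node in a single pass over the file."""
    counts = Counter()
    for src, _dst in iter_edges(path):
        counts[src] += 1
    return counts
```

Calling `peek` on the unpacked file first would show whether the tab-separated assumption holds at all. With this approach, memory use grows with the number of distinct nodes rather than with the 23 GB of edges, but I do not know whether that is enough for real graph algorithms, hence the question about PySpark.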
Category Data Science