Efficiently modify a large CSV file in pandas
I have a CSV file and would like to make the following modification to it:
import pandas

# index_col=0: assume the index is the first column (otherwise read_csv builds an integer RangeIndex and x[:-1] would fail)
df = pandas.read_csv('some_file.csv', index_col=0)
df.index = df.index.map(lambda x: x[:-1])
df.to_csv('some_file.csv')
This takes the index, removes the last character from each value, and then saves the file again.
I have multiple problems with this solution, since my CSV is quite large (around 500 GB).
First of all, reading the whole file and then writing it back does not seem very efficient, since every line is rewritten in full even though only one character per line changes; that should not be necessary, right?
Furthermore, due to a lack of RAM, I read this CSV in chunks using pandas.read_csv's chunksize option (see the sketch below). However, I do not think it is a good idea to save every individual chunk and append it to one long CSV, especially if I use multiprocessing, since the chunks may be written out of order and the structure of the CSV will be completely messed up.
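For concreteness, this is roughly the chunked read-and-append pattern I mean; the chunk size and the output filename 'some_file_fixed.csv' are placeholders I made up:

import pandas

# Read the file in chunks so it never has to fit in RAM all at once.
reader = pandas.read_csv('some_file.csv', index_col=0, chunksize=1_000_000)

with open('some_file_fixed.csv', 'w', newline='') as out:
    for i, chunk in enumerate(reader):
        # Strip the last character from each index value, as above.
        chunk.index = chunk.index.map(lambda x: x[:-1])
        # Write the header only for the first chunk, then append in order.
        chunk.to_csv(out, header=(i == 0))

Processed serially like this, the chunks at least stay in order; it is with multiprocessing that the ordering, and therefore the structure of the file, falls apart.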
Is there a better solution to this problem?
Thank you very much in advance.