File converter: from CSV to HDF5
Can anyone recommend a command-line tool for converting a large CSV file into HDF5 format?
Topic csv
Category Data Science
A first approach is to use pandas' DataFrame.to_hdf:
import numpy as np
import pandas as pd
#filename = '/tmp/test.hdf5'
filename = r'D:\test.hdf5'  # raw string: in a normal string '\t' would be read as a tab
df = pd.DataFrame(np.arange(10).reshape((5,2)), columns=['C1', 'C2'])
print(df)
#    C1  C2
# 0   0   1
# 1   2   3
# 2   4   5
# 3   6   7
# 4   8   9
# Save to HDF5
df.to_hdf(filename, key='data', mode='w', format='table')  # 'table' format is required for appending later
del df # allow df to be garbage collected
# Append more data
df2 = pd.DataFrame(np.arange(10).reshape((5,2))*10, columns=['C1', 'C2'])
df2.to_hdf(filename, key='data', append=True)
print(pd.read_hdf(filename, 'data'))
A second approach is to append to a pd.HDFStore directly instead of calling df.to_hdf:
import numpy as np
import pandas as pd
#filename = '/tmp/test.hdf5'
filename = r'D:\test.hdf5'  # raw string: in a normal string '\t' would be read as a tab
store = pd.HDFStore(filename)
for i in range(2):
    df = pd.DataFrame(np.arange(10).reshape((5,2)) * 10**i, columns=['C1', 'C2'])
    store.append('data', df)
store.close()
store = pd.HDFStore(filename)
data = store['data']
print(data)
store.close()
Another option is to read the large CSV file in chunks using the chunksize parameter of pd.read_csv and append each chunk to the HDF file, as was answered elsewhere.
Personally, I like the 1st and 2nd approaches.
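For reference, the chunked approach could look roughly like the sketch below. It is not taken from the linked answer. The helper name csv_to_hdf, its parameters, and the default chunk size of 100,000 rows are assumptions made for illustration, and it assumes the PyTables package is installed, since pandas uses it for HDF5.

```python
import pandas as pd


def csv_to_hdf(csv_path, hdf_path, key="data", chunksize=100_000):
    """Copy a CSV file into an HDF5 table chunk by chunk; return rows written.

    Hypothetical helper for illustration only; requires PyTables.
    """
    rows = 0
    # mode="w" starts from an empty file, so rerunning does not duplicate rows.
    with pd.HDFStore(hdf_path, mode="w") as store:
        # With chunksize set, read_csv yields DataFrames of at most
        # `chunksize` rows instead of loading the whole file at once.
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            # append() stores the data in the appendable "table" format.
            store.append(key, chunk)
            rows += len(chunk)
    return rows
```

Calling csv_to_hdf('big.csv', 'big.h5') (both file names are placeholders) keeps only one chunk in memory at a time. If the CSV has string columns, later chunks with longer strings may fail to append unless min_itemsize is passed to store.append, so this sketch is safest for numeric data.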