How to create a Parquet file from a query to a MySQL table
I am updating a legacy ETL-like process. At its core, it exports some tables of the production DB to S3, and each export is defined by a query. The export process generates a CSV file using the following logic:
res = sh.sed(
    sh.mysql(
        '-u',
        settings_dict['USER'],
        '--password={0}'.format(settings_dict['PASSWORD']),
        '-D', settings_dict['NAME'],
        '-h', settings_dict['HOST'],
        '--port={0}'.format(settings_dict['PORT']),
        '--batch',
        '--quick',
        '--max_allowed_packet=512M',
        '-e', '{0}'.format(query)
    ),
    r's/"/\\"/g;s/\t/","/g;s/^/"/;s/$/"/;s/\n//g',
    _out=filename
)
The mid-term solution with the most traction is AWS Glue, but if I could have a similar function that generates Parquet files instead of CSV files, it would bring much-needed short-term gains.
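To make the goal concrete, here is a minimal sketch of what such a function might look like. It assumes the pyarrow package is available and that the tab-separated text printed by `mysql --batch` (header line first, values backslash-escaped, SQL NULL printed as the word NULL) is captured as a string instead of being piped through sed. The helper names `unescape_field`, `batch_output_to_columns` and `write_parquet` are hypothetical, every column ends up as a string, and the whole result set is held in memory, so this is only a starting point rather than a drop-in replacement.

```python
# Escape sequences that `mysql --batch` emits inside field values.
_UNESCAPES = {"n": "\n", "t": "\t", "0": "\0", "\\": "\\"}


def unescape_field(value):
    """Undo the backslash escaping applied by `mysql --batch`."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are kept as-is rather than guessed at.
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def batch_output_to_columns(text):
    """Turn tab-separated `mysql --batch` output into {column: [values]}.

    The first line holds the column names. The bare word NULL is mapped
    to None, so a real string 'NULL' cannot be told apart from SQL NULL.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    if not lines:
        return {}
    header = lines[0].split("\t")
    columns = {name: [] for name in header}
    for line in lines[1:]:
        for name, raw in zip(header, line.split("\t")):
            columns[name].append(None if raw == "NULL" else unescape_field(raw))
    return columns


def write_parquet(columns, filename):
    """Write the column dict to a Parquet file (assumes pyarrow is installed)."""
    # Imported here so the parsing helpers work without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), filename)
```

With the existing call, this would amount to dropping the sed step, taking `str(...)` of the `sh.mysql(...)` result, and passing it through `batch_output_to_columns` and then `write_parquet(columns, filename)`. Casting columns to numeric or date types and writing in chunks for large tables are left out of this sketch.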