Data cleaning in Pandas, where the csv file has all data of each row in 1 field

Question

PlatinumMaths

June 1, 2021, 18:24

I have really messy data that looks like this:

As you can see, all the data in each row is contained in a single column, separated by semicolons.

How do I arrange this data so that it is spread out over several columns? For example, category_id, category_id_lvl_0, etc. should each be a separate column, with the semicolon-separated values in the rows underneath falling under the corresponding column (category_id, category_id_lvl_0, ...).

Topic data-wrangling data-cleaning

Category Data Science

Oxbowerce · Accepted Answer · May 31, 2021, 15:30

That doesn't look like messy data to me at all; it is just a CSV file with a ; delimiter. Depending on the regional settings, Excel can use different delimiters when saving data as a .csv file, ; being one of them. By default pandas assumes , as the delimiter, which does not apply in this case. Try reading the file with the correct delimiter specified through the sep argument (delimiter is an alias for it), as follows:

import pandas as pd

# filename is the path to the semicolon-delimited .csv file
df = pd.read_csv(filename, sep=";")
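To see the effect without the original file, here is a minimal sketch that parses a small in-memory sample. The column names category_id and category_id_lvl_0 come from the question; the third column and all the values are made up for illustration, and io.StringIO simply stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the file in the question: fields separated by ";".
# The "name" column and all values are invented for illustration.
raw = "category_id;category_id_lvl_0;name\n1;10;shoes\n2;20;hats\n"

# With the default "," separator, each row ends up in a single column.
df_wrong = pd.read_csv(io.StringIO(raw))

# With sep=";", each field gets its own column.
df = pd.read_csv(io.StringIO(raw), sep=";")

print(df_wrong.shape)    # (2, 1)
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['category_id', 'category_id_lvl_0', 'name']
```

If the delimiter is not known in advance, passing sep=None together with engine="python" asks pandas to detect it from the file, although stating it explicitly is usually safer.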
