How to deal with errors when defining data types in pandas' read_csv()?

I have a table with 118,000 rows and 80 columns. I would like to select 8 columns from the table. I am reading the file with the pandas pd.read_csv function as:

df = pd.read_csv(filename, header=None, sep='|',
                 usecols=[1,3,4,5,37,40,51,76])

I would like to change the data type of each column inside of read_csv using dtype={'5': np.float, '37': np.float, ....}, but this does not work.

There is a warning that column 5 has mixed types. The command print(df.dtypes) shows that all columns are of type object. When I examine column 5, I cannot see any problem. I have to change the data type of each column separately using pd.to_numeric.

My question is: Is there a way to set the data types inside read_csv, and what is the problem in this case?
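The per-column workaround mentioned above can be sketched like this (with made-up sample data standing in for the real file):

```python
import io
import pandas as pd

# Made-up sample standing in for the real file; columns are
# integer-labelled because header=None.
data = io.StringIO("a|1|2.5\nb|2|oops\n")
df = pd.read_csv(data, header=None, sep='|')

# The per-column workaround: coerce each column after reading,
# turning unparseable entries into NaN.
df[2] = pd.to_numeric(df[2], errors='coerce')
print(df.dtypes)
```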

Topic pandas python

Category Data Science


You could pass your own conversion functions per column, but not through dtype: that argument only accepts actual dtypes, so entries like pd.to_numeric there will not work. read_csv has a separate converters argument for exactly this. Note also that with header=None the columns are labelled with integers, so the keys should be 5 and 37 rather than '5' and '37':

converters={5: pd.to_numeric, 37: float, ....}

Or make a function that does what you want:

def convert(val):
    try:
        return float(val)
    except (TypeError, ValueError):
        # Fall back to NaN for values that cannot be parsed
        return np.nan

Then:

converters={5: convert, 37: convert, ....}

That is a bit exaggerated, but you get the idea :)
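A runnable sketch of the converters approach, again with made-up sample data containing one stray non-numeric entry:

```python
import io
import numpy as np
import pandas as pd

def convert(val):
    # Coerce to float; unparseable values become NaN instead of raising.
    try:
        return float(val)
    except (TypeError, ValueError):
        return np.nan

# Hypothetical sample with a stray non-numeric entry in column 1.
data = io.StringIO("x|1.5\ny|oops\nz|2.5\n")
df = pd.read_csv(data, header=None, sep='|', converters={1: convert})
print(df[1].tolist())  # [1.5, nan, 2.5]
```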


If you see the warning that your column has mixed types, but you only see numbers there, it could be that missing values are causing the problem.

In pandas 1.0.0, a new method was introduced to address exactly that problem: DataFrame.convert_dtypes (docs).

You can use it like this:

df = pd.read_csv(filename, header=None, sep='|', usecols=[1,3,4,5,37,40,51,76])
df = df.convert_dtypes()

Then check the types of the columns:

print(df.dtypes)
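A small self-contained sketch of how this behaves when a missing value is present (made-up sample data; convert_dtypes picks the nullable Int64 dtype instead of falling back to float or object):

```python
import io
import pandas as pd

# Hypothetical sample where the second row is missing its number.
data = io.StringIO("a|1\nb|\nc|3\n")
df = pd.read_csv(data, header=None, sep='|')
df = df.convert_dtypes()

# Column 1 becomes the nullable Int64 dtype, with the gap held as <NA>.
print(df.dtypes)
```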
