Is there any way to collect categorical features quickly in Julia DataFrames?

I'm using Julia 0.6.3 with DataFrames.jl.

I was wondering if there is any way to identify categorical features easily in Julia.

For large datasets, listing every categorical column by hand is impractical.

My workaround is to rely on string-typed columns, which usually have low cardinality, but it's not foolproof.

My workaround so far:

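cat_cols = Symbol[]
for col in names(X_train)    # column names (Symbols in DataFrames of this era)
    # the type name of a String column contains "String", e.g. Array{String,1}
    if contains(string(typeof(X_train[col])), "String")
        push!(cat_cols, col)
    end
end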

But it seems kind of ugly, and it misses label-encoded columns because their values are integers.

I could also rely on a low count of unique values, but then sparse features would be picked up as well.
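The two heuristics above can be combined. This is a minimal sketch assuming Julia 1.x; looks_categorical and its max_levels threshold are hypothetical, and the contiguity rule is an assumption that label-encoded columns use gap-free codes starting at 0 or 1. It will still misclassify some columns (a sparse feature whose only values are 0, 1 and 2 passes), so treat the result as a first pass to review by hand.

```julia
# Hypothetical helper, not part of DataFrames.jl: guesses whether one column is categorical.
function looks_categorical(col::AbstractVector; max_levels::Int = 20)
    T = Base.nonmissingtype(eltype(col))   # strip Missing from Union{Missing, T}
    T === Union{} && return false           # column holds only `missing`
    T <: AbstractString && return true      # string columns: treat as categorical
    T <: Integer || return false            # floats and other types: treat as numeric
    levels = sort!(unique(skipmissing(col)))
    isempty(levels) && return false
    # Assumption: label encoding yields codes 0..k-1 or 1..k with no gaps, whereas a
    # sparse count feature (mostly 0, a few large values) tends to have gaps.
    contiguous = levels[end] - levels[1] + 1 == length(levels)
    return length(levels) <= max_levels && contiguous && levels[1] in (0, 1)
end

label_encoded = [0, 2, 1, 0, 2]
sparse_counts = [0, 0, 0, 7, 0, 15]
looks_categorical(label_encoded)   # true
looks_categorical(sparse_counts)   # false: the values 0, 7, 15 have gaps
```

Requiring gap-free codes is what separates label-encoded integers from most sparse count features; a unique-count threshold alone cannot tell them apart.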

Any idea? Thanks!

Tags: dataframe, julia, categorical-data



I think you can use the eltypes function from DataFrames.jl, which returns the element type of each column.

categorical_indices = eltypes(X_train) .== String
categorical_columns = names(X_train)[categorical_indices] 

This should return a vector of Symbols holding the names of the String-typed columns.
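One caveat with .== String: if a column stores missing values as missing, its element type is Union{Missing, String}, which is not equal to String, so that column is silently skipped. Below is a minimal sketch of a subtype-based check instead, assuming Julia 1.x; is_string_column is a hypothetical helper name, and to stay self-contained it runs on a NamedTuple of column vectors standing in for X_train rather than on a DataFrame.

```julia
# Hypothetical helper: is this column's element type a string type,
# once any `Missing` in a Union{Missing, String} element type is stripped?
function is_string_column(col::AbstractVector)
    T = Base.nonmissingtype(eltype(col))
    return T !== Union{} && T <: AbstractString   # Union{} means an all-`missing` column
end

# Example data standing in for X_train: a NamedTuple of column vectors.
cols = (id = [1, 2, 3], city = ["Paris", "Lyon", "Paris"], note = ["a", missing, "b"])

# The equality test misses :note, whose eltype is Union{Missing, String}.
by_equality = [k for k in keys(cols) if eltype(cols[k]) == String]   # [:city]
# The subtype test catches both string columns.
by_subtype = [k for k in keys(cols) if is_string_column(cols[k])]    # [:city, :note]
```

With a recent DataFrames.jl release the same filter would be filter(n -> is_string_column(X_train[!, n]), propertynames(X_train)).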
