Query regarding the 'Data type' of features in Machine Learning

Should all the features in a dataset be converted to the same data type? For instance, if all the features have numerical values, some int some float, should they all be converted to float? What difference would this conversion make?

Topic features preprocessing dataset machine-learning

Category Data Science


In essence, all ML models are numerical algorithms that relate numbers to one another in various ways in order to arrive at other numbers.

So, under the hood, every algorithm ultimately works on numerical data, even if some (CART-style decision trees, for example) do not force the programmer to encode the data numerically from the start.

Some models, e.g. all types of neural networks, require the programmer to explicitly convert all data to strictly numeric values.
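As a rough sketch of what such an explicit conversion can look like, the snippet below one-hot encodes a categorical column in plain Python. The `one_hot` helper and the `colors` data are hypothetical, made up only for illustration; in a real project a library encoder would normally do this job.

```python
def one_hot(values):
    """Map each distinct category to a 0/1 indicator vector (illustrative helper)."""
    # Sort the categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # mark the column that matches this value
        rows.append(row)
    return categories, rows


# Hypothetical categorical feature.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)

print(categories)
for row in encoded:
    print(row)
```

Each row now contains only numbers, which is the form a neural network expects as input.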

The int vs. float distinction is rarely a problem: most algorithms convert between the two as needed. Some implementations, however, require an explicit cast to float.
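To illustrate, here is a small sketch using NumPy (an assumption; the question does not name a library). It shows integer and float features being promoted to a common float type automatically, one case where an explicit cast is needed, and how to cast by hand. The feature names and values are invented for the example.

```python
import numpy as np

# Hypothetical features: one integer-valued, one float-valued.
ages = np.array([25, 32, 47])           # integer dtype
heights = np.array([1.75, 1.62, 1.80])  # float64

# Stacking mixed columns promotes everything to a common float dtype.
X = np.column_stack([ages, heights])
print(X.dtype)

# In-place arithmetic cannot change the dtype of an integer array,
# so writing a float result back into it is rejected by NumPy.
try:
    ages_centered = ages.copy()
    ages_centered -= ages.mean()
    in_place_failed = False
except TypeError:
    in_place_failed = True
print("in-place update on int array failed:", in_place_failed)

# Casting explicitly avoids the issue; float32 is a common choice
# for neural-network libraries.
ages_float = ages.astype(np.float64)
ages_float -= ages_float.mean()
X32 = X.astype(np.float32)
print(ages_float, X32.dtype)
```

The cast itself does not change integer values of this size; what it changes is which operations are allowed and how much memory each value takes.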
