Should I apply a transformation to columns with INTEGERS, in case I want to reduce the skewness of that column?

I am performing EDA on a dataset of hotel reservations. The target is categorical: whether a given customer will cancel the reservation or not. The dataset has 25 features and 30244 entries.

I have two features stating the number of adults and the number of babies coming with the person who made the reservation.

  • Number of adults can be 1, 2, 3, 4, or 5. (Range specifically given in dataset description)
  • Number of babies in the train set takes values 0, 1, or 2 (but a range is NOT specified in the dataset description)

When I checked the skewness of the dataset, the number-of-adults and number-of-babies columns both had skewness above 0.75. (I was going to apply a log transformation to every column with |skewness| > 0.75 to bring its distribution closer to normal.)

As these two columns contain only integer values, I am unsure whether to apply a transformation, because it would turn them into floating-point values.
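To make the concern concrete, here is a minimal sketch in plain Python. The `babies` sample is made up for illustration and is not taken from the dataset, and `skewness` is a hypothetical helper using the simple moment-based (biased) formula, so its numbers would differ slightly from pandas' bias-corrected `skew()`. It uses `math.log1p` (log(1 + x)) rather than a plain log, on the assumption that zeros must be handled, since log(0) is undefined.

```python
import math

def skewness(values):
    """Moment-based (biased) skewness: third central moment / variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical sample of a "number of babies" column (not real data)
babies = [0, 0, 0, 0, 0, 0, 0, 1, 1, 2]

# log1p maps each integer count to a float
transformed = [math.log1p(v) for v in babies]

print("skewness before:", skewness(babies))
print("skewness after: ", skewness(transformed))
print("distinct values after:", sorted(set(transformed)))
```

In this toy sample the column still holds only three distinct values after the transformation; they are just no longer integers.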

Should I apply the transformation or not?

Skewness values reported: 1.710768, 1.407404, 0.858807



No - do not apply a log transformation to integer features.

The distribution of the features (including what you are calling skewness) helps the model learn which feature combinations predict the target.

One way to model the data is conditional probability. Given a certain number of adults and babies, how likely is the reservation to be canceled? Transforming the features will distort the calculation of conditional probability.
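The conditional-probability view can be sketched as a simple frequency count. This is only an illustration: the rows below are invented, and `cancel_rate_by_group` is a hypothetical helper, not something from the dataset or a library.

```python
from collections import defaultdict

# Hypothetical rows: (adults, babies, canceled). Not real data.
reservations = [
    (2, 0, 1), (2, 0, 0), (2, 0, 0), (2, 1, 0),
    (1, 0, 1), (1, 0, 1), (3, 2, 0), (2, 1, 1),
]

def cancel_rate_by_group(rows):
    """Estimate P(canceled | adults, babies) as a per-group relative frequency."""
    totals = defaultdict(int)
    cancels = defaultdict(int)
    for adults, babies, canceled in rows:
        totals[(adults, babies)] += 1
        cancels[(adults, babies)] += canceled
    return {key: cancels[key] / totals[key] for key in totals}

rates = cancel_rate_by_group(reservations)
for (adults, babies), rate in sorted(rates.items()):
    print(f"adults={adults}, babies={babies}: P(cancel) ~ {rate:.2f}")
```

With the raw integer counts, each group key corresponds directly to a readable combination such as "2 adults, 0 babies".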
