What is the best practice to normalize/standardize imbalanced data for an outlier detection or binary classification task?
I'm researching anomaly/outlier/fraud detection, and I'm looking for the best practice for pre-processing synthetic, imbalanced data. I have reviewed the normalization/standardization methods that are not sensitive to the presence of outliers and that fit this case study. The scikit-learn 0.24.2 documentation, in "Compare the effect of different scalers on data with outliers", states:
If some outliers are present in the set, robust scalers or transformers are more appropriate.
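For context, here is a minimal sketch (on toy data, not CTU-13) of what that statement means in practice: StandardScaler's mean/std estimates get dragged by extreme values, while RobustScaler's median/IQR estimates do not, so the bulk of the data keeps a sensible scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# Toy data: mostly small values plus a few extreme outliers.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 1000), [50.0, 80.0, 120.0]]).reshape(-1, 1)

# StandardScaler centers/scales with mean and std, both of which
# are strongly affected by the three outliers.
std_scaled = StandardScaler().fit_transform(x)

# RobustScaler centers/scales with the median and IQR, so the
# inlier bulk keeps a reasonable spread despite the outliers.
rob_scaled = RobustScaler().fit_transform(x)

print("inlier std after StandardScaler:", std_scaled[:1000].std())
print("inlier std after RobustScaler: ", rob_scaled[:1000].std())
```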
I'm using the CTU-13 dataset; an overview of its feature distributions can be seen here.
Given the synthetic nature of the dataset, I need to apply categorical encoding to some features/columns to convert them into numerical values for my representation-based learning model (e.g., using an image-like form of the data as input to learning algorithms such as a CNN; see Figure 6 in this paper).
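To make the setup concrete, here is a minimal sketch of the kind of mixed-type preprocessing I mean, assuming hypothetical CTU-13-like column names (Proto, Dur, TotBytes are illustrative, not my exact schema): categorical columns are one-hot encoded while heavy-tailed numeric columns are robust-scaled:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical CTU-13-like frame; the real columns differ.
df = pd.DataFrame({
    "Proto":    ["tcp", "udp", "tcp", "icmp"],   # categorical
    "Dur":      [0.5, 12.0, 0.1, 3.4],           # numeric
    "TotBytes": [1200, 560_000, 64, 9000],       # numeric, heavy-tailed
})

# Encode categoricals and robust-scale numerics in one transformer.
pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Proto"]),
    ("num", RobustScaler(), ["Dur", "TotBytes"]),
])

X = pre.fit_transform(df)
print(X.shape)
```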
My question: What is the best normalization method for my research case, anomaly/outlier/fraud detection on imbalanced data, so that the preprocessing stage ultimately yields a robust outlier detection model or binary classifier? A sketch of the kind of end-to-end pipeline I mean follows below.
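This is the arrangement I have in mind, reusing the toy frame and transformer from the sketch above, with IsolationForest purely as a placeholder detector rather than my chosen model:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Same hypothetical CTU-13-like toy frame as above.
df = pd.DataFrame({
    "Proto":    ["tcp", "udp", "tcp", "icmp"],
    "Dur":      [0.5, 12.0, 0.1, 3.4],
    "TotBytes": [1200, 560_000, 64, 9000],
})

# Robust, outlier-tolerant preprocessing feeding an unsupervised
# outlier detector; the detector choice is a placeholder.
model = Pipeline([
    ("pre", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Proto"]),
        ("num", RobustScaler(), ["Dur", "TotBytes"]),
    ])),
    ("det", IsolationForest(random_state=0)),
])

model.fit(df)             # unsupervised fit: no labels needed
print(model.predict(df))  # +1 = inlier, -1 = outlier
```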
Any pointers to the state of the art on this topic would be appreciated!