An unsupervised learning method suitable for large categorical data sets
I want to detect anomalies in a bank data set using unsupervised learning. All columns except time and amount are categorical, and about half of them have more than 90 percent missing values.
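To make the data layout concrete, here is a minimal sketch of one possible way to organize such columns: measure the missing fraction per column, treat "missing" as its own category, and one-hot encode the result. The toy rows, the column names (`channel`, `branch`, `merchant_type`) and the `__MISSING__` token are all made up for illustration and are not from the real data set.

```python
MISSING = "__MISSING__"  # hypothetical sentinel token for absent values

# Toy rows standing in for the real data; column names are invented.
rows = [
    {"channel": "web", "branch": None, "merchant_type": "grocery"},
    {"channel": "atm", "branch": None, "merchant_type": None},
    {"channel": "web", "branch": "B12", "merchant_type": "grocery"},
    {"channel": "web", "branch": None, "merchant_type": "fuel"},
]
columns = ["channel", "branch", "merchant_type"]


def missing_fraction(rows, column):
    """Share of rows in which `column` is absent (None)."""
    return sum(r[column] is None for r in rows) / len(rows)


def fill_missing(rows, columns):
    """Replace None with an explicit category so 'missing' is itself a value."""
    return [{c: (r[c] if r[c] is not None else MISSING) for c in columns} for r in rows]


def build_vocabulary(rows, columns):
    """Map each (column, value) pair seen in the data to a one-hot index."""
    pairs = sorted({(c, r[c]) for r in rows for c in columns})
    return {pair: i for i, pair in enumerate(pairs)}


def one_hot(row, columns, vocabulary):
    """Encode one row; values not in the vocabulary leave their column all-zero."""
    vec = [0] * len(vocabulary)
    for c in columns:
        idx = vocabulary.get((c, row[c]))
        if idx is not None:
            vec[idx] = 1
    return vec


fractions = {c: missing_fraction(rows, c) for c in columns}
filled = fill_missing(rows, columns)
vocabulary = build_vocabulary(filled, columns)
encoded = [one_hot(r, columns, vocabulary) for r in filled]
```

Whether a column with more than 90 percent missing values should be kept with a missing category like this, or dropped entirely, is part of what I am unsure about.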
I'm currently approaching this with an autoencoder, but I am not sure whether that is a sound choice for this kind of data. Also, because the goal is to decide whether each record is abnormal as it arrives in real time, clustering techniques such as DBSCAN are of limited use.
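To illustrate the real-time requirement, here is a minimal sketch of the "fit once offline, then score each incoming record on its own" pattern. It is not my autoencoder; it is a much simpler stand-in that scores a record by how rare its category values were in the training data. The class name `FrequencyScorer`, the column names and the `__MISSING__` token are hypothetical.

```python
import math
from collections import Counter

MISSING = "__MISSING__"  # hypothetical token for absent values


class FrequencyScorer:
    """Scores a row by how rare its category values were in the training data."""

    def fit(self, rows, columns):
        self.columns = list(columns)
        self.n = len(rows)
        # One value counter per column, with None mapped to the MISSING token.
        self.counts = {
            c: Counter(self._value(r, c) for r in rows) for c in self.columns
        }
        return self

    @staticmethod
    def _value(row, column):
        value = row.get(column)
        return MISSING if value is None else value

    def score(self, row):
        """Sum of -log(smoothed frequency) over columns; higher means rarer."""
        total = 0.0
        for c in self.columns:
            count = self.counts[c][self._value(row, c)]  # 0 for unseen values
            k = len(self.counts[c]) + 1  # +1 reserves mass for unseen values
            total += -math.log((count + 1) / (self.n + k))
        return total


# Toy training data: mostly "web", branch always missing.
train = [{"channel": "web", "branch": None}] * 8 + [{"channel": "atm", "branch": None}] * 2
scorer = FrequencyScorer().fit(train, ["channel", "branch"])

# Each new record is scored independently, without refitting.
common = scorer.score({"channel": "web", "branch": None})
rare = scorer.score({"channel": "phone", "branch": "B12"})
```

Here `rare` comes out higher than `common`, and a cutoff on the score would have to be chosen separately, for example from the distribution of scores on the training data. An autoencoder would follow the same pattern, with reconstruction error taking the place of this score.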
If I want to apply unsupervised learning to data that is mostly categorical and has many missing values, how should I prepare the data, and is there an appropriate unsupervised learning method for it?
The data set was used in a project, so I do not have it at hand right now.