Classifying transactions as malicious

Question

thaweatherman

May 2, 2022 17:04

I have a big data set of fake transactions for a company. Each row contains the username, credit card number, time, device used, and amount of money in the transaction. I need to classify each transaction as either malicious or not malicious and I am lost for ideas on where to start. Doing it by hand would be silly.

I was thinking possible starting places would be checking how often a credit card is used, whether it is consistently used at a certain time, or whether it is used from lots of different devices (iOS AND Android, as an example). I'm still fairly new to all this and to ML. Would there be an ML algorithm that is optimal for this problem?
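As a rough sketch of the per-card features described above, assuming the data fits in a pandas DataFrame with hypothetical column names `card_number`, `time`, `device` and `amount` (adjust to your schema), the aggregates could be computed and joined back onto each transaction like this:

```python
import pandas as pd


def card_features(df: pd.DataFrame) -> pd.DataFrame:
    """Per-card behaviour features (column names are hypothetical)."""
    df = df.copy()
    df["hour"] = df["time"].dt.hour
    features = df.groupby("card_number").agg(
        n_transactions=("amount", "count"),  # how often the card is used
        hour_std=("hour", "std"),             # low spread -> used at consistent times
        n_devices=("device", "nunique"),      # e.g. iOS AND Android on one card
        mean_amount=("amount", "mean"),
    )
    # A card seen only once has an undefined std; treat it as no spread.
    features["hour_std"] = features["hour_std"].fillna(0.0)
    return features


# Tiny made-up example, not real data.
transactions = pd.DataFrame({
    "card_number": ["A", "A", "A", "B"],
    "time": pd.to_datetime([
        "2022-05-01 09:00", "2022-05-02 09:30",
        "2022-05-03 03:00", "2022-05-01 12:00",
    ]),
    "device": ["iOS", "iOS", "Android", "Web"],
    "amount": [20.0, 25.0, 900.0, 15.0],
})

per_card = card_features(transactions)
# Attach the card-level features to every transaction row for classification.
enriched = transactions.join(per_card, on="card_number")
print(enriched)
```

At around 600 GB this will likely not fit in one in-memory DataFrame, so the same aggregation would need to run in chunks or in a distributed engine; the feature logic stays the same.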

Also, side question: what would be a good place to host the 600 or so GB of data for cheaps?

Thanks

Topics: classification, bigdata

Category: Data Science

Dawny33 · Accepted Answer · May 2, 2022 17:04

This problem is commonly known as "credit card fraud detection".

There are several classification algorithms that aim to tackle this problem.

Given the features in your dataset, a decision tree classifier can be used to separate malicious transactions from non-malicious ones. This paper is a nice resource for learning about fraud detection and developing intuition for it, including the use of basic classification algorithms such as decision trees and SVMs.
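A minimal scikit-learn sketch of the decision-tree approach, assuming you have (or can label) a set of known fraudulent and legitimate transactions. The data here is synthetic, generated with `make_classification` at roughly 2% fraud, purely as a stand-in for real engineered features:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for engineered transaction features (~2% positives = fraud).
X, y = make_classification(
    n_samples=5000, n_features=6, n_informative=4,
    weights=[0.98, 0.02], random_state=0,
)

# Stratify so the rare fraud class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" upweights the rare fraud class;
# max_depth keeps the tree small enough to inspect by hand.
tree = DecisionTreeClassifier(max_depth=5, class_weight="balanced", random_state=0)
tree.fit(X_train, y_train)

print(classification_report(y_test, tree.predict(X_test), digits=3))

# Print the fitted tree as nested if/then rules.
rules = export_text(tree, feature_names=[f"f{i}" for i in range(X.shape[1])])
print(rules)
```

`export_text` shows the learned splits as readable rules, which makes it easy to check whether they make sense for your domain.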

There are several other papers which solve this problem using algorithms such as neural networks, logistic regression and genetic algorithms. However, the paper that uses decision trees is a nice place to start learning.

what would be a good place to host the 600 or so GB of data for cheaps?

AWS S3 would be a nice, cheap way to do that. It also integrates nicely with Amazon Redshift, in case you want to do complex analytics on the data.

Ajeeth Majhi · Accepted Answer · June 10, 2018 10:07

XGBoost has a special parameter named scale_pos_weight for dealing with imbalanced classification problems. It controls the balance of positive and negative weights. You can refer to this link for further details: http://xgboost.readthedocs.io/en/latest/parameter.html
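The XGBoost parameter documentation suggests `sum(negative instances) / sum(positive instances)` as a typical starting value for `scale_pos_weight`. A small helper to compute it from a label array (the helper name and the 2% fraud labels are hypothetical):

```python
import numpy as np


def compute_scale_pos_weight(y) -> float:
    """Ratio of negative (legitimate) to positive (fraud) labels."""
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0:
        raise ValueError("no positive (fraud) examples in y")
    return n_neg / n_pos


labels = np.array([0] * 980 + [1] * 20)  # hypothetical: 2% fraud
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": compute_scale_pos_weight(labels),  # 980 / 20 = 49.0
}
print(params)
```

The resulting `params` dict could then be passed to `xgboost.train(params, dtrain)`, or unpacked into `xgboost.XGBClassifier(**params)`. Treat the ratio as a starting point to tune rather than a fixed answer.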

Ram · Accepted Answer · July 22, 2016 20:11

A rule-based classifier is generally better suited to this problem, where most of your features are going to contain discrete values.

So decision trees, boosting or random forests should do the job for you.
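A sketch comparing two of these ensemble methods with scikit-learn on synthetic, imbalanced data (a stand-in for real transaction features). Since `GradientBoostingClassifier` has no `class_weight` option, balanced per-sample weights are passed to `fit` instead:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in data, ~2% fraud.
X, y = make_classification(
    n_samples=5000, n_features=6, n_informative=4,
    weights=[0.98, 0.02], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Random forest: reweight classes inside each bootstrap sample.
forest = RandomForestClassifier(
    n_estimators=200, class_weight="balanced_subsample", random_state=1
)
forest.fit(X_train, y_train)

# Gradient boosting: upweight fraud rows via per-sample weights.
boost = GradientBoostingClassifier(random_state=1)
boost.fit(X_train, y_train, sample_weight=compute_sample_weight("balanced", y_train))

results = {}
for name, model in [("random forest", forest), ("gradient boosting", boost)]:
    pred = model.predict(X_test)
    recall = recall_score(y_test, pred)
    precision = precision_score(y_test, pred, zero_division=0)
    results[name] = (recall, precision)
    print(f"{name}: fraud recall={recall:.3f}, precision={precision:.3f}")
```

Recall on the fraud class is the share of fraudulent transactions the model catches; precision is the share of flagged transactions that really are fraud.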

One thing you should always keep in mind is how you are going to evaluate your model. For fraud detection, make sure false negatives (fraudulent transactions classified as legitimate) are driven as close to zero as possible. A false positive is tolerable, but a false negative is dangerous.
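One way to act on this, sketched with scikit-learn on synthetic data: instead of the default 0.5 cutoff, choose the decision threshold from the precision-recall curve so that recall on the fraud class meets a target (0.95 here, an arbitrary illustrative value), then count the remaining false negatives and false positives. The logistic regression model and the helper `threshold_for_recall` are stand-ins; any classifier with `predict_proba` works the same way:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, ~2% fraud.
X, y = make_classification(
    n_samples=5000, n_features=6, n_informative=4,
    weights=[0.98, 0.02], random_state=2,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2
)

model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # estimated probability of fraud


def threshold_for_recall(y_true, y_score, target_recall):
    """Highest score threshold whose fraud recall is >= target_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # recall[i] is the recall when flagging y_score >= thresholds[i];
    # the final recall entry (0.0) has no matching threshold, so drop it.
    meets_target = recall[:-1] >= target_recall
    return thresholds[meets_target].max()


threshold = threshold_for_recall(y_test, scores, target_recall=0.95)
pred = (scores >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"threshold={threshold:.3f}  missed fraud (FN)={fn}  false alarms (FP)={fp}")
```

Lowering the threshold trades more false positives for fewer false negatives, which matches the priority of catching fraud first.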
