Where can I download a benign PE dataset? Or at least, which website is the best candidate for crawling and downloading normal executables?

I'm planning to gather a benign dataset for my ML malware detection model.

The problem I'm having is finding benign PE files. I just need a source that has a collection of normal executables; I will scan them with VirusTotal (VT) and keep the benign ones, but I can't find anything useful.

If there is nothing out there, then what is the best website to target with a PE downloader crawler? (Meaning it's easy to crawl and to download .exe files automatically without running into problems.)
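Whatever site ends up being crawled, the downloads will likely include non-PE files and duplicates. A minimal sketch of a post-download filter, assuming the files are already on disk: it checks the `MZ` header and the `PE\0\0` signature at the offset stored at 0x3C, and deduplicates by SHA-256. The function names `is_pe` and `collect_unique_pe` are made up for this illustration.

```python
import hashlib
import struct
from pathlib import Path
from typing import Dict


def is_pe(data: bytes) -> bool:
    """Return True if the bytes look like a PE file (MZ header + PE signature)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def collect_unique_pe(folder: Path) -> Dict[str, Path]:
    """Map SHA-256 digest -> first path seen, for every PE file under folder."""
    unique: Dict[str, Path] = {}
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if not is_pe(data):
            continue
        digest = hashlib.sha256(data).hexdigest()
        unique.setdefault(digest, path)  # keep only the first copy
    return unique
```

The SHA-256 digests can then be used to look the files up on VT instead of uploading each one.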

Another problem with using a download website is installers: most of the files are installers, so I would need to install each program first to get at its executables. Is there a good solution to this? Is there an AutoIt script that can somehow run all types of installers?
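One partial approach, sketched here under assumptions rather than offered as a complete answer: the common installer frameworks accept silent-install switches (`/S` for NSIS, `/VERYSILENT` for Inno Setup, `msiexec /i ... /qn` for MSI packages), so a script can pick a command line per installer type. The type detection below is a rough byte-marker heuristic and will miss custom or packed installers; the function names are hypothetical, and any real run should happen inside a disposable VM.

```python
from typing import List, Optional

# MSI packages are OLE compound files and start with this magic number.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def guess_installer_type(data: bytes) -> Optional[str]:
    """Heuristic guess (an assumption, not a reliable detector)."""
    if data.startswith(OLE_MAGIC):
        return "msi"
    if b"Inno Setup" in data:
        return "inno"
    if b"Nullsoft" in data:
        return "nsis"
    return None


def silent_command(path: str, kind: str) -> List[str]:
    """Build a silent-install command line for a known installer type."""
    if kind == "msi":
        return ["msiexec", "/i", path, "/qn", "/norestart"]
    if kind == "inno":
        return [path, "/VERYSILENT", "/SUPPRESSMSGBOXES", "/NORESTART"]
    if kind == "nsis":
        return [path, "/S"]  # NSIS switches are case-sensitive
    raise ValueError(f"no silent switch known for installer type: {kind}")
```

Installers that match none of these types would still need manual handling or a GUI automation tool such as AutoIt.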

(I tried looking at surveys on ML in malware detection, such as [1], but it seems none of the papers have released a useful benign dataset other than plain Windows system files, which anyone can gather and which number fewer than 10k, sometimes as few as 1,000. I need a large benign dataset, more than 50,000 files, because my malware dataset is really large.)

[1] https://www.sciencedirect.com/science/article/pii/S0167404818303808

Topic windows deep-learning dataset machine-learning

Category Data Science


You can check my repo: https://github.com/bormaa/Benign-NET. It contains about 14,000 benign .NET files, and I am working on a new repo for all benign exe files.


I finally solved this by using the VirusShare website. It has millions of malware samples and is free.

Note that around 1-2% of their PE files are probably benign, meaning they get fewer than 1-2 detections on VirusTotal, so simply labeling every PE file from there as malware might not be academically rigorous.
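A minimal sketch of that relabeling step, assuming each sample's VirusTotal API v3 file report has already been fetched as JSON (the `data.attributes.last_analysis_stats` field holds per-verdict engine counts). The thresholds `benign_max=1` and `malware_min=5` are illustrative assumptions, not established cut-offs, and only the `malicious` count is used here.

```python
def detection_count(report: dict) -> int:
    """Number of engines that flagged the file as malicious in a VT v3 report."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0)


def label_sample(report: dict, benign_max: int = 1, malware_min: int = 5) -> str:
    """Return 'benign', 'malware', or 'uncertain' based on detection count.

    benign_max and malware_min are assumed thresholds; tune them for your data.
    """
    hits = detection_count(report)
    if hits <= benign_max:
        return "benign"
    if hits >= malware_min:
        return "malware"
    return "uncertain"  # too few detections to trust either label
```

Keeping an explicit "uncertain" bucket lets borderline samples be dropped from training instead of being forced into one class.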
