Where can I download a benign PE dataset? Or, failing that, which website is the best candidate for crawling and downloading normal executables?
I'm planning to gather a benign dataset for my ML malware detection model.
The problem is finding benign PE files. I just need a source that offers a dataset of normal executables; I will scan them with VirusTotal (VT) and keep only the benign ones, but I can't find anything useful.
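To make the "scan with VT and keep the benign ones" step concrete, here is a minimal sketch in standard-library Python. It assumes the VirusTotal API v3 file-report endpoint (`/api/v3/files/{sha256}` with an `x-apikey` header) and treats a file as benign when `last_analysis_stats` reports zero `malicious` and zero `suspicious` verdicts. That threshold is my assumption, not an established labeling rule. The helper names (`is_pe`, `sha256_of`, `fetch_vt_report`, `is_benign`) are hypothetical.

```python
import hashlib
import json
import struct
import urllib.request
from pathlib import Path

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"  # VT API v3 file report

def is_pe(path: Path) -> bool:
    """Cheap structural check: 'MZ' DOS header plus 'PE\\0\\0' at offset e_lfanew."""
    with open(path, "rb") as f:
        header = f.read(64)
        if len(header) < 64 or header[:2] != b"MZ":
            return False
        (e_lfanew,) = struct.unpack_from("<I", header, 0x3C)  # little-endian DWORD
        f.seek(e_lfanew)
        return f.read(4) == b"PE\x00\x00"

def sha256_of(path: Path) -> str:
    """Hash in 1 MiB chunks so large installers don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def fetch_vt_report(sha256: str, api_key: str) -> dict:
    """Look up an existing report by hash; raises urllib.error.HTTPError (404) if VT has none."""
    req = urllib.request.Request(VT_FILE_URL.format(sha256), headers={"x-apikey": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def is_benign(report: dict, max_detections: int = 0) -> bool:
    """Assumed rule: malicious + suspicious verdicts must not exceed max_detections."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0) + stats.get("suspicious", 0) <= max_detections
```

Looking up by hash first avoids uploading every file, and running `is_pe` before hashing discards non-PE downloads early. For 50,000+ files the public API quota will be the bottleneck, so caching reports keyed by SHA-256 is worth considering.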
If there is nothing out there, then which website has the most potential for a PE downloader crawler? (That is, a site that is easy to crawl and from which .exe files can be downloaded automatically without running into problems.)
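Whatever site is chosen, the core of such a crawler is pulling direct `.exe` links out of each listing page. A minimal standard-library sketch follows; the class and function names are hypothetical, and it assumes the site exposes plain `<a href>` links. Many download sites instead serve files through redirect or script-generated URLs with no `.exe` in the path, in which case checking the downloaded bytes (for example, the `MZ` header) is more reliable than checking the link.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse

class ExeLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .exe."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        # Check only the path, so query strings like ?v=2 don't hide the extension.
        if urlparse(absolute).path.lower().endswith(".exe") and absolute not in self.links:
            self.links.append(absolute)

def extract_exe_links(html: str, base_url: str) -> List[str]:
    parser = ExeLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage: `extract_exe_links(page_html, "https://example.com/software/")` returns deduplicated absolute URLs in page order; `example.com` is a placeholder, not a suggested source.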
Another problem with using a download website is installers: most of the files there are installers, so I would need to install each program first. Is there a good solution to this? Is there an AutoIt script that can handle all types of installers?
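One possible alternative to scripting GUI installers with AutoIt is unpacking them without running them. The sketch below is a heuristic, not a universal solution. It guesses the installer family from byte markers (the OLE compound-file magic for MSI, and the assumed strings `Nullsoft` for NSIS and `Inno Setup` for Inno Setup) and builds a command for an existing extractor: an `msiexec /a` administrative install, `innoextract -d`, or `7z x`. The function names are hypothetical, the tools must be installed separately, and custom or wrapped installers will fall through to `None`.

```python
from pathlib import Path
from typing import List, Optional

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # MSI files are OLE compound documents

def guess_installer_type(data: bytes) -> Optional[str]:
    """Heuristic only: marker strings are assumed, not guaranteed, to be present."""
    if data.startswith(OLE_MAGIC):
        return "msi"
    if b"Nullsoft" in data:
        return "nsis"
    if b"Inno Setup" in data:
        return "inno"
    return None

def extraction_command(installer: Path, out_dir: Path) -> Optional[List[str]]:
    """Build an argv list for an external extractor, or None if the type is unknown."""
    kind = guess_installer_type(installer.read_bytes())  # reads whole file; fine for a sketch
    if kind == "msi":
        # Administrative install copies files out; TARGETDIR should be an absolute path.
        return ["msiexec", "/a", str(installer), "/qn", f"TARGETDIR={out_dir}"]
    if kind == "inno":
        return ["innoextract", "-d", str(out_dir), str(installer)]
    if kind == "nsis":
        return ["7z", "x", str(installer), f"-o{out_dir}", "-y"]
    return None
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, ideally inside a disposable VM, and the extracted files fed back through the PE and VirusTotal filtering step.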
(I looked at surveys on using ML for malware detection, such as [1], but it seems that none of the papers have released a useful benign dataset other than simple Windows system files, which anyone can gather. Those sets are also small: fewer than 10,000 files, sometimes as few as 1,000. I need a large benign dataset of more than 50,000 files because my malware dataset is very large.)
[1] https://www.sciencedirect.com/science/article/pii/S0167404818303808