Security News > 2020 > December > Sophos, ReversingLabs Release 20 Million Sample Dataset for Malware Research
Sophos and ReversingLabs on Monday announced SoReL-20M, a database of 20 million Windows Portable Executable files, including 10 million malware samples.
Aimed at driving security improvements across the industry, the database provides metadata, labels, and features for the files within, and enables interested parties to download the available malware samples for further research.
Containing a curated and labeled set of samples and relevant metadata, the publicly-accessible dataset is expected to help accelerate machine learning research for malware detection.
A production-scale dataset covering 20 million samples, including 10 million disarmed pieces of malware, the SoReL-20M dataset aims to address the problem.
For each sample, the dataset includes features that have been extracted based on the EMBER 2.0 dataset, labels, detection metadata, and complete binaries for the included malware samples.