By : Basel Khaled - Saber Mohamed
From 2019-20, we noticed a dramatic 1,160% increase in malicious PDF files – from 411,800 malicious files to 5,224,056. PDF files are an enticing phishing vector as they are cross-platform and allow attackers to engage with users, making their schemes more believable as opposed to a text-based email with just a plain link.
To lure users into clicking on embedded links and buttons in phishing PDF files, we have identified the top five schemes used by attackers in 2020 to carry out phishing attacks, which we have grouped as Fake Captcha, Coupon, Play Button, File Sharing and E-commerce.
Palo Alto Networks customers are protected against attacks from phishing documents through various services, such as Cortex XDR, AutoFocus and Next-Generation Firewalls with security subscriptions including WildFire, Threat Prevention, URL Filtering and DNS Security.
Data Collection
To analyze the trends that we observed in 2020, we leveraged the data collected from the Palo Alto Networks WildFire platform. We collected a subset of phishing PDF samples throughout 2020 on a weekly basis. We then employed various heuristic-based processing and manual analysis to identify top themes in the collected dataset. Once these were identified, we created Yara rules that matched the files in each bucket, and applied the Yara rules across all the malicious PDF files that we observed through WildFire.
Data Overview
In 2020, we observed more than 5 million malicious PDF files. Table 1 shows the increase in the percentage of malicious PDF files we observed in 2020 compared to 2019.
Malware |
Total PDF Files Seen |
Percentage of PDF Malware |
Percentage Increase |
|
2019 |
411,800 |
4,558,826,227 |
0.009% |
1,160% |
2020 |
5,224,056 |
6,707,266,410 |
0.08% |
Table 1. Distribution of malicious PDF samples in 2019 and 2020.
The pie chart in Figure 1 gives an overview of how each of the top trends and schemes were distributed. The largest number of malicious PDF files that we observed through WildFire belonged to the fake “CAPTCHA” category. In the following sections, we will go over each scheme in detail. We do not discuss the ones that fall into the “Other” category, as they include too much variation and do not demonstrate a common theme.