URLhaus: A database of malicious URLs used for malware distribution


157 points by fanf2 on 2024-05-15 | 38 comments

Automated Summary

URLhaus is a resource for protecting networks from malware distribution through malicious URLs. It offers a database of malicious URLs, available for download in various formats, and provides insights and recent additions. Users can contribute to the database by providing malware URLs, helping to protect others' networks.


amne on 2024-05-15

To me it looks like for ~90% of these (eyeballed) you just need to tell your browser somehow to stay away from any port other than 80 or 443. If some script/link-rel/src attribute points to a non-HTTP(S) port, just pop up a warning asking you to confirm you're OK with it. Is there a browser today with this feature?

Or go wild and neuter your browser: configure your firewall to allow only destination ports 80 and 443 for Mozilla/Chrome/Edge/etc.
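A minimal sketch of the check amne describes, assuming it runs as a URL-level filter (for example in a proxy or extension) rather than inside the browser itself. `needs_confirmation` is a hypothetical name, not an existing browser API; it flags anything that is not plain HTTP(S) on port 80 or 443.

```python
from urllib.parse import urlsplit

# Default port per scheme; only http/https are treated as "normal" browsing here (assumption)
DEFAULT_PORTS = {"http": 80, "https": 443}


def needs_confirmation(url: str) -> bool:
    """Return True if the URL uses a non-HTTP(S) scheme or a port other than 80/443."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return True
    # .port is None when the URL has no explicit port, so fall back to the scheme default
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return port not in (80, 443)
```

For example, `needs_confirmation("http://203.0.113.7:42311/i")` is true, while `needs_confirmation("https://example.com/")` is false. As the replies point out, this only helps while attackers have no reason to move back to 80/443.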

Also curious what are the performance implications for adding to your firewall about 2 million IPs or maybe a couple 100k if you're brave to do ranges.

yuliyp on 2024-05-15

This feels like the wrong conclusion in an adversarial game. Such a heuristic works only because almost nobody applies it. As soon as it's applied at any meaningful scale, you'll see scammers move back to ports 80/443. And you've got a broken browser to boot.

amne on 2024-05-15

I agree that is the likely outcome. It is interesting to me, though, that they're not concentrated on ports 80/443. There must be a reason why, and one answer, from a comment below, could be that the boxes hosting this already serve legitimate HTTP content on those ports. Having this kind of traffic show up in HTTP monitoring tools would make the hack obvious.

jdsnape on 2024-05-15

A lot of stuff is hosted on compromised sites or devices, which are often running some insecure admin interface on an odd port. In my experience it's very rare for someone to spin up a new web server on a box they've popped.

Glazui on 2024-05-15

I recently got a pop-up in Firefox doing exactly that. I was messing around with my homelab and entered a URL/port that Firefox deemed suspicious, and it warned me that “this port is usually not used for web browsing. Are you sure you want to visit that?”

amne on 2024-05-15

I'm hearing more and more nice things about Firefox lately. I've been on Chrome for the last 10 years or so, but I might have to switch soon if they break extensions the way they plan to.

bauruine on 2024-05-15

Where do you get those 2 million IPs? The plaintext URL list [0] only contains 90k entries, and after filtering it to IPs only and deduplicating, it's just 39k.

I've just added it, using an ipset, to my firewall, which handles around 160 Mbit/s right now, and the only increase in CPU I can see is a small blip from the ipset restore. And that's just an APU2 with an AMD GX-412TC (1 GHz quad-core from 2014), not a beefy box.

[0]: https://urlhaus.abuse.ch/downloads/text/
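A rough sketch of the filtering bauruine describes: keep only URLs whose host is an IP literal, deduplicate, and render the result as `ipset restore` input. It assumes the plaintext list has one URL per line with `#`-prefixed comment lines; the set name `urlhaus` and both function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def unique_ip_hosts(lines):
    """Collect distinct IP literals used as hosts, in first-seen order.

    Assumes one URL per line and '#'-prefixed comment lines.
    """
    seen = set()
    result = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            host = urlsplit(line).hostname  # strips port, userinfo and IPv6 brackets
            ip = ipaddress.ip_address(host) if host else None
        except ValueError:
            continue  # malformed URL, or a domain name rather than an IP literal
        if ip is not None and ip not in seen:
            seen.add(ip)
            result.append(ip)
    return result


def ipset_restore_script(ips, set_name="urlhaus"):
    """Render IPv4 addresses as `ipset restore` input (set name is hypothetical)."""
    v4 = [ip for ip in ips if ip.version == 4]
    lines = [f"create {set_name} hash:ip family inet maxelem {max(65536, len(v4))}"]
    lines += [f"add {set_name} {ip}" for ip in v4]
    return "\n".join(lines) + "\n"
```

The output could be piped into `ipset restore` and matched with something like `iptables -I FORWARD -m set --match-set urlhaus dst -j DROP`; a hash set keeps per-packet lookup cost roughly flat as the list grows, which fits the "small blip" observation above.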

sambazi on 2024-05-17

yea, it should not have any performance implications.

be aware that blocking stuff in your infrastructure will have hard-to-diagnose fallout, and you're generally better off if you police content on the client (ad-blocker)

toast0 on 2024-05-15

On performance, it depends a bit.

If you're running a stateful firewall, those generally don't evaluate firewall rules for established states, and most of your traffic is to established states, so no big deal.

If you're not running a stateful firewall, it's not totally unreasonable to skip the firewall for tcp packets with ACK and not SYN, so again no big deal on those. But http/3 is udp, so no shortcuts there.

AFAIK, most firewalls have lookup tables available; you'd want to use one rather than 2 million rules. On FreeBSD, ipfw and pf have lookup tables, and ipf calls them pools, but it looks like the same thing. A lookup table for IP addresses is pretty fast, even with 2M entries.
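A toy model of the two points toast0 makes: the stateless shortcut (TCP with ACK set and SYN clear skips rule evaluation) and a single table lookup instead of millions of individual rules. `Packet`, `build_table` and `should_drop` are hypothetical names; real firewalls implement tables in the kernel with their own data structures, and a Python hash set merely stands in for one.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    proto: str  # "tcp" or "udp"
    dst: str
    syn: bool = False
    ack: bool = False


def build_table(addresses):
    # A hash set gives average O(1) membership tests regardless of how many entries it holds
    return frozenset(ipaddress.ip_address(a) for a in addresses)


def should_drop(pkt: Packet, table) -> bool:
    # Shortcut: TCP with ACK set and SYN clear belongs to an existing connection,
    # so skip the lookup (the initial SYN to a listed address was already dropped)
    if pkt.proto == "tcp" and pkt.ack and not pkt.syn:
        return False
    # New TCP connections and all UDP (e.g. HTTP/3) still go through the table
    return ipaddress.ip_address(pkt.dst) in table
```

The UDP branch is exactly where the reply below notes that stateful firewalls can still avoid per-packet lookups.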

vladvasiliu on 2024-05-15

> But http/3 is udp, so no shortcuts there.

Usually stateful firewalls create a "state" for UDP connections, so "shortcuts" are still possible. See, for example, pf: https://www.openbsd.org/faq/pf/filter.html#udpstate
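A sketch of how a UDP "pseudo-state" can work, assuming the firewall remembers outbound flows by their address/port tuple and admits replies that match the reversed tuple until a timeout expires. The class name and the 60-second default are hypothetical illustrations, not pf's actual implementation or defaults.

```python
import time


class UdpStateTable:
    """Toy UDP pseudo-state table keyed by (src, sport, dst, dport)."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout  # illustrative value, not a pf default
        self.clock = clock
        self.states = {}  # flow tuple -> last time traffic was seen

    def outbound(self, src, sport, dst, dport):
        # A permitted outgoing datagram creates or refreshes the state
        self.states[(src, sport, dst, dport)] = self.clock()

    def inbound_allowed(self, src, sport, dst, dport):
        # A reply matches if the reversed tuple has a live state entry
        key = (dst, dport, src, sport)
        seen = self.states.get(key)
        if seen is None or self.clock() - seen > self.timeout:
            self.states.pop(key, None)  # expire stale state, if any
            return False
        self.states[key] = self.clock()
        return True
```

Once a flow has state, replies are matched against it directly, so the blocklist only needs consulting when a new flow starts.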

accrual on 2024-05-15

> Also curious what are the performance implications for adding to your firewall about 2 million IPs or maybe a couple 100k if you're brave to do ranges.

OpenBSD's pf firewall supports tables of IP addresses, which can be used to block or allow traffic. From their FAQ:

> A table is ideal for holding a large group of addresses as the lookup time on a table holding 50,000 addresses is only slightly more than for one holding 50 addresses.
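To see why lookup time barely grows with table size, here is a sketch that stores CIDR ranges as merged, sorted intervals and answers lookups with binary search. pf's tables are built on a radix tree, so binary search is a stand-in technique here, and `RangeTable` is a hypothetical name. With binary search, 50,000 entries need about 16 comparisons versus about 6 for 50.

```python
import bisect
import ipaddress


class RangeTable:
    """IPv4 CIDR ranges merged into sorted intervals, with O(log n) lookups."""

    def __init__(self, cidrs):
        intervals = sorted(
            (int(net.network_address), int(net.broadcast_address))
            for net in (ipaddress.IPv4Network(c, strict=False) for c in cidrs)
        )
        merged = []
        for lo, hi in intervals:
            # Merge overlapping or directly adjacent ranges
            if merged and lo <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        self.starts = [lo for lo, _ in merged]
        self.ends = [hi for _, hi in merged]

    def __contains__(self, addr):
        x = int(ipaddress.IPv4Address(addr))
        # Find the last interval starting at or before x, then check its end
        i = bisect.bisect_right(self.starts, x) - 1
        return i >= 0 and x <= self.ends[i]
```

Merging adjacent ranges (two /25s become one /24) also shrinks the table, which is the "couple 100k if you're brave to do ranges" idea from the original question.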


plufz on 2024-05-15

I have Little Snitch configured to only allow 443 and 80 for my browser, and to display an accept/decline dialog for requests to non-default ports.

sambazi on 2024-05-17

that's cute

plufz on 2024-05-19

I try to be!

wannacboatmovie on 2024-05-15

Google has been doing a pretty good job at breaking the web on their own via "features" in Chrome, let's not give them any other ideas.

ajsnigrutin on 2024-05-15

> Also curious what are the performance implications for adding to your firewall about 2 million IPs or maybe a couple 100k if you're brave to do ranges.

The problem here is that the "bad guys" move around, and sooner or later you ban most of e.g. DigitalOcean's IPs, Amazon's IPs, Azure's IPs, etc., and you break connectivity for other, "good" uses.
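One way to limit that collateral damage, sketched under the assumption that you re-download the feed regularly: let each entry expire unless the feed keeps reporting it, so cloud IPs that have been reassigned eventually fall off your list. `AgingBlocklist` and the 7-day TTL are hypothetical choices, not a URLhaus recommendation.

```python
from datetime import datetime, timedelta


class AgingBlocklist:
    """Blocklist whose entries expire unless the feed keeps reporting them."""

    def __init__(self, ttl=timedelta(days=7)):  # arbitrary illustrative TTL
        self.ttl = ttl
        self.last_seen = {}  # ip -> last time the feed listed it

    def refresh(self, feed_ips, now):
        for ip in feed_ips:
            self.last_seen[ip] = now
        # Drop anything the feed has not mentioned within the TTL
        cutoff = now - self.ttl
        self.last_seen = {ip: t for ip, t in self.last_seen.items() if t >= cutoff}

    def active(self):
        return set(self.last_seen)
```

A shorter TTL unblocks reassigned addresses sooner but also lets a returning attacker back in sooner; there is no setting that avoids both.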

schoen on 2024-05-15

Can anyone explain why malware distributors would prefer (or be somehow forced into using) non-default TCP ports?

rvnx on 2024-05-15

It's because they typically hack web servers, and then they spawn a fresh web server on a random port.

Anything below 1024 requires root access, which can be problematic, and 80/443 may already be used.

VoidWhisperer on 2024-05-15

My (naive) guess is that it makes them slightly less likely to be picked up by scanners like Shodan. I'm probably wrong, though.

9cb14c1ec0 on 2024-05-15

This is a big part of it. There are companies that scan the internet in search of malware, and if you run on a different port it makes the search space so much bigger.
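Back-of-the-envelope arithmetic for the search-space point, assuming a naive scanner that probes every IPv4 address (real scanners prioritise and sample, and the reply below says Shodan will find it anyway). Moving from "only 80 and 443" to "any of 65,535 TCP ports" multiplies the work by about 32,768.

```python
IPV4_ADDRESSES = 2 ** 32
ALL_TCP_PORTS = 65535  # ports 1-65535


def probes(ports_per_host: int) -> int:
    """Probes needed to try every IPv4 address on the given number of ports."""
    return IPV4_ADDRESSES * ports_per_host


web_only = probes(2)  # just 80 and 443
every_port = probes(ALL_TCP_PORTS)
ratio = every_port / web_only
```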

cess11 on 2024-05-15

The low ports typically require root privileges to bind to, and people are more likely to keep track of what runs on them.

It's not about Shodan, Shodan will find it. Probably Greynoise too.

sambazi on 2024-05-15

because the original owner of the box would mind if the application on the default port stopped working

amne on 2024-05-15

or wonder why weird routes like "/i" or "/bin.sh" are among the top accessed URLs in their monitoring dashboards

blueflow on 2024-05-15

Their own WAF (?) is messing with the data: https://urlhaus.abuse.ch/browse/tag/mirai/ returns "405 banned".

svacko on 2024-05-15

Surprised to see so much malware served from GitHub domains https://urlhaus.abuse.ch/browse.php?search=github

tomashertus on 2024-05-15

In my work, we analyze millions of files every day, and hosting and serving malware from "trusted" websites is a well-known and widely used detection evasion technique. It's so widespread that I did extensive research on the issue. There are well-known apps with $Ms in funding and revenue with a plethora of malware hosted on their servers. Some are even used as C2 servers for data exfiltration. I see an increasing number of companies proactively blocking all traffic to those notorious sites to improve overall network security.

The outcome of my research was the following:

- Disjointed content moderation and cybersecurity departments: Not many companies have content moderation teams equipped to perform malware analysis or make cybersecurity-related decisions (the only company that does an exceptional job in this regard is Meta).

- If hosting malware doesn't impact the company's revenue and reputation, the content moderation team has other priorities.

- Section 230: Companies will refer to Section 230 when asked about hosting malicious content or scanning the content for potential malware.

Sephr on 2024-05-15

I see a few false positives. It appears that unsigned software is being labeled as malware, and on some pages as grayware.

Unsigned software is not malware or 'grayware'. It's not inherently malicious.

I'm also seeing coin miners being labeled as malware. They often are, but I'm sure there are misclassifications along those lines in this dataset as well.

billwashere on 2024-05-15
whirlwin on 2024-05-15

How does it keep the records up to date? IPs nowadays are highly elastic and can be reassigned to different tenants by your cloud provider.

sambazi on 2024-05-15

it was never a good idea, but works somewhat

tkzed49 on 2024-05-15

Who are abuse.ch? Are they well-known? I assume the hosts file could be useful to add to Pi-hole?

supriyo-biswas on 2024-05-15

Sometimes popular domains like drive.google.com get added, and at other times some listed domains are popular enough to reach a Tranco rank of 20,000, so I generally advise against blocking using sources like URLhaus.
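If you do block from URLhaus, one mitigation is to drop entries whose host is, or sits under, a domain from a popularity list such as Tranco's top N. This is a sketch with a hypothetical function name; where to set the cutoff is your call. The trade-off is that it also lets through real malware hosted on popular domains like GitHub or Google Drive, which is exactly the evasion technique discussed above.

```python
from urllib.parse import urlsplit


def filter_popular(urls, popular_domains):
    """Drop blocklist URLs whose host is, or is a subdomain of, a popular domain."""
    popular = {d.lower().rstrip(".") for d in popular_domains}
    kept = []
    for url in urls:
        host = (urlsplit(url).hostname or "").rstrip(".")
        labels = host.split(".")
        # The host plus every parent suffix, e.g. drive.google.com -> google.com -> com
        suffixes = {".".join(labels[i:]) for i in range(len(labels))}
        if suffixes.isdisjoint(popular):
            kept.append(url)
    return kept
```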

sambazi on 2024-05-15

i generally advise against embedding any blocklists into infrastructure; content policy should be done on clients within reach of the user

cess11 on 2024-05-15

Yes, they are fairly well-known. They have been partnering with Spamhaus: https://abuse.ch/blog/abuse-ch-appoints-spamhaus-as-a-licens...

bauruine on 2024-05-15

They seem to be partners of Spamhaus. From the recent feed [0], it looks like entries are often just IP addresses, so you would need to add them to your firewall.

[0]: https://urlhaus-api.abuse.ch/v1/urls/recent/

underlines on 2024-05-15

abuse.ch is a non-profit, initially run as a private project, that has been working on cybersecurity issues for 15 years, mainly focused on botnets and malware. Since 2021, abuse.ch has been part of the Institute for Cybersecurity and Engineering (ICE) at Bern University of Applied Sciences. To date, the project has been funded entirely by private-sector donations.

They have two main goals:

1. Research: research into malware and botnets

2. Open-source threat intelligence: indicators of compromise (IOCs) for the public to prevent threats

EasyMark on 2024-05-15
sharpshadow on 2024-05-15

Also a great set for collecting malware samples.