I Built an LD_PRELOAD Worm

https://lcamtuf.substack.com/p/that-time-i-built-an-ld_preload-worm

110 points by zdw on 2024-04-29 | 58 comments

Automated Summary

The article is about the author's rediscovery of an old proof-of-concept, a worm named 'unicorns.so' that they created in the late 1990s. The worm uses LD_PRELOAD, a debugging mechanism on Unix-like systems, to interpose library calls and spread to other systems. It was created to demonstrate the fragility of distributed trust and the risks of using su and sudo instead of logging in as root, and it was designed to hide its own presence, inject commands, and detect and intercept the execution of su or sudo. The author chose not to discuss the implementation publicly at the time because it was too risky; now that multi-tenant Unix systems are less common, the issues are less pressing.
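To make the summary concrete, here is a hedged sketch of the general LD_PRELOAD interposition technique it describes. This is not the author's unicorns.so (the article deliberately withholds the implementation); the choice of execve() as the hooked call, the su/sudo check, and the build commands are assumptions for illustration only.

    /* Minimal LD_PRELOAD interposer sketch -- NOT the author's unicorns.so.
     * Assumed build: gcc -shared -fPIC -o hook.so hook.c -ldl
     * Assumed use:   LD_PRELOAD=$PWD/hook.so bash
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    typedef int (*execve_fn)(const char *, char *const[], char *const[]);

    int execve(const char *path, char *const argv[], char *const envp[])
    {
        /* Look up the real execve that this hook shadows. */
        execve_fn real_execve = (execve_fn)dlsym(RTLD_NEXT, "execve");

        const char *base = strrchr(path, '/');
        base = base ? base + 1 : path;

        /* A worm would act here; the sketch only reports the interception. */
        if (strcmp(base, "su") == 0 || strcmp(base, "sudo") == 0)
            fprintf(stderr, "[hook] intercepted %s\n", path);

        return real_execve(path, argv, envp);
    }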

Comments

TheAdamist on 2024-04-30

LD_AUDIT is even more interesting, and lesser known. It will load even before preload, and has handy hooks for everything that the loader does. https://man7.org/linux/man-pages/man7/rtld-audit.7.html

And it had some nasty, easy-to-exploit privilege escalation problems a while ago: https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2010-3847
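For a feel of what those hooks look like, here is a hedged minimal rtld-audit library following the man page above: la_version is the only mandatory hook, and la_objopen fires for the main program and every object the loader maps. The build and invocation commands are assumptions.

    /* Minimal LD_AUDIT sketch, per rtld-audit(7).
     * Assumed build: gcc -shared -fPIC -o audit.so audit.c
     * Assumed use:   LD_AUDIT=$PWD/audit.so /bin/ls
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <link.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Mandatory: tell the loader which auditing ABI version we speak. */
    unsigned int la_version(unsigned int version)
    {
        (void)version;
        return LAV_CURRENT;
    }

    /* Called for the main program and every shared object the loader maps. */
    unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
    {
        (void)lmid;
        (void)cookie;
        fprintf(stderr, "[audit] loaded %s\n",
                map->l_name[0] ? map->l_name : "(main program)");
        /* Ask to be notified about symbol bindings to/from this object. */
        return LA_FLG_BINDTO | LA_FLG_BINDFROM;
    }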

aidenn0 on 2024-04-30

There's something poetic about information on an environment-variable vulnerability being served over a CGI script...

kibwen on 2024-04-30

The xz debacle made me keenly aware of what a weak point the linker is, and now I kinda want a system that doesn't have a dynamic linker at all.

singron on 2024-04-30

The linker is a convenient way to get your code to run in the right place, but the linker runs in userspace, and your code can do anything the linker does. E.g. you can scan function tables, remap pages r/w, and interpose functions.

The OpenBSD folks have some good ideas against these issues, although they are targeted at things like ROP exploits. E.g. pledge, which disables certain syscalls, and immutable memory maps (https://marc.info/?l=openbsd-tech&m=166203784715942).
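As a concrete reference for the pledge idea, here is a hedged sketch (OpenBSD-specific, so it won't build on Linux): after the call, any syscall outside the promised groups kills the process.

    /* Hedged pledge(2) sketch (OpenBSD only). */
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Promise only stdio-style and read-only filesystem syscalls. */
        if (pledge("stdio rpath", NULL) == -1)
            err(1, "pledge");

        puts("still fine: stdio and read-only file access");
        /* An execve(), socket(), or similar call from here on would be fatal. */
        return 0;
    }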

A really scary version of the xz backdoor is if it had an entire implementation of openssh in its own ctor function. Then it wouldn't even need to interpose anything. It's way harder to get rid of ctors.
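To make the ctor point concrete, a hedged sketch of a shared-library constructor: it runs as soon as the object is mapped, before the host program gets control, without interposing any symbol at all. Build and usage commands are assumptions.

    /* Constructor sketch: code that runs at load time, no interposition needed.
     * Assumed build: gcc -shared -fPIC -o ctor.so ctor.c
     * Assumed use:   LD_PRELOAD=$PWD/ctor.so /bin/true
     */
    #include <stdio.h>

    __attribute__((constructor))
    static void payload(void)
    {
        /* A backdoor could do arbitrary work here; this only announces itself. */
        fprintf(stderr, "[ctor] running before the host program gets control\n");
    }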

marcosdumay on 2024-04-30

> E.g. you can scan function tables, remap pages r/w, and interpose functions.

Your code doesn't start doing those things without notice due to external interference.

But a linker will add them, on your behalf, without raising any flags.

cryptonector on 2024-04-30

You might want static linking, but static link semantics are still stuck in 1978. We really need to teach the static linker a bunch of tricks that the dynamic linker knows.

Also, as u/singron notes, a debugger can do to a statically linked executable everything that TFA's unicorns.so can do. Static linking only makes that a wee bit harder, and recall that once a unicorns executable is published, the fact that static linking made it harder up to that time becomes completely irrelevant.

Dwedit on 2024-04-30

Meanwhile in Windows land, "static linking" is very much a thing, but it means you dynamically link only to the basic Windows API libraries (Kernel32, User32, Gdi32, Ole32, Advapi32, etc...).

You can't have a Windows program that doesn't have some way to import API functions. System call numbers are shuffled with every version change of Windows.

marcosdumay on 2024-04-30

> You might want static linking

Personally, I really want to go the other way around, with microkernel-like userspace services. That way everybody can drop the pretense that the software author did any verification on his dependencies, and we can add real access controls to the ABI.

But yeah, the current situation is close to the worst possible one. It would improve if we changed things in any direction at all.

cryptonector on 2024-04-30

> Personally, I really want to go the other way around, with microkernel-like userspace services.

Yes! This requires very small protocols. Indeed, systemd has such a tiny protocol, and sshd did not need to link to libsystemd to use that protocol, but linking with libsystemd was what Debian had chosen to do.
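For reference, a hedged sketch of that tiny readiness protocol as documented for sd_notify(3): send a "READY=1" datagram to the AF_UNIX socket named by $NOTIFY_SOCKET, no libsystemd required. Error handling is trimmed, and the abstract-socket handling follows the usual '@' convention.

    /* Hedged sd_notify-style readiness ping without libsystemd. */
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/un.h>
    #include <unistd.h>

    static int notify_ready(void)
    {
        const char *path = getenv("NOTIFY_SOCKET");
        if (!path || !*path)
            return 0;                              /* not running under systemd */

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        size_t len = strlen(path);
        if (len >= sizeof(addr.sun_path))
            return -1;
        memcpy(addr.sun_path, path, len);
        if (addr.sun_path[0] == '@')               /* abstract socket namespace */
            addr.sun_path[0] = '\0';

        int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
            return -1;

        const char *msg = "READY=1";
        ssize_t n = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&addr,
                           (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len));
        close(fd);
        return n < 0 ? -1 : 0;
    }

A daemon would call notify_ready() once it is actually ready to serve; the whole protocol is a handful of lines, which is the point.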

marcosdumay on 2024-04-30

Except that when only systemd does it, everybody still keeps the pretense and nobody works on security.

And the systemd protocol is made in a way completely unfit for using everywhere. (Not by systemd's fault.)

jiveturkey on 2024-04-30

Could you expand on how xz took advantage of the dynamic loader (I think that's what you most likely meant)?

If you built your binary statically against the compromised version of xz, you'd still suffer wouldn't you? Or did it depend on dynamic linking in some way? I'm not really that familiar with the mechanism it used, so sorry if that's a basic question.

So, e.g., an entire distro built statically would still have been fully compromised. And without dynamic libs, you wouldn't be able to just update the single lib; you'd have to update everything. So to the extent your own software is statically linked, it may be fine (because you never picked up the new update), but you are still vulnerable if the statically linked system had the bad lib linked in.

jcranmer on 2024-04-30

The xz hack used an ifunc resolver to load its payload, where the ifunc resolver is a function that the dynamic loader calls to figure out what the address of a symbol should be (i.e., to make something dependent on the current hardware capabilities).
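A hedged sketch of what an ifunc looks like with GCC's attribute syntax; the resolver runs inside the dynamic loader, before main(), which is the early, quiet execution point the backdoor abused. The function names here are made up for illustration.

    /* GNU ifunc sketch: the loader calls resolve_hello() during relocation
     * and binds the "hello" symbol to whatever function pointer it returns. */
    #include <stdio.h>

    static void hello_generic(void) { puts("generic code path"); }
    static void hello_avx2(void)    { puts("AVX2 code path"); }

    /* Runs very early, inside the dynamic loader. */
    static void (*resolve_hello(void))(void)
    {
        __builtin_cpu_init();
        return __builtin_cpu_supports("avx2") ? hello_avx2 : hello_generic;
    }

    void hello(void) __attribute__((ifunc("resolve_hello")));

    int main(void)
    {
        hello();
        return 0;
    }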

thenonameguy on 2024-04-30

NixOS/GNU Guix is uniquely positioned in this area, as it tracks all dependencies explicitly, with exact versions. If there is a paradigm where this can be achieved on the OS level, they are the closest to it today.

See this related talk from NixOS 2022: https://www.youtube.com/watch?v=HZKFe4mCkr4

theamk on 2024-04-30

The general idea works fine without static linker though... Just replace LD_PRELOAD manipulation with PATH manipulation, and create wrapper binaries for common apps.

Slightly more detectable, but not by much.
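A hedged sketch of such a wrapper binary: drop it into a directory that sits early in $PATH under the name of a common command, do the extra work, then hand off to the real tool. The "ls" name and the /bin/ls target are assumptions for illustration.

    /* PATH-wrapper sketch: pretend to be "ls", then exec the real one. */
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        (void)argc;

        /* A worm would do its work here; the sketch only leaves a trace. */
        fprintf(stderr, "[wrapper] ran before the real command\n");

        execv("/bin/ls", argv);       /* hand off; argv[0] is preserved */
        perror("execv");              /* reached only if the exec failed */
        return 127;
    }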

nottorp on 2024-04-30

Two different "philosophical" answers to this:

1. Patching known holes after the fact is just an arms race that will never end. Better to build secure foundations. But that leads to 2:

2. We can probably build a software environment that is a lot more secure than what we have now, but no one could afford to pay for building it, and possibly no one could afford the operating costs to use it.

klysm on 2024-04-30

I’ve been convinced static linking is the way to go for a number of years. The benefits of dynamic linking don’t make sense anymore imo

tssge on 2024-04-30

On single-user systems, sure. However, multi-user systems can benefit greatly from dynamic linking; some highly multi-user systems become impossible to build at all with only static linking (the RAM demand becomes too great).

Dynamic linking also leads to better CPU cache utilization and thus higher performance, provided that multiple applications are actually using the dynamically linked library. While RAM is somewhat abundant these days, the L1/L2/L3 cache on your CPU most definitely isn't.

Application startup is also faster when part of the code is already in memory.

EDIT: Let me elaborate a bit on what I mean by "benefiting greatly" with a real world example.

I have a system with around 300 tenants with a minimum of 500 PHP processes online at any given time. A single PHP process is able to load 70MB worth of dynamically linked libraries (PHP extensions) if so chosen by the tenant. I'll just skip libc and other such libraries to give static linking a chance.

With dynamic linking, the maximum memory requirement is 70MB for said extensions.

With static linking, the maximum memory requirement is 35GB for said extensions.

Dynamic linking thus yields a 99.8% saving on memory usage. Bear in mind this is at minimum load: the static linking requirement scales linearly with load.

Not to even get started on the performance benefits this brings...

cryptonector on 2024-04-30

When the changes that removed static link archives for the OS/Net core and replaced all the statically linked /bin executables with dynamically linked versions got pushed to Solaris 10, boot times improved drastically, owing to less I/O being needed because of the sharing.

gpm on 2024-04-30

Are these extensions standard across your users, or are they user supplied?

If the former is the case, then in a world where dynamic linking wasn't "the way things were done", you'd just statically link them all into the PHP interpreter's binary, and then only run the code to actually start them for the users that want them. Linux shares memory between different instances of the same executable the same way it does between different users of a shared library, so there should be no difference in cost. Today, I doubt PHP supports this, particularly the "only run the code to actually start them" part, but in principle it's a lot simpler to do than dynamic linking.

If the latter is the case, how are you getting any savings today?

tssge on 2024-04-30

>Are these extensions standard across your users, or are they user supplied?

These extensions are standard PHP extensions, so they are the same for each process and user. Extensions like MySQL driver, Postgres driver, ImageMagick bindings and such.

You are correct that no savings would be possible if these were different across each user.

a1369209993 on 2024-04-30

> These extensions are standard PHP extensions, so they are the same for each process and user.

Then with static linking, the maximum memory requirement is 70MB for said extensions (statically linked into the PHP executable).

tssge on 2024-04-30

Are you basically saying there is no difference in memory usage when comparing static linking and dynamic linking? Any links to more info on the matter?

I tested this on the machine with a static executable I compiled, and it looks to me like memory is not shared when statically linked. I thought the point of dynamic linking was to save memory by sharing it (and disk space, though these days we have more than enough of that for executable code).

Though it could be just that I misunderstand the output of smaps_rollup.

rcxdude on 2024-04-30

Multiple executions of the same executable will share the code pages, so assuming all the users are executing the same binary on disk, the memory should be shared. Shared libraries only give you additional deduplication in the case that multiple binaries are loading the same shared libraries.

rfoo on 2024-04-30

> Are you basically saying there is no difference in memory usage when comparing static linking and dynamic linking?

In your case, yes, there is (almost [1]) no difference at all.

If you have 500 different PHP binaries (say, different versions of PHP) with the same set of extension `.so`s shared between them (is this even possible? but let's assume it is), then your example works.

[1] Almost, because if you also linked libc statically (not recommended, as glibc is specifically designed to be hostile to static linking), you can't share libc with other processes (e.g. nginx, systemd, etc.), so you need ~2MB more memory. This is still paid only once, though, not multiplied by how many PHP binaries you run.

philsnow on 2024-04-30

Dynamic linking gets you two benefits: reduced size on disk (a bigger deal a few decades ago), and the ability to update/patch libraries without recompiling the entire system (when openssl.so gets upgraded on disk, you just restart all the processes that link it, because running processes still have the old library in memory, and you're done).

This is going to sound a little weird, but if the filesystem did chunked content-aware deduplication, and if the linker knew about this and wrote out compilation units on chunk boundaries... then you could statically compile everything on the system and yet, all different binaries that have libc.a, libm.a, openssl.a etc statically linked would share the same storage and thus the same page cache.

In the example of 500 different PHP versions, if the changes were very minor (as a contrived example, a change in version.c that puts the user's username in the version string), version.c would compile to a different version.o for each user, but main.c/main.o could be identical. The magic dedup-aware linker mentioned above would then write seven and a half disk chunks' worth of compiled code from main.o (identical to the chunks written previously, so they don't actually cause writes), plus one mostly-empty disk chunk for the code from version.o (different from any other chunk, so you'd have 500 disk chunks, each with a different version of version.o).

If the changes were more major (say, the tip of main on [0] and the 499 previous versions), there would be less sharing, but still probably quite a lot. There's also wasted space like the ends of all the version.o chunks; if a chunk is 64kB then there's probably a fair amount of wasted space and maybe the whole experiment doesn't net you anything.

Of course, under this kind of system you'd have to recompile the world when you update a core library like libc.

edit: oh man there's tons of fun stuff happening in this area already and has been for years! nix is chunking NARs as of [0] (though I haven't found anything yet about linkers being made to chunk them better for dedup ahead of time, or about re-linkers that do this with already-linked binaries, but maybe their implementation of FastCDC can do it!?). this is so cool!

[0] https://discourse.nixos.org/t/introducing-attic-a-self-hosta...

aidenn0 on 2024-04-30

Also don't forget that statically linked executables pull in less code than dynamically linked executables, so DISTINCT_EXECUTABLES_RUNNING*sizeof(libc) is very much a worst-case number rather than a typical number.

josephg on 2024-04-30

I go back and forth on this.

On macOS, I do like that the UI libraries for all applications are shipped with the OS. When the OS updates, the UI for all installed programs updates to include the new look and feel. Apple can also update them to change the way that applications render to the screen.

These libraries are quite large, and they probably shouldn't be shipped with every application. If they were, it would make it a lot harder to compile an app to work with multiple versions of macOS. Apps would probably break a lot more when the OS updates.

For shell scripts and such, I agree and I'd be happier if everything was statically linked.

My biggest criticism of dynamic linking is how ridiculous the situation is on Debian and similar OSes. Imagine I make some software with Rust (or JavaScript or Python or something). My software might pull in 20-100 dependencies from cargo, npm, pip, etc. If I want my software to appear in apt, the Debian maintainers insist on adding a mirror of all of those dependencies to apt. So apt sort of maintains a crappy, out-of-date mirror of a bunch of other language-specific package managers. And as far as I know, they do this work by hand - adding misery to misery.

Nobody wants that. Least of all the software developers - who will get bug reports for their own software compiled with mysteriously out-of-date dependencies that they aren't testing themselves. It's horrible.

AnonymousPlanet on 2024-04-30

I don't understand what you think pip, npm etc. have to do with dynamic linking. Those libraries get loaded by the respective interpreter; the linker doesn't come into play at all. And security-wise, npm, pip and the like are nightmares waiting to be exploited xz-style. No amount of static linking will save you from that.

josephg on 2024-04-30

It maps.

If you throw a nodejs or python package in apt, you can either include all of its js / python dependencies in the package itself, or make separate apt packages for each of the project's dependencies and have the main apt package depend on those. The difference is pretty similar to static vs. dynamic linking. Including the dependencies gives the package author explicit control over the versions of all of them, but depending on external apt packages means security updates to shared libraries can be installed system-wide (think log4j or xz). And as I understand it, that is what Debian prefers.

In JavaScript projects it's even become common practice to use a bundler to "compile" server-side JavaScript (with all your dependencies) into a single large unreadable .js file. Doing that can reduce memory usage and dramatically improve startup time, and bundling usually includes dead code elimination. It's more or less identical to static linking - except the resulting artifact is another JavaScript file.

AnonymousPlanet on 2024-04-30

Ah, it was an analogy. That makes sense, thanks!

cozzyd on 2024-04-30

Well, how do you handle multi-language dependencies? You can't tell pip about a cargo project; instead you mirror it on pip anyway...

intelVISA on 2024-04-30

Awoken

rurban on 2024-04-30

You are confusing the loader with the linker. The loader is the problem, and always was. Just think of ldd exploits.

theamk on 2024-04-30

This is a nice POC, but would be detected pretty fast in "real world".

That "printf" hiding hook seems to be specifically tailored to one command (bash's "set"? "env"?). Any other command to look at environment which does not use printf would show LD_PRELOAD just fine. I, for example, routinely use "strings" to examine environ of desktop processes - try intercepting that. There is also sorted(os.environ.keys()) from Python, and env dump at the start of CI jobs...

Not to mention that even if the variable itself is perfectly hidden, the preloading itself is visible. LD_PRELOAD-ed libraries show up in "ldd" output, in "strace" output, and in "gdb" output as well.

The "propagation" part is just appending to ~/.bash_profile, which is trivially discoverable as well. You can "cat" the file and it'd be right there, or if you have some sort of VCS for you dotfiles, they will flag file as having been changed.

With enough imagination, you can figure out a way to bypass most of those detection methods... but "most" is not going to be enough. In the hypothetical example of an xz-like attack, it only takes one slip-up for such a worm to be quickly detected, brought to light, and countered.

(Targeted attacks are much scarier, but they aren't likely to use such a crude method anyway... even PATH mangling is more subtle than that!)

planede on 2024-04-30

The "printf" hiding is probably too crude and tailored to bash. But I don't see why the so file couldn't just modify the environment once it's loaded.

wolfendin on 2024-04-30

The real world of 2024, or the real world when this software was written?

theamk on 2024-04-30

Good point, actually: the article says "late 1990s"... but most things would still work back then. I believe ldd, strace, and procfs all existed back then. Python would not, but one could dump the environment using perl or some unusual shell.

Keeping dotfiles in version control was much rarer, so that detection method would not work back then.

But the shell variety was much greater back then, so the worm could end up on a server with tcsh or ksh or rsh (the "restricted shell"), which would render it inert. And there was a chance of ending up on a Solaris box, which would just throw errors about the wrong architecture and cause immediate discovery.

Also, there were many more shared systems, so granting "sudo" access to everyone was much less frequent back then. And a friendly sysadmin might examine someone's config files if they asked for help, which would lead to discovery as well.

wolfendin on 2024-05-01

I think there's also been a large sea change in the thinking that goes into finding the reason for anomalous behavior. Nowadays remote compromise is one of the first things on my mind when troubleshooting, but back in the 90s it was much lower on the list. I think the tooling would have been there to find it easily, but getting into the mindset where it needs to be found would have been harder.

cookiengineer on 2024-04-30

> but would be detected pretty fast in "real world".

I call bullshit on that one. When did you last cat your bashrc? Every morning? Probably not.

Nobody ever knows what kind of environment variables are modified inside /proc fs. Nobody.

The ones that do, uninstall glibc.

theamk on 2024-04-30

Maybe don't extrapolate from your experience so much? Not everyone is in the Web/AI space; there are still systems programmers out there.

My bashrc is under version control, so any changes would immediately appear if I ran the "status" command... for me, this was last week. (For the record, I have no desire to uninstall glibc; I like my DNS working, thank you very much.)

The environment variables are pretty simple, and you should really take the time to learn them. When I help others troubleshoot their systems, one of my first steps is to look at their environment for anything suspicious; this solves at least 30% of all the problems. And sure, sometimes the variables are somewhat mysterious ("UBUNTU_MENUPROXY=1"?), but you can often guess from context.

But the most important thing is, a self-propagating worm has no idea which machine it is getting into - is it going to be an old server that people want to touch as little as possible? Or the dev machine of a C++ programmer (or even worse, a build engineer) who knows every .so file and straces things daily? So while there might be specific orgs where such a worm could live for a while, it will be found once it reaches a wider audience.

timtzm on 2024-04-30

I still want dynamic linking, but only a few trusted library files would be allowed to make system calls. Like libc. Sorry but golang would have to change to use libc.

This breaks the ABI, but it breaks it for naughty programs the most.

saagarjha on 2024-04-30

OpenBSD does this; it’s not very useful unless you have strong CFI to prevent people from doing a return-oriented attack into those libraries that are in your address space. And also note that there is a lot you can do to mess with stuff without system calls :)

jdsalaro on 2024-04-30

> CFI

They're referring to Control Flow Integrity [1]

[1] https://en.m.wikipedia.org/wiki/Control-flow_integrity

jiveturkey on 2024-04-30

I'm not sure how it's relevant exactly to TFA. The mechanism of propagation is an existing feature of libdl that uses an environment variable. With this worm, the loader still runs exactly as before, from libc and libdl.

As to restricting syscalls from certain calling libraries, macOS has this via entitlements, and I believe OpenBSD and/or NetBSD has this in some form as well.

saagarjha on 2024-04-30

Entitlements cannot protect against things in your own process. They are always used to gate clients either across a kernel-user or XPC boundary.

jiveturkey on 2024-04-30

Isn't that exactly what the parent was asking for? Limiting syscalls.

EDIT: oh. but not limited to the caller from a specific system library.

rfoo on 2024-04-30

> but only a few trusted library files would be allowed to make system calls. Like libc

This is impossible on Linux (short of an ABI-breaking libc.so.7), as:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep syscall
    000000000011b520 T syscall
https://elixir.bootlin.com/glibc/glibc-2.39/source/sysdeps/u...
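In other words, anything that can call into libc can already make arbitrary system calls through that exported symbol; a hedged illustration:

    /* Any caller can funnel an arbitrary syscall through libc's syscall(). */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        long pid = syscall(SYS_getpid);   /* raw system call, entered via libc */
        printf("getpid via libc's syscall(): %ld\n", pid);
        return 0;
    }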

buildbot on 2024-04-30

Isn’t this still actually wormable to some degree? Just among systems you have access to. But imagine if any of those are shared at all…

fullspectrumdev on 2024-04-30

Yes. https://thc.org/ssh-it/ is a more recent implementation of the same concept, using $PATH modification instead of LD_PRELOAD.

It also has some other optional tricks, such as searching for unprotected SSH keys and the like.

This kind of thing can be deeply interesting to run in environments where developers share things like jump boxes, to determine the “blast radius” of a compromised account over time as part of a security exercise.

theamk on 2024-04-30

This can only propagate to other users if you are a superuser on the host and used "sudo".

So yes, they will definitely propagate for a bit, but they won't go very far unless it's a very unusual environment.

buildbot on 2024-04-30

Shared lab machines come to mind! Often many people have sudo and sometimes there is just one shared account.

theamk on 2024-04-30

Do people actually have lab machines with shared accounts and everyone having sudo access? This sounds horrifying... students do want to experiment, and it's enough for a single person to install gcc-5 as the system compiler to ruin everyone's day.

Surely the sysadmins are doing something to prevent this? Like NFS root or at least "no sudo access" by default?

buildbot on 2024-04-30

Good point, in school the undergrad lab machines were very locked down and secure. In grad school, every prof had their own few machines in the lab that were shared, with sudo access limited to me; I'm sure some shared their machines more openly.

At work though, our lab machines that are not cloud-based are shared across the team, and everyone has sudo. They get broken all the time whenever someone decides they need a different CUDA version. I've brought it up as an issue but nobody really cares, haha.

fullspectrumdev on 2024-04-30

With some additional tricks, lab machines at universities would be fertile soil for something like this.

cryptonector on 2024-04-30

TFA's unicorns.so would still work today.

lukaszwojtow on 2024-04-30

Old lcamtuf is back!

loa_in_ on 2024-05-01

Not that they ever went anywhere.

nottorp on 2024-04-30

> My motivation for this code was to demonstrate the fragility of...

Yeah right. I wrote a simple .com-infecting virus back when DOS was all the rage, just to see how it was done. Didn't need any fancy motivational stuff.

Never extended it to exes and never distributed it.

mfreeman451 on 2024-04-30

[dead]