Loading Arbitrary Executables as Kernel Modules
Alexei Starovoitov posted some patches to allow the kernel to load regular ELF binaries (aka plain executables) as kernel modules. These modules would be able to run user-mode helper routines instead of being absolutely confined to kernel space.
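To make the distinction between a "regular ELF binary" and a conventional module concrete, here is a minimal sketch of how the two can be told apart from the ELF header alone. Conventional kernel modules (.ko files) are relocatable objects (ET_REL), whereas a classic statically linked executable is ET_EXEC. That the patch dispatches on this field is an assumption here, suggested by the "ET_EXEC module" wording later in the discussion. The function names and the `fake_header` helper are hypothetical and exist only for illustration.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the e_type name given at least the first 18 bytes of an ELF image."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    # e_ident[EI_DATA] (byte 5) gives the byte order: 1 = little, 2 = big.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type is the 16-bit field that immediately follows the 16-byte e_ident.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type})")


def fake_header(e_type: int) -> bytes:
    """Hypothetical helper: a 64-bit little-endian e_ident followed by e_type."""
    e_ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # class, data, version, padding
    return e_ident + struct.pack("<H", e_type)


print(elf_type(fake_header(1)))  # a relocatable object, like a .ko module
print(elf_type(fake_header(2)))  # a plain executable
```

On a real system the same function could be pointed at the first bytes of any file, for example `elf_type(open(path, "rb").read(18))`.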
Alexei listed a variety of benefits for this. For one thing, as a user process, an ELF-based module could crash without bringing down the rest of the kernel. And although the ELF modules would run with root privileges, he said that a security breach would not lead directly into accessing the kernel's inner workings, but at least initially would be confined to userspace. The ELF module also could be terminated by the out-of-memory (OOM) killer if necessary, or ended directly by a human administrator. It additionally would be feasible to subject ELF-based modules to regular userspace debugging and profiling, using the vast array of tools available for that.
Initially there were various technical questions and criticisms, but no one immediately spoke out against the patch. Linus Torvalds said he liked the feature, but he wanted one change: to make the loading of this type of module visible in the system logs. He said:
When we load a regular module, at least it shows in lsmod afterwards, although I have a few times wanted to really see module load as an event in the logs too. When we load a module that just executes a user program, and there is no sign of it in the module list, I think we *really* need to make that event show to the admin some way.
And he said specifically, "I do *not* want this to be a magical way to hide things."
Andy Lutomirski raised a pertinent question: why not just retool the modprobe program to handle ELF binaries as desired, rather than doing anything with kernel code at all? In other words, why couldn't this feature be implemented entirely outside the kernel?
But Linus replied:
The less we have to mess with user-mode tooling, the better.
We've been *so* much better off moving most of the module loading logic to the kernel, we should not go back in the old broken direction.
I do *not* want the kmod project that is then taken over by systemd, and breaks it the same way they broke firmware loading.
Keep modprobe doing one thing, and one thing only: track dependencies and mindlessly just load the modules. Do *not* ask for it to do anything else.
Right now kmod is a nice simple project. Lots of testsuite stuff, and a very clear goal. Let's keep kmod doing one thing, and not even have to care about internal kernel decisions like "oh, this module might not be a module, but an executable".
If anything, I think we want to keep our options open, in the case we need or want to ever consider short-circuiting things and allowing direct loading of the simple cases and bypassing modprobe entirely.
At one point, Kees Cook did offer some more serious criticisms of the patch's basic goal. Primarily, he noticed that Alexei's patch could be used to—and was intentionally designed to—execute arbitrary code in userspace automatically. This was different from the kernel's normal approach to modules, in which they could be loaded but not executed automatically.
Kees said, "This just extends all the problems we've had with defining security boundaries with modules out to umh [user-mode helper code] too. I would need some major convincing that this can be made safe."
He pointed out that a certain class of kernel bugs, apparently prevalent in the recent past, could redirect module loading outside of a container and into the main kernel itself. And since containers could trigger loading an arbitrary module, this meant that a hostile user potentially could load an ELF module, redirect it back to the main kernel, and execute its attacking code immediately with full privileges.
Kees refused to let the patch go into the kernel as written. He said:
At the very least, you need to solve the execution environment problems here: the ELF should run with no greater privileges than what loaded the module, and very importantly, must not be allowed to bypass these checks through autoloading. What triggered the autoload must be the environment, not the "modprobe", since that's running with full privileges.
On the flip side, however, Kees acknowledged that Alexei's patch was an "interesting idea. I think it can work, it just needs much much more careful security boundaries and to solve our autoloading exposures too."
However, Alexei characterized Kees' response as "security paranoia without single concrete example of a security issue."
And Andy also disagreed with Kees' assessment. He pointed out that Kees' issue depended on an attacker finding and exploiting an additional vulnerability that would allow containers to redirect a module outside of itself—something that was not a kernel feature and that would be treated as a bug if it were ever discovered.
Kees agreed with Andy that the problem was not with Alexei's code but instead with potential vulnerabilities elsewhere in the kernel. He said, "I just don't want to extend that problem further." And he added that he wasn't opposed to Alexei's patch, but that his concerns were not paranoia, and "there are very real security boundary violations in this model."
At one point, in defense of Alexei's approach, Andy said, "I don't see how this is any more exploitable than any other init_module()." And Linus replied:
Absolutely. If Kees doesn't trust the files to be loaded, an executable—even if it's running with root privileges and in the initns—is still fundamentally weaker than a kernel module.
So I don't understand the security argument AT ALL. It's nonsensical. The executable loading does all the same security checks that the module loading does, including the signing check.
Kees acknowledged that his concern was not with Alexei's code itself, or even with the design of the feature. But he felt that if certain other bugs did appear in the kernel—as they had before—then someone would be able to exploit the feature to run arbitrary code at the root level.
However, Linus had spoken, and Kees' concerns over potential future bugs were apparently not a showstopper. And just to hammer it home, David S. Miller reiterated Linus' point that kernel modules were far more dangerous than executable code, because they could access any container and namespace they pleased.
But this was not the end of the story!
Around this point in the conversation, Linus said to Kees:
My own personal worry is actually different—we do check the signature of the file we're loading, but we're then passing it off to execve() not as the image we loaded, but as the file pointer. So the execve() will end up not using the actual buffer we checked the signature on, but instead just re-reading the file.
Among other things, Linus said, "somebody could maybe try to time it and modify the file after-the-fact of the signature check, and then we execute something else."
He went on to say:
Initially, I thought it was a non-issue, because anybody who controls the module subdirectory enough to rewrite files would be in a position to just execute the file itself directly instead. But it turns out that isn't needed. Some bad actor could just do
finit_module()
with a file that they just *copied* from the module directory.
Linus said this issue had to be addressed before the patch could go into the kernel.
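The race Linus describes is a classic check-then-use problem, and a userspace analogy may make it easier to see. The sketch below is only an illustration under stated assumptions: a SHA-256 comparison stands in for the kernel's module signature check, an ordinary second read of the file stands in for execve() re-reading it, and the `tamper` callback stands in for an attacker who wins the timing window. None of the function names come from the patch.

```python
import hashlib
import os
import tempfile


def _verify(data: bytes, trusted_digest: str) -> None:
    # Stand-in for the signature check; the real check is not a bare hash.
    if hashlib.sha256(data).hexdigest() != trusted_digest:
        raise PermissionError("signature check failed")


def load_checked_then_reread(path, trusted_digest, tamper=None):
    """Racy pattern: verify one read of the file, then use a second read."""
    with open(path, "rb") as f:
        _verify(f.read(), trusted_digest)
    if tamper:
        tamper(path)  # the window between check and use
    with open(path, "rb") as f:
        return f.read()  # not necessarily the bytes that were verified


def load_checked_buffer(path, trusted_digest, tamper=None):
    """Safer pattern: the verified buffer is the one that gets used."""
    with open(path, "rb") as f:
        checked = f.read()
    _verify(checked, trusted_digest)
    if tamper:
        tamper(path)  # too late to matter: the file is not consulted again
    return checked


def demo():
    good = b"trusted helper"
    digest = hashlib.sha256(good).hexdigest()

    def attacker(p):
        with open(p, "wb") as f:
            f.write(b"malicious payload")

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "helper.bin")
        with open(path, "wb") as f:
            f.write(good)
        racy = load_checked_then_reread(path, digest, attacker)
        with open(path, "wb") as f:
            f.write(good)  # restore the file for the second run
        safe = load_checked_buffer(path, digest, attacker)
    return racy, safe


racy, safe = demo()
print(racy)  # the tampered contents slipped past the check
print(safe)  # the verified contents
```

In the racy variant, the check passes and the tampered bytes are still what gets used. In the second variant, the swap has no effect because nothing is read after verification.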
Andy also noticed something else that might be a deal-killer. He said:
This patch is a potentially severe ABI break. Right now, loading a module *copies* it into memory and does not hold a reference to the underlying fs. With the patch applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, then umount it, then clear the ramdisk, something will go horribly wrong. Exactly what goes wrong depends on whether userspace notices that umount() failed. Similarly, if you load one of these modules over a network and then lose your connection, you have a problem.
He explained further, "Without your patch, init_module doesn't keep using the file, so it's common practice to load a module and then delete or unmount it. With your patch, the unmount case breaks. This is likely to break existing userspace, so, in Linux speak, it's an ABI break."
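Andy's point about copying versus holding a reference can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of kernel internals: `ToyFS` is a hypothetical stand-in for a mounted filesystem that refuses to unmount while something still references one of its files, and the two loader functions are invented for illustration only.

```python
import errno


class ToyFS:
    """Toy stand-in for a mounted filesystem that counts file references."""

    def __init__(self, files):
        self.files = dict(files)
        self.refs = 0
        self.mounted = True

    def unmount(self):
        if self.refs:
            raise OSError(errno.EBUSY, "filesystem is busy")
        self.mounted = False


def load_by_copy(fs, name):
    """Existing behaviour as Andy describes it: copy the image, keep no reference."""
    return bytes(fs.files[name])


def load_by_reference(fs, name):
    """Behaviour Andy attributes to the patch: the loaded helper pins the file."""
    fs.refs += 1
    return (fs, name)


def try_unmount(fs):
    """Return 'unmounted' on success, or the symbolic errno name on failure."""
    try:
        fs.unmount()
        return "unmounted"
    except OSError as e:
        return errno.errorcode[e.errno]


initrd_a = ToyFS({"driver.ko": b"module image"})
load_by_copy(initrd_a, "driver.ko")
print(try_unmount(initrd_a))  # nothing pins the filesystem

initrd_b = ToyFS({"helper": b"executable image"})
load_by_reference(initrd_b, "helper")
print(try_unmount(initrd_b))  # the loaded helper keeps it busy
```

An init script that unmounts after loading and never checks the result would carry on as if the second unmount had succeeded, which is the situation Andy warns about.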
At this point, in light of the security exploit Linus had described, Andy felt that Kees' thumbs-up was more important than he had at first thought. Kees' responsibility was module security, which Andy had not considered an issue earlier in the discussion. Now that it was one, getting Kees' blessing on the patch had become more important. Andy pointed out to Alexei, "Kees is very reasonable, and he'll change his mind and ack a patch that he's nacked when presented with a valid technical argument."
However, he also said:
My ABI break observation is also a major problem, and Linus is going to be pissed if this thing lands in his tree and breaks systems due to an issue that was raised during review. So I think you need to either rework the patch or do a serious survey of how all the distros deal with modules (dracut, initramfs-tools, all the older stuff, and probably more) and make sure they can all handle your patch.
Alexei replied that neither of these problems was a real issue. In his view, the ABI (application binary interface) break didn't really break the kernel ABI, and the security issue was not a real concern. He said, "I think you need to stop overreacting on a non-issue."
There was a bit of back and forth between them. It turned out that Alexei didn't believe there was an ABI breakage because in his intended use case, everything would be done identically to the way it was now, and so nothing would be broken. But Greg Kroah-Hartman replied, "For your use case, yes. For mine and Andy's and someone else's in the future, it might be." He added, "You are creating a very generic, new, user/kernel api that a whole bunch of people are going to want to use. Let's not hamper the ability for us all to use this right from the beginning please."
And in terms of specific use cases, Greg said:
We have userspace drivers for USB today, being able to drag that out-of-tree codebase into the kernel is a HUGE bonus, and something that I would love to do for a lot of reasons. I also can see moving some of our existing in-kernel drivers out of the kernel in a way that provides "it just works" functionality by using this type of feature.
A bunch of folks, including Linus, started debating ways to address the problems that had been identified so far. Alexei got on board after a while and started implementing changes as they were identified by the group.
It seems clear that this feature will go into the kernel. It provides cool functionality that is hotly desired by Linus and others. But the timing of getting the code into the kernel will depend on how well Alexei fixes the various problems, and whether new security or ABI issues arise.
This whole discussion was interesting on a number of levels. I particularly like the speed with which the critics and defenders of Alexei's patch would change positions, without regard to ego, or fear of being seen as "wrong" or anything like that. And what had started as a wholehearted acceptance of a new feature became concern over its possible problems and a quest to resolve them in a useful way.
I also particularly like that Alexei's initial minor defensiveness was not treated as cause for bullying, and everyone simply tried to keep the discussion productive. It doesn't always go that way in kernel development.
And of course, I also find it exciting to be able to look in on discussions of potential kernel weaknesses, how they might be exploited by hostile actors, and what might be done to stop them. In the early days, that was exactly how the kernel folks used to talk about Microsoft—right out in the open! What might Microsoft do to destroy open source? How might it destroy the GPL? How might it destroy Linux? And all the while, the kernel people knew that the Microsoft people were reading their every post, just as they know now that hostile attackers are eagerly poking and prodding the mailing list discussions and every patch submission, looking for usable exploits.
Note: if you're mentioned above and want to post a response above the comment section, send a message with your response text to ljeditor@linuxjournal.com.