Re: [RFC 00/12] module: avoid userspace pressure on unwanted allocations

From: Luis Chamberlain <mcgrof@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>,
	Kees Cook <keescook@chromium.org>,
	linux-modules@vger.kernel.org, linux-kernel@vger.kernel.org,
	pmladek@suse.com, petr.pavlu@suse.com, prarit@redhat.com,
	christophe.leroy@csgroup.eu, song@kernel.org, dave@stgolabs.net,
	fan.ni@samsung.com, vincent.fu@samsung.com,
	a.manzanares@samsung.com, colin.i.king@gmail.com
Subject: Re: [RFC 00/12] module: avoid userspace pressure on unwanted allocations
Date: Fri, 24 Mar 2023 14:14:11 -0700	[thread overview]
Message-ID: <ZB4SoxgM6vydrxrj@bombadil.infradead.org> (raw)
In-Reply-To: <CAHk-=whkj6=wyi201JXkw9iT_eTUTsSx+Yb9d4OgmZFjDJA18g@mail.gmail.com>

On Fri, Mar 24, 2023 at 01:28:51PM -0700, Linus Torvalds wrote:
> On Fri, Mar 24, 2023 at 1:00 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On the modules side of things we can be super defensive on the second
> > vmalloc allocation defensive [0] but other than this the initial kread
> > also needs care too.
> 
> Please don't re-implement semaphores. They are a *classic* concurrency limiter.
> 
> In fact, probably *THE* classic one.
> 
> So just do something like this instead:
> 
>    --- a/kernel/module/main.c
>    +++ b/kernel/module/main.c
>    @@ -2937,6 +2937,11 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
>         return load_module(&info, uargs, 0);
>     }
> 
>    +#define CONCURRENCY_LIMITER(name, n) \
>    +    struct semaphore name = __SEMAPHORE_INITIALIZER(name, n)
>    +
>    +static CONCURRENCY_LIMITER(module_loading_concurrency, 50);
>    +
>     SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs,
> int, flags)
>     {
>         struct load_info info = { };
>    @@ -2955,8 +2960,12 @@ SYSCALL_DEFINE3(finit_module, int, fd, const
> char __user *, uargs, int, flags)
>                       |MODULE_INIT_COMPRESSED_FILE))
>                 return -EINVAL;
> 
>    +    err = down_killable(&module_loading_concurrency);
>    +    if (err)
>    +            return err;
>         len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL,
>                                        READING_MODULE);
>    +    up(&module_loading_concurrency);
>         if (len < 0)
>                 return len;
> 
> NOTE! Entirely untested. Surprise surprise.

I'll give it a good wack thanks.

But it still begs the question if *other* vmalloc user-interfacing
places need similar practices. It's not just system calls that use it
willy nilly but anything that could in the end use it. Surely a lot of
"issues" could only happen in an insane pathological use case, but it's
a generic thing to keep in mind in the end.

> I'm a tiny bit worried about deadlocks here, so somebody needs to make
> sure that the kernel_read_file_from_fd() case cannot possibly in turn
> cause a "request_module()" recursion.

Automount on a path where the module lies in a path for a modue not
loaded yet triggering a kernel module autoload is the only thing
I can think of that could cause that, but that just calls userspace
modprobe. And I think that'd be an insane setup.

> We most definitely have had module recursion before, but I *think*
> it's always just in the module init function (ie one module requests
> another when ithe mod->init() function is called).

Since you up() right after the kernel_read_file_from_fd() we would not
be racing module inits with this. If however we place the up() after
the load_module() then that does incur that recurssion possibilty.

> I think by the time we have opened the module file descriptors and are
> just reading the data, we should be ok and the above would never
> deadlock, but I want people to at least think about it.
> 
> Of course, with that "max 50 concurrent loaders" limit, maybe it's
> never an issue anyway. Even if you get a recursion a few deep, at most
> you just wait for somebody else to get out of their loaders. Unless
> *everybody* ends up waiting on some progress.

Yeah, these certainly are pathalogical corner cases. I'm more interested
in solving doing something sane for 1000 CPUs or 2000 CPUs for now, even
if the issue was the kernel (not userspace) to blame (and even if those
use cases are being fixed now like the queued up linux-next ACPI
CPU frequency modules per CPU).

  Luis