Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
	Matt Redfearn <matt.redfearn@imgtec.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Howells <dhowells@redhat.com>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Jessica Yu <jeyu@redhat.com>, Michal Marek <mmarek@suse.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux MIPS Mailing List <linux-mips@linux-mips.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Petr Mladek <pmladek@suse.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers
Date: Fri, 4 Aug 2017 02:10:35 +0200	[thread overview]
Message-ID: <20170804001035.GI27873@wotan.suse.de> (raw)
In-Reply-To: <CAGXu5jKDpsZAEEvoUQWghhbnF=g30Z1tkQxdv+vhmLJM5FW+qQ@mail.gmail.com>

On Thu, Aug 03, 2017 at 05:02:40PM -0700, Kees Cook wrote:
> On Wed, Aug 2, 2017 at 4:23 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> >> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> >> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> >> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> >> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
> >> >> wait queue. Hence the kernel appears to hang silently when starting
> >> >> userspace.
> >> >
> >> > Indeed, the recursive issue were no longer expected to exist.
> >>
> >> Errr, yeah, recursive binfmt loads can still happen.
> >>
> >> > The *old* implementation would also prevent a set of binaries to daisy chain
> >> > a set of 50 different binaries which require different binfmt loaders. The
> >> > current implementation enables this and we'd just wait. There's a bound to
> >> > the number of binfmd loaders though, so this would be bounded. If however
> >> > a 2nd loader loaded the first binary we'd run into the same issue I think.
> >> >
> >> > If we can't think of a good way to resolve this we'll just have to revert
> >> > 6d7964a722af for now.
> >>
> >> The weird but "normal" recursive case is usually a script calling a
> >> script calling a misc format. Getting a chain of modprobes running,
> >> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> >> it'd be better for the request_module() to fail.
> >
> > In that case how about we just have each waiter only wait max X seconds,
> > if the number of concurrent ongoing modprobe calls hasn't reduced by
> > a single digit in X seconds we give up on request_module() for the
> > module and clearly indicate what happened.
> >
> > Matt, can you test?
> >
> > Note I've used wait_event_killable_timeout() to only accept SIGKILL
> > for now. I've seen issues wit SIGCHILD and at modprobe this could
> > even be a bigger issue, so this would restrict the signals received
> > *only* to SIGKILL.
> >
> > It would be good to come up with a simple test case for this in
> > tools/testing/selftests/kmod/kmod.sh
> >
> >   Luis
> >
> > diff --git a/include/linux/wait.h b/include/linux/wait.h
> > index 5b74e36c0ca8..dc19880c02f5 100644
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
> >         __ret;                                                                  \
> >  })
> >
> > +#define __wait_event_killable_timeout(wq_head, condition, timeout)             \
> > +       ___wait_event(wq_head, ___wait_cond_timeout(condition),                 \
> > +                     TASK_KILLABLE, 0, timeout,                                \
> > +                     __ret = schedule_timeout(__ret))
> > +
> > +/**
> > + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> > + * @wq_head: the waitqueue to wait on
> > + * @condition: a C expression for the event to wait for
> > + * @timeout: timeout, in jiffies
> > + *
> > + * The process is put to sleep (TASK_KILLABLE) until the
> > + * @condition evaluates to true or a kill signal is received.
> > + * The @condition is checked each time the waitqueue @wq_head is woken up.
> > + *
> > + * wake_up() has to be called after changing any variable that could
> > + * change the result of the wait condition.
> > + *
> > + * Returns:
> > + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> > + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> > + * the remaining jiffies (at least 1) if the @condition evaluated
> > + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> > + * interrupted by a kill signal.
> > + *
> > + * Only kill signals interrupt this process.
> > + */
> > +#define wait_event_killable_timeout(wq_head, condition, timeout)               \
> > +({                                                                             \
> > +       long __ret = timeout;                                                   \
> > +       might_sleep();                                                          \
> > +       if (!___wait_cond_timeout(condition))                                   \
> > +               __ret = __wait_event_killable_timeout(wq_head,                  \
> > +                                               condition, timeout);            \
> > +       __ret;                                                                  \
> > +})
> > +
> >
> >  #define __wait_event_lock_irq(wq_head, condition, lock, cmd)                   \
> >         (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,     \
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 6d016c5d97c8..1b5f7bada8d2 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
> >  static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
> >
> >  /*
> > + * If modprobe can't be called after this time we assume its very likely
> > + * your userspace has created a recursive dependency, and we'll have no
> > + * option but to fail.
> > + */
> > +#define MAX_KMOD_TIMEOUT 5
> 
> Would this mean slow (swappy) systems could start failing modprobe
> just due to access times?

No, this is pre-launch and depends on *all* running kmod threads.
The wait would *only* fail if we already hit the limit of 50 concurrent
kmod threads running at the same time and they *all* don't finish for 5 seconds
straight. If at any point in time any modprobe call finishes that would clear
this and the waiting modprobe waiting would chug on. So this would only happen
if we were maxed out busy without any return for X seconds straight with all
kmod threads busy.

The name probably should reflect that better then, MAX_KMOD_ALL_BUSY_TIMEOUT
maybe?

  Luis