Re: kmod: provide wrappers for kmod_concurrent inc/dec

From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Jessica Yu <jeyu@redhat.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
	Petr Mladek <pmladek@suse.com>, Kees Cook <keescook@chromium.org>,
	shuah@kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	martin.wilck@suse.com, Michal Marek <mmarek@suse.com>,
	hare@suse.com, rwright@hpe.com, Jeff Mahoney <jeffm@suse.com>,
	DSterba@suse.com, fdmanana@suse.com, neilb@suse.com,
	Guenter Roeck <linux@roeck-us.net>,
	rgoldwyn@suse.com, subashab@codeaurora.org,
	Heinrich Schuchardt <xypron.glpk@gmx.de>,
	Aaron Tomlin <atomlin@redhat.com>,
	mbenes@suse.cz, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kselftest@vger.kernel.org,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: kmod: provide wrappers for kmod_concurrent inc/dec
Date: Fri, 6 Jan 2017 21:54:42 +0100	[thread overview]
Message-ID: <20170106205442.GW13946@wotan.suse.de> (raw)
In-Reply-To: <20161222044806.bajz6tdg5gtvud2t@jeyu>

On Wed, Dec 21, 2016 at 08:48:06PM -0800, Jessica Yu wrote:
> +++ Luis R. Rodriguez [16/12/16 09:05 +0100]:
> > On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> > > On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > > > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > kmod_concurrent is used as an atomic counter for enabling
> > > > > > the allowed limit of modprobe calls, provide wrappers for it
> > > > > > to enable this to be expanded on more easily. This will be done
> > > > > > later.
> > > > > >
> > > > > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > > > > > ---
> > > > > >  kernel/kmod.c | 27 +++++++++++++++++++++------
> > > > > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > > > > --- a/kernel/kmod.c
> > > > > > +++ b/kernel/kmod.c
> > > > > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > > > >         return -ENOMEM;
> > > > > >  }
> > > > > >
> > > > > > +static int kmod_umh_threads_get(void)
> > > > > > +{
> > > > > > +       atomic_inc(&kmod_concurrent);
> > > 
> > > This approach might actually cause false failures. If we
> > > are on the limit and more processes do this increment
> > > in parallel, it makes the number bigger that it should be.
> > 
> > This approach is *exactly* what the existing code does :P
> > I just provided wrappers. I agree with the old approach though,
> > reason is it acts as a lock in for the bump.
> 
> I think what Petr meant was that we could run into false failures when multiple
> atomic increments happen between the first increment and the subsequent
> atomic_read.
> 
> Say max_modprobes is 64 -
> 
>       atomic_inc(&kmod_concurrent); // thread 1: kmod_concurrent is 63
>            atomic_inc(&kmod_concurrent); // thread 2: kmod_concurrent is 64
>                 atomic_inc(&kmod_concurrent); // thread 3: kmod_concurrent is 65
>       if (atomic_read(&kmod_concurrent) < max_modprobes) // if all threads read 65 here, then all will error out
>               return 0;                                  // when the first two should have succeeded (false failures)
>       atomic_dec(&kmod_concurrent);
>       return -ENOMEM;
> 
> But yeah, I think this issue was already in the existing kmod code..

Ah right, but the code was very simple and there is only one operation
in between which we'd race against given the old code just incremented
first nd immediately checked for the limit. The more code we have the
more chances for what you describe to happen.

I've added another change into my series, a clutch, its at the end of this
email. With this we change we check for the limit right away and put on
hold any items reaching the limit, while other requests passing the limit
will be bumped. We have then:
 
        if (!kmod_concurrent_sane()) {                                          
                pr_warn_ratelimited("request_module: kmod_concurrent (%u) close to critical levels (max_modprobes: %u) for module %s\n, backing off for a bit",
                                    atomic_read(&kmod_concurrent), max_modprobes, module_name);
                wait_event_interruptible(kmod_wq, kmod_concurrent_sane());      
        }                                                                       
                                                                                
        ret = kmod_umh_threads_get();                                           
        if (ret) {                                                              
                pr_err_ratelimited("%s: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
                                   __func__, module_name, max_modprobes);       
                return ret;                                                     
        }  

The same race you describe is possible -- but we now would at least use
a clutch immediately as we approach the limit. Maybe it makes sense to
post a new series after I fold the alias code and sanity check into a
debug kconfig option ?

  Luis

commit 95c55552283cf99e2a48b84dc766d5fa547f046e
Author: Luis R. Rodriguez <mcgrof@kernel.org>
Date:   Thu Dec 15 23:24:22 2016 -0600

    kmod: add a clutch around 1/4 of modprobe thread limit
    
    If we reach the limit of modprobe_limit threads running the next
    request_module() call will fail. The original reason for adding
    a kill was to do away with possible issues with in old circumstances
    which would create a recursive series of request_module() calls.
    We can do better than just be super aggressive and reject calls
    once we've reached the limit by adding a clutch so that if we're
    1/4th of the way close to the limit we make these new calls wait
    until pending threads complete.
    
    There is still a chance you can fail new incomming requests which
    can bump kmod_concurrent beyond the limit, however the clutch helps
    with a bit of breathing room to allow the system to process pending
    requests before activating the upper last 1/4th of the limit requests.
    
    This fixes test cases 0008 and 0009 of the selftest for kmod:
    
    tools/testing/selftests/kmod/kmod.sh -t 0008
    tools/testing/selftests/kmod/kmod.sh -t 0009
    
    Both tests reveal the clutch in action:
    
    Dec 15 16:12:14 piggy kernel: request_module: kmod_concurrent (96) close critical levels (max_modprobes: 128) for module test_module
    ...
    Dec 15 16:12:23 piggy kernel: request_module: kmod_concurrent (96) close critical levels (max_modprobes: 128) for module test_module
    ...
    
    The only difference is the clutch helps with avoiding making
    request_module() requests fatal more often. With x86_64 qemu,
    with 4 cores, 4 GiB of RAM it takes the following run time to
    run both tests:
    
    time kmod.sh -t 0008
    real    0m22.247s
    user    0m0.084s
    sys     0m11.328s
    
    time kmod.sh -t 0009
    real    0m58.785s
    user    0m0.492s
    sys     0m10.852s
    
    Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>

diff --git a/kernel/kmod.c b/kernel/kmod.c
index d6595d2de209..f8c880bbf658 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -58,6 +58,7 @@ static DECLARE_RWSEM(umhelper_sem);
 
 #ifdef CONFIG_MODULES
 static atomic_t kmod_concurrent = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
 
 /*
 	modprobe_path is set via /proc/sys.
@@ -156,6 +157,16 @@ int get_kmod_umh_count(void)
 	return atomic_read(&kmod_concurrent);
 }
 
+static bool kmod_concurrent_sane(void)
+{
+	unsigned int clutch;
+
+	clutch = get_kmod_umh_limit() - (get_kmod_umh_limit()/4);
+	if (get_kmod_umh_count() < clutch)
+		return true;
+	return false;
+}
+
 /**
  * __request_module - try to load a kernel module
  * @wait: wait (or not) for the operation to complete
@@ -199,6 +210,12 @@ int __request_module(bool wait, const char *fmt, ...)
 	if (ret)
 		return ret;
 
+	if (!kmod_concurrent_sane()) {
+		pr_warn_ratelimited("request_module: kmod_concurrent (%u) close to critical levels (max_modprobes: %u) for module %s\n, backing off for a bit",
+				    atomic_read(&kmod_concurrent), max_modprobes, module_name);
+		wait_event_interruptible(kmod_wq, kmod_concurrent_sane());
+	}
+
 	ret = kmod_umh_threads_get();
 	if (ret) {
 		pr_err_ratelimited("%s: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
@@ -211,6 +228,7 @@ int __request_module(bool wait, const char *fmt, ...)
 	ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
 
 	kmod_umh_threads_put();
+	wake_up_all(&kmod_wq);
 	return ret;
 }
 EXPORT_SYMBOL(__request_module);
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index f8ccc938e0fb..08d9bea4bade 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -52,28 +52,8 @@ ALL_TESTS="$ALL_TESTS 0004:1:1"
 ALL_TESTS="$ALL_TESTS 0005:10:1"
 ALL_TESTS="$ALL_TESTS 0006:10:1"
 ALL_TESTS="$ALL_TESTS 0007:5:1"
-
-# Disabled tests:
-#
-# 0008 x 150 -  multithreaded - push kmod_concurrent over max_modprobes for request_module()"
-# Current best-effort failure interpretation:
-# Enough module requests get loaded in place fast enough to reach over the
-# max_modprobes limit and trigger a failure -- before we're even able to
-# start processing pending requests.
-ALL_TESTS="$ALL_TESTS 0008:150:0"
-
-# 0009 x 150 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()"
-# Current best-effort failure interpretation:
-#
-# get_fs_type() requests modules using aliases as such the optimization in
-# place today to look for already loaded modules will not take effect and
-# we end up requesting a new module to load, this bumps the kmod_concurrent,
-# and in certain circumstances can lead to pushing the kmod_concurrent over
-# the max_modprobe limit.
-#
-# This test fails much easier than test 0008 since the alias optimizations
-# are not in place.
-ALL_TESTS="$ALL_TESTS 0009:150:0"
+ALL_TESTS="$ALL_TESTS 0008:150:1"
+ALL_TESTS="$ALL_TESTS 0009:150:1"
 
 test_modprobe()
 {