From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S969899AbdEYQjD (ORCPT <rfc822;w@1wt.eu>);
        Thu, 25 May 2017 12:39:03 -0400
Received: from mail-pf0-f196.google.com ([209.85.192.196]:33929 "EHLO
        mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S966777AbdEYQjB (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 25 May 2017 12:39:01 -0400
Date: Thu, 25 May 2017 09:38:57 -0700
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
To: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Filipe Manana <fdmanana@suse.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        linux-doc@vger.kernel.org, rgoldwyn@suse.com, hare@suse.com,
        corbet@lwn.net, torvalds@linux-foundation.org,
        linux-kselftest@vger.kernel.org, akpm@linux-foundation.org,
        dan.j.williams@intel.com, atomlin@redhat.com, rwright@hpe.com,
        xypron.glpk@gmx.de, mmarek@suse.com, martin.wilck@suse.com,
        rusty@rustcorp.com.au, jeffm@suse.com, mingo@redhat.com,
        pmladek@suse.com, linux@roeck-us.net, ebiederm@xmission.com,
        shuah@kernel.org, DSterba@suse.com, keescook@chromium.org,
        gregkh@linuxfoundation.org, jpoimboe@redhat.com, acme@redhat.com,
        mbenes@suse.cz, neilb@suse.com, linux-kernel@vger.kernel.org,
        davem@davemloft.net, jeyu@redhat.com, subashab@codeaurora.org
Subject: Re: [PATCH 1/6] kmod: add dynamic max concurrent thread count
Message-ID: <20170525163857.GC26128@dtor-ws>
References: <20170519032444.18416-1-mcgrof@kernel.org>
 <20170519032444.18416-2-mcgrof@kernel.org>
 <20170519204457.GC19281@dtor-ws>
 <CAB=NE6XGL24O+JfTNUG0HO4obhDc-v+HyL0SCrQELiZrj2-qNw@mail.gmail.com>
 <CAB=NE6Wa4Nemh80yaCCwbjrNRLPD+GJMncg12APg9Vq63AWVng@mail.gmail.com>
 <CAB=NE6Vc6RDAytn2Pkv2V58HFo8ncR0eOHZ3===kbZ2NF78ubg@mail.gmail.com>
 <CAB=NE6Vqmx=y6muenpuQKynTP=pGWMF8tzoCA0BXD6d63q9wPg@mail.gmail.com>
 <20170519215829.GE19281@dtor-ws>
 <20170525162201.GV8951@wotan.suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170525162201.GV8951@wotan.suse.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 25, 2017 at 06:22:01PM +0200, Luis R. Rodriguez wrote:
> On Fri, May 19, 2017 at 02:58:29PM -0700, Dmitry Torokhov wrote:
> > On Fri, May 19, 2017 at 02:45:29PM -0700, Luis R. Rodriguez wrote:
> > > On May 19, 2017 1:45 PM, "Dmitry Torokhov" <dmitry.torokhov@gmail.com>
> > > wrote:
> > > 
> > > On Thu, May 18, 2017 at 08:24:39PM -0700, Luis R. Rodriguez wrote:
> > > > We currently statically limit the number of modprobe threads which
> > > > we allow to run concurrently to 50. As per Keith Owens, this was a
> > > > completely arbitrary value, and it was set in the 2.3.38 days [0]
> > > > over 16 years ago in year 2000.
> > > >
> > > > > Although we haven't yet hit our lower limits, experimentation [1]
> > > > > shows that if and when we hit this limit, the worst case will be
> > > > > fatal -- consider get_fs_type() failures upon mount on a system which
> > > > > has many partitions, some of which might even be with the same
> > > > > filesystem. It's best to be prudent and increase and set this
> > > > > value to something more sensible which ensures we're far from hitting
> > > > > the limit and also allows default build/user run time override.
> > > >
> > > > > The worst case is fatal given that once a module fails to load there
> > > > > is a period of time during which subsequent requests for the same
> > > > > module will fail, so in the case of partitions it's not just one
> > > > > request that could fail, but requests for a whole series of
> > > > > partitions. This latter issue of a module request failure domino
> > > > > effect can be addressed later, but increasing the limit to something
> > > > > more meaningful should at least give us enough cushion to avoid this
> > > > > for a while.
> > > >
> > > > > Set this value up with a bit more meaningful modern limits:
> > > >
> > > > Bump this up to 64  max for small systems (CONFIG_BASE_SMALL)
> > > > Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
> > > >
> > > > > Also allow the default max limit to be further fine tuned at compile
> > > > > time, and at boot time using the kernel parameter: max_modprobes.
> > > >
> > > > > [0] https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ab1c4ec7410f6ec64e1511d1a7d850fc99c09b44
> > > > [1] https://github.com/mcgrof/test_request_module
> > > 
> > > > If we actually run into this issue, instead of slamming the system with
> > > > a bazillion concurrent requests, can we wait for the other modprobes to
> > > > finish and then continue?
> > > 
> > > 
> > > > Yes! I have a patch that does precisely that! That is actually still
> > > > *not enough* to avoid failing fatally, but this would be the subject of
> > > > another series with more debatable approaches.
> > > 
> > 
> > Then please post it.
> 
> Will do.
> 
> > > > This at least pushes us closer to safer limits for now while also making
> > > > it configurable.
> > 
> > > Making it configurable depending on how big/little the box is makes no
> > > sense,
> 
> > If we set a hard limit then we need to patch a system if we need to increase
> > it. This is rather stupid given we have no current heuristics to make kmod
> > loading deterministic from userspace, and in the worst case this can be fatal.
> > General system size is a good first guess, but making it configurable is
> > really key given current limitations. I'll post further patches which reveal
> > some of these issues more clearly.
> 
> > especially if the above is implemented, as depth of modprobe
> > invocations depends on configuration and not computing power of the
> > hardware the system is running on.
> 
> > You seem to agree making it configurable is sensible, but not depending on
> > the system size?

No, I am saying that making it configurable based on system size makes
no sense at all, and making it configurable given you already have
patches removing hard failures gives no benefit.
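
To make the second point concrete, here is a rough userspace sketch (plain
pthreads, not the kmod code; struct modprobe_gate and the gate_*() helpers
are names made up purely for illustration). It contrasts failing a request
at the limit with sleeping until a running helper finishes:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of a concurrency limit; not kernel code. */
struct modprobe_gate {
	pthread_mutex_t lock;
	pthread_cond_t  slot_freed;
	unsigned int    running;	/* helpers currently in flight */
	unsigned int    limit;		/* max helpers allowed at once */
};

static void gate_init(struct modprobe_gate *g, unsigned int limit)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->slot_freed, NULL);
	g->running = 0;
	g->limit = limit;
}

/* Hard-failure variant: returns 0 on success, -1 if the limit is reached. */
static int gate_try_enter(struct modprobe_gate *g)
{
	int ret = -1;

	pthread_mutex_lock(&g->lock);
	if (g->running < g->limit) {
		g->running++;
		ret = 0;
	}
	pthread_mutex_unlock(&g->lock);
	return ret;
}

/* Waiting variant: sleep until a slot is free, then take it. */
static void gate_enter_wait(struct modprobe_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->running >= g->limit)
		pthread_cond_wait(&g->slot_freed, &g->lock);
	g->running++;
	pthread_mutex_unlock(&g->lock);
}

/* Release a slot and wake one waiter, if any. */
static void gate_exit(struct modprobe_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->running > 0);
	g->running--;
	pthread_cond_signal(&g->slot_freed);
	pthread_mutex_unlock(&g->lock);
}
```

With something like gate_enter_wait() the limit only bounds how many helpers
run at once; callers beyond it are delayed rather than failed, so the exact
value stops mattering for correctness.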

Thanks.

-- 
Dmitry