From: Prarit Bhargava <prarit@redhat.com>
To: Barret Rhoden <brho@google.com>, Jessica Yu <jeyu@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	David Arcari <darcari@redhat.com>
Subject: Re: [PATCH] modules: fix livelock in add_unformed_module()
Date: Mon, 13 May 2019 07:23:20 -0400
Message-ID: <dd48a3a4-9046-3917-55ba-d9eb391052b3@redhat.com>
In-Reply-To: <20190510184204.225451-1-brho@google.com>



On 5/10/19 2:42 PM, Barret Rhoden wrote:
> When add_unformed_module() finds an existing module with the same name,
> it waits until the preexisting module finished loading.  Prior to commit
> f9a75c1d717f, this meant if the module was either UNFORMED or COMING,
> we'd wait.  That commit changed the check to be !LIVE, so that we'd wait
> for UNFORMED, COMING, or GOING.

Hi Barret, thanks for the fix, but I dropped that patch for other reasons.
Please see below.

> 
> The problem with that commit was that we wait on finished_loading(), and
> that function's state checks were not changed.  If a module was
> GOING, finished_loading() was returning true, meaning to recheck the
> state and presumably break out of the loop.  This mismatch between the
> state checked by add_unformed_module() and the state checked in
> finished_loading() could result in a hang.
> 
> Specifically, when a module was GOING, wait_event_interruptible() would
> immediately return with no error, we'd goto 'again,' see the state !=
> LIVE, and try again.
> 
> This commit changes finished_loading() such that we only consider a
> module 'finished' when it doesn't exist or is LIVE, which are the cases
> that break from the wait-loop in add_unformed_module().
> 
> Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have finished loading")
> Signed-off-by: Barret Rhoden <brho@google.com>
> ---
>  kernel/module.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..0744eea7bb90 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3407,8 +3407,7 @@ static bool finished_loading(const char *name)
>  	sched_annotate_sleep();
>  	mutex_lock(&module_mutex);
>  	mod = find_module_all(name, strlen(name), true);
> -	ret = !mod || mod->state == MODULE_STATE_LIVE
> -		|| mod->state == MODULE_STATE_GOING;
> +	ret = !mod || mod->state == MODULE_STATE_LIVE;
>  	mutex_unlock(&module_mutex);
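
The mismatch described in the quoted commit message can be modeled in userspace.
The sketch below is not the kernel code: struct module is cut down to a single
state field, must_wait() stands in for the !LIVE check in add_unformed_module(),
and count_busy_retries() is a hypothetical helper that counts how often the
retry loop would spin without sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's module states. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/* Reduced model: only the state matters here. NULL means "no such module". */
struct module {
	enum module_state state;
};

/* Predicate before the patch: a GOING module counts as finished. */
static bool finished_loading_old(const struct module *mod)
{
	return !mod || mod->state == MODULE_STATE_LIVE
		|| mod->state == MODULE_STATE_GOING;
}

/* Predicate after the patch: only a missing or LIVE module is finished. */
static bool finished_loading_new(const struct module *mod)
{
	return !mod || mod->state == MODULE_STATE_LIVE;
}

/* Model of the loop check after f9a75c1d717f: wait for anything not LIVE. */
static bool must_wait(const struct module *mod)
{
	return mod && mod->state != MODULE_STATE_LIVE;
}

/*
 * Hypothetical helper: count retries that return from the wait immediately,
 * capped at max_spins so the model itself cannot hang.
 */
static int count_busy_retries(const struct module *mod,
			      bool (*finished)(const struct module *),
			      int max_spins)
{
	int spins = 0;

	assert(max_spins >= 0);
	while (must_wait(mod) && spins < max_spins) {
		if (!finished(mod))
			break;	/* the real code would sleep on module_wq here */
		spins++;	/* wait returned at once: "goto again" */
	}
	return spins;
}
```

With a module whose state never leaves GOING, the old predicate spins until
the cap is reached, while the new predicate reaches the sleep on the first pass.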

With your change to finished_loading(), my x86 system still locks up during
boot, and boot takes minutes longer because the many CPUs are sleeping, waiting
for the module_wq to be run.

A module is loaded once for each CPU.

CPU 0 loads the module, adds the module to the modules list, and sets the module
state to MODULE_STATE_UNFORMED.

Simultaneously all the other CPUs also load the module and sleep in
add_unformed_module().

If CPU 0 is interrupted or rescheduled for any reason, the other CPUs are
stuck waiting until the module thread on CPU 0 runs again.  I mentioned
earlier that on some systems systemd launches more module loads than there
are CPUs, so this occurs quite often on systems with many CPUs.  That is an odd
bug that also needs to be resolved, but I suspect it is in systemd and not the
kernel.

My follow-up patch changes wait_event_interruptible() to
wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
make progress on other tasks.  This also changes the return values compared
with wait_event_interruptible().

https://marc.info/?l=linux-kernel&m=155724085927589&w=2
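
To illustrate the return-value difference (this is a userspace sketch, not the
follow-up patch): the two classify functions and the wait_outcome enum are
hypothetical names, and ERESTARTSYS is redefined locally as a stand-in for the
kernel's value.  The point is that 0 means "condition true" for
wait_event_interruptible() but "timed out, condition still false" for
wait_event_interruptible_timeout().

```c
#include <assert.h>

/* Stand-in for the kernel-internal ERESTARTSYS error code. */
#define ERESTARTSYS 512

/* Hypothetical enum naming the three ways a wait can end. */
enum wait_outcome {
	WAIT_CONDITION_TRUE,	/* the waited-for condition became true */
	WAIT_TIMED_OUT,		/* timeout elapsed, condition still false */
	WAIT_INTERRUPTED,	/* a signal interrupted the sleep */
};

/* wait_event_interruptible(): 0 = condition true, -ERESTARTSYS = signal. */
static enum wait_outcome classify_wait(int ret)
{
	assert(ret == 0 || ret == -ERESTARTSYS);
	return ret == 0 ? WAIT_CONDITION_TRUE : WAIT_INTERRUPTED;
}

/*
 * wait_event_interruptible_timeout(): 0 = timed out with the condition
 * false, positive = condition true (remaining jiffies, at least 1),
 * -ERESTARTSYS = signal.
 */
static enum wait_outcome classify_wait_timeout(long ret)
{
	if (ret == -ERESTARTSYS)
		return WAIT_INTERRUPTED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	assert(ret > 0);
	return WAIT_CONDITION_TRUE;
}
```

A caller that only checks for a nonzero return as an error, which is correct
for wait_event_interruptible(), would misread a positive return from the
timeout variant, so the loop's error handling has to change along with the call.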

I believe this also takes your concern into account?

Please correct me if I'm wrong and am not understanding your issue.

P.

>  
>  	return ret;
> 
