From: Will Deacon <will@kernel.org>
To: Qian Cai <cai@redhat.com>
Cc: paulmck@kernel.org, catalin.marinas@arm.com,
	kernel-team@android.com, Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier
Date: Fri, 6 Nov 2020 10:37:56 +0000
Message-ID: <20201106103755.GA9729@willie-the-truck>
In-Reply-To: <ec2de23c04e400266fcf98dfd282da0b173a68c3.camel@redhat.com>

On Thu, Nov 05, 2020 at 09:15:24PM -0500, Qian Cai wrote:
> On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote:
> > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote:
> > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote:
> > > > Hmm, this patch has caused a regression in the case that we fail to
> > > > online a CPU because it has incompatible CPU features and so we park it
> > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the
> > > > core will never come online, but is being tracked by RCU. So I'm tempted
> > > > to revert this and live with the lockdep warning while we figure out a
> > > > proper fix.
> > > > 
> > > > What's the correct way to undo rcu_cpu_starting(), given that we cannot
> > > > invoke the full hotplug machinery here? Is it correct to call
> > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the
> > > > CPU doing cpu_up(), or should we do something else?
> > > It looks to me that rcu_report_dead() does the opposite of
> > > rcu_cpu_starting(), so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU
> > > and use it there to rewind. Paul?
> > 
> > Yes, rcu_report_dead() should do the trick.  Presumably the earlier
> > online-time CPU-hotplug notifiers are also unwound?
> I don't think that is an issue here. cpu_die_early() sets CPU_STUCK_IN_KERNEL,
> and then __cpu_up() will see a timeout waiting for the AP to come online and
> then deal with CPU_STUCK_IN_KERNEL accordingly. Thus, something like this? I
> don't see anything in rcu_report_dead() that depends on CONFIG_HOTPLUG_CPU=y.

Cheers both for suggesting rcu_report_dead().

> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 09c96f57818c..10729d2d6084 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -421,6 +421,8 @@ void cpu_die_early(void)
>  
>  	update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
>  
> +	rcu_report_dead(cpu);

I think this is in the wrong place, see:

https://lore.kernel.org/r/20201106103602.9849-1-will@kernel.org

which seems to fix the problem for me.

Will
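
For anyone following the thread, the stall described above can be
illustrated with a small user-space toy model. This is only a sketch under
stated assumptions: the struct and the toy_* helpers are hypothetical names
invented for illustration, not the kernel's RCU data structures or the real
rcu_cpu_starting()/rcu_report_dead() implementations. The only idea it
models is that a grace period waits on every CPU RCU is tracking, so a CPU
that is tracked but parked blocks it until the tracking is undone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: NOT the kernel's RCU state. */
struct toy_rcu {
	uint32_t tracked;	/* CPUs a grace period must wait for */
	uint32_t running;	/* CPUs that can actually report a quiescent state */
};

/* Hypothetical stand-in for rcu_cpu_starting(): start tracking the CPU. */
static void toy_rcu_cpu_starting(struct toy_rcu *r, unsigned int cpu)
{
	assert(cpu < 32);
	r->tracked |= UINT32_C(1) << cpu;
}

/* Hypothetical stand-in for rcu_report_dead(): stop tracking the CPU. */
static void toy_rcu_report_dead(struct toy_rcu *r, unsigned int cpu)
{
	assert(cpu < 32);
	r->tracked &= ~(UINT32_C(1) << cpu);
}

/* A grace period can end only if every tracked CPU is able to report. */
static bool toy_gp_can_complete(const struct toy_rcu *r)
{
	return (r->tracked & ~r->running) == 0;
}

/*
 * Bring up CPU 1, fail its feature check, and park it. Returns whether a
 * grace period could still complete afterwards.
 */
static bool simulate_failed_bringup(bool unwind)
{
	struct toy_rcu r = { .tracked = 1, .running = 1 };	/* CPU 0 is up */
	unsigned int bad_cpu = 1;

	toy_rcu_cpu_starting(&r, bad_cpu);	/* done early in bring-up */

	/* Feature check fails here: the CPU parks and never runs. */
	if (unwind)
		toy_rcu_report_dead(&r, bad_cpu);

	return toy_gp_can_complete(&r);
}
```

In this model, simulate_failed_bringup(false) corresponds to the regression
(the parked CPU stays tracked, so the grace period never completes), and
simulate_failed_bringup(true) corresponds to unwinding before parking. It
says nothing about where in cpu_die_early() the real call belongs.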



