linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Qian Cai <cai@redhat.com>
Cc: Will Deacon <will@kernel.org>,
	catalin.marinas@arm.com, kernel-team@android.com,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier
Date: Thu, 5 Nov 2020 20:07:14 -0800
Message-ID: <20201106040714.GS3249@paulmck-ThinkPad-P72>
In-Reply-To: <ec2de23c04e400266fcf98dfd282da0b173a68c3.camel@redhat.com>

On Thu, Nov 05, 2020 at 09:15:24PM -0500, Qian Cai wrote:
> On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote:
> > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote:
> > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote:
> > > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote:
> > > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote:
> > > > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not
> > > > > > > early enough in the CPU-hotplug onlining process, which results in
> > > > > > > lockdep splats as follows:
> > > > > > 
> > > > > >  WARNING: suspicious RCU usage
> > > > > >  -----------------------------
> > > > > > >  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
> > > > > > 
> > > > > > [...]
> > > > > 
> > > > > Applied to arm64 (for-next/fixes), thanks!
> > > > > 
> > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier
> > > > >       https://git.kernel.org/arm64/c/ce3d31ad3cac
> > > > 
> > > > Hmm, this patch has caused a regression in the case that we fail to
> > > > online a CPU because it has incompatible CPU features and so we park it
> > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the
> > > > core will never come online, but is being tracked by RCU. So I'm tempted
> > > > to revert this and live with the lockdep warning while we figure out a
> > > > proper fix.
> > > > 
> > > > > What's the correct way to undo rcu_cpu_starting(), given that we cannot
> > > > invoke the full hotplug machinery here? Is it correct to call
> > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the
> > > > CPU doing cpu_up(), or should we do something else?
> > > > It looks to me like rcu_report_dead() does the opposite of
> > > > rcu_cpu_starting(), so we could lift rcu_report_dead() out of
> > > > CONFIG_HOTPLUG_CPU and use it there to rewind. Paul?
> > 
> > Yes, rcu_report_dead() should do the trick.  Presumably the earlier
> > online-time CPU-hotplug notifiers are also unwound?
> I don't think that is an issue here. cpu_die_early() sets CPU_STUCK_IN_KERNEL,
> and then __cpu_up() will see a timeout waiting for the AP to come online and
> then deal with CPU_STUCK_IN_KERNEL accordingly. Thus, something like this? I
> don't see anything in rcu_report_dead() that depends on CONFIG_HOTPLUG_CPU=y.

If this works for the ARM folks, it seems like a reasonable approach
to me.  I cannot reasonably test this because not only do I not have
an ARM system, I don't have a system on which a kernel can be built
with CONFIG_HOTPLUG_CPU=n, so I must rely on others' testing.

							Thanx, Paul

> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 09c96f57818c..10729d2d6084 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -421,6 +421,8 @@ void cpu_die_early(void)
>  
>  	update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
>  
> +	rcu_report_dead(cpu);
> +
>  	cpu_park_loop();
>  }
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2a52f42f64b6..bd04b09b84b3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu)
>  	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
>  }
>  
> -#ifdef CONFIG_HOTPLUG_CPU
>  /*
>   * The outgoing function has no further need of RCU, so remove it from
>   * the rcu_node tree's ->qsmaskinitnext bit masks.
> @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu)
>  	rdp->cpu_started = false;
>  }
>  
> +#ifdef CONFIG_HOTPLUG_CPU
>  /*
>   * The outgoing CPU has just passed through the dying-idle state, and we
>   * are being invoked from the CPU that was IPIed to continue the offline
> 
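
For readers following the stall Will describes, the bookkeeping can be
modelled in user space. The following is a minimal sketch, not kernel code:
struct toy_rcu and every toy_* function are hypothetical names invented for
illustration, and a single 64-bit mask stands in for the rcu_node tree's
->qsmaskinitnext and ->qsmask fields. It only shows why a CPU that has been
marked as starting and is then parked holds up the grace period until
something like rcu_report_dead() removes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for one rcu_node (hypothetical, not the kernel structure). */
struct toy_rcu {
	uint64_t qsmaskinitnext; /* CPUs the next grace period must wait for */
	uint64_t qsmask;         /* CPUs the current grace period still waits on */
};

/* Bit for a given CPU; the toy model supports at most 64 CPUs. */
static inline uint64_t toy_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return UINT64_C(1) << cpu;
}

/* Analogue of rcu_cpu_starting(): begin tracking the incoming CPU. */
static inline void toy_cpu_starting(struct toy_rcu *r, unsigned int cpu)
{
	r->qsmaskinitnext |= toy_bit(cpu);
}

/*
 * Analogue of rcu_report_dead(): stop tracking the CPU and stop waiting
 * on it in the grace period that is already running.
 */
static inline void toy_report_dead(struct toy_rcu *r, unsigned int cpu)
{
	r->qsmaskinitnext &= ~toy_bit(cpu);
	r->qsmask &= ~toy_bit(cpu);
}

/* Start a grace period: snapshot the set of tracked CPUs. */
static inline void toy_gp_start(struct toy_rcu *r)
{
	r->qsmask = r->qsmaskinitnext;
}

/* A running CPU reports a quiescent state. */
static inline void toy_report_qs(struct toy_rcu *r, unsigned int cpu)
{
	r->qsmask &= ~toy_bit(cpu);
}

/* The grace period ends once no tracked CPU is outstanding. */
static inline bool toy_gp_done(const struct toy_rcu *r)
{
	return r->qsmask == 0;
}
```

With CPUs 0 and 1 both marked via toy_cpu_starting(), a grace period begun
with toy_gp_start() does not complete after toy_report_qs() for CPU 0 alone.
Calling toy_report_dead() for CPU 1 lets it finish, and later grace periods
no longer wait on CPU 1.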

