* [PATCH] arm64/smp: Move rcu_cpu_starting() earlier @ 2020-10-28 18:26 Qian Cai 2020-10-28 21:00 ` Paul E. McKenney ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Qian Cai @ 2020-10-28 18:26 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel, Qian Cai The call to rcu_cpu_starting() in secondary_start_kernel() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call trace: dump_backtrace+0x0/0x3c8 show_stack+0x14/0x60 dump_stack+0x14c/0x1c4 lockdep_rcu_suspicious+0x134/0x14c __lock_acquire+0x1c30/0x2600 lock_acquire+0x274/0xc48 _raw_spin_lock+0xc8/0x140 vprintk_emit+0x90/0x3d0 vprintk_default+0x34/0x40 vprintk_func+0x378/0x590 printk+0xa8/0xd4 __cpuinfo_store_cpu+0x71c/0x868 cpuinfo_store_cpu+0x2c/0xc8 secondary_start_kernel+0x244/0x318 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the secondary_start_kernel() function. Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Signed-off-by: Qian Cai <cai@redhat.com> --- arch/arm64/kernel/smp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 82e75fc2c903..09c96f57818c 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -222,6 +222,7 @@ asmlinkage notrace void secondary_start_kernel(void) if (system_uses_irq_prio_masking()) init_gic_priority_masking(); + rcu_cpu_starting(cpu); preempt_disable(); trace_hardirqs_off(); -- 2.28.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-28 18:26 [PATCH] arm64/smp: Move rcu_cpu_starting() earlier Qian Cai @ 2020-10-28 21:00 ` Paul E. McKenney 2020-10-29 9:10 ` Will Deacon 2020-10-30 16:33 ` Will Deacon 2 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2020-10-28 21:00 UTC (permalink / raw) To: Qian Cai Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > enough in the CPU-hotplug onlining process, which results in lockdep > splats as follows: > > WARNING: suspicious RCU usage > ----------------------------- > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > no locks held by swapper/1/0. > > Call trace: > dump_backtrace+0x0/0x3c8 > show_stack+0x14/0x60 > dump_stack+0x14c/0x1c4 > lockdep_rcu_suspicious+0x134/0x14c > __lock_acquire+0x1c30/0x2600 > lock_acquire+0x274/0xc48 > _raw_spin_lock+0xc8/0x140 > vprintk_emit+0x90/0x3d0 > vprintk_default+0x34/0x40 > vprintk_func+0x378/0x590 > printk+0xa8/0xd4 > __cpuinfo_store_cpu+0x71c/0x868 > cpuinfo_store_cpu+0x2c/0xc8 > secondary_start_kernel+0x244/0x318 > > This is avoided by moving the call to rcu_cpu_starting up near the > beginning of the secondary_start_kernel() function. > > Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ > Signed-off-by: Qian Cai <cai@redhat.com> Interesting way to compute "cpu" earlier in the code, but nevertheless: Acked-by: Paul E. McKenney <paulmck@kernel.org> > --- > arch/arm64/kernel/smp.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 82e75fc2c903..09c96f57818c 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -222,6 +222,7 @@ asmlinkage notrace void secondary_start_kernel(void) > if (system_uses_irq_prio_masking()) > init_gic_priority_masking(); > > + rcu_cpu_starting(cpu); > preempt_disable(); > trace_hardirqs_off(); > > -- > 2.28.0 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-28 18:26 [PATCH] arm64/smp: Move rcu_cpu_starting() earlier Qian Cai 2020-10-28 21:00 ` Paul E. McKenney @ 2020-10-29 9:10 ` Will Deacon 2020-10-29 13:17 ` Qian Cai 2020-10-29 14:09 ` Paul E. McKenney 2020-10-30 16:33 ` Will Deacon 2 siblings, 2 replies; 14+ messages in thread From: Will Deacon @ 2020-10-29 9:10 UTC (permalink / raw) To: Qian Cai Cc: Paul E. McKenney, Peter Zijlstra, Catalin Marinas, linux-arm-kernel, linux-kernel On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > enough in the CPU-hotplug onlining process, which results in lockdep > splats as follows: > > WARNING: suspicious RCU usage > ----------------------------- > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > no locks held by swapper/1/0. > > Call trace: > dump_backtrace+0x0/0x3c8 > show_stack+0x14/0x60 > dump_stack+0x14c/0x1c4 > lockdep_rcu_suspicious+0x134/0x14c > __lock_acquire+0x1c30/0x2600 > lock_acquire+0x274/0xc48 > _raw_spin_lock+0xc8/0x140 > vprintk_emit+0x90/0x3d0 > vprintk_default+0x34/0x40 > vprintk_func+0x378/0x590 > printk+0xa8/0xd4 > __cpuinfo_store_cpu+0x71c/0x868 > cpuinfo_store_cpu+0x2c/0xc8 > secondary_start_kernel+0x244/0x318 > > This is avoided by moving the call to rcu_cpu_starting up near the > beginning of the secondary_start_kernel() function. Hmm, it's not really a move though -- we'll end up calling this thing twice afaict. It would be better to make sure we've called notify_cpu_starting() early enough. Can we do that instead? Will ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-29 9:10 ` Will Deacon @ 2020-10-29 13:17 ` Qian Cai 2020-10-30 8:15 ` Will Deacon 2020-10-29 14:09 ` Paul E. McKenney 1 sibling, 1 reply; 14+ messages in thread From: Qian Cai @ 2020-10-29 13:17 UTC (permalink / raw) To: Will Deacon Cc: Paul E. McKenney, Peter Zijlstra, Catalin Marinas, linux-arm-kernel, linux-kernel On Thu, 2020-10-29 at 09:10 +0000, Will Deacon wrote: > On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > enough in the CPU-hotplug onlining process, which results in lockdep > > splats as follows: > > > > WARNING: suspicious RCU usage > > ----------------------------- > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > other info that might help us debug this: > > > > RCU used illegally from offline CPU! > > rcu_scheduler_active = 1, debug_locks = 1 > > no locks held by swapper/1/0. > > > > Call trace: > > dump_backtrace+0x0/0x3c8 > > show_stack+0x14/0x60 > > dump_stack+0x14c/0x1c4 > > lockdep_rcu_suspicious+0x134/0x14c > > __lock_acquire+0x1c30/0x2600 > > lock_acquire+0x274/0xc48 > > _raw_spin_lock+0xc8/0x140 > > vprintk_emit+0x90/0x3d0 > > vprintk_default+0x34/0x40 > > vprintk_func+0x378/0x590 > > printk+0xa8/0xd4 > > __cpuinfo_store_cpu+0x71c/0x868 > > cpuinfo_store_cpu+0x2c/0xc8 > > secondary_start_kernel+0x244/0x318 > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > beginning of the secondary_start_kernel() function. > > Hmm, it's not really a move though -- we'll end up calling this thing twice > afaict. It would be better to make sure we've called notify_cpu_starting() > early enough. Can we do that instead? Paul mentioned that it is fine to call rcu_cpu_starting() multiple times, and Peter mentioned that CPU bringup is complicated. Thus, I thought about doing something safe here. I tested a bit of patch below which seems fine, but I can't tell for sure if it is safe. Any suggestion? --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -224,6 +224,7 @@ asmlinkage notrace void secondary_start_kernel(void) preempt_disable(); trace_hardirqs_off(); + notify_cpu_starting(cpu); /* * If the system has established the capabilities, make sure @@ -244,7 +245,6 @@ asmlinkage notrace void secondary_start_kernel(void) /* * Enable GIC and timers. */ - notify_cpu_starting(cpu); ipi_setup(cpu); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-29 13:17 ` Qian Cai @ 2020-10-30 8:15 ` Will Deacon 0 siblings, 0 replies; 14+ messages in thread From: Will Deacon @ 2020-10-30 8:15 UTC (permalink / raw) To: Qian Cai Cc: Paul E. McKenney, Peter Zijlstra, Catalin Marinas, linux-arm-kernel, linux-kernel On Thu, Oct 29, 2020 at 09:17:35AM -0400, Qian Cai wrote: > On Thu, 2020-10-29 at 09:10 +0000, Will Deacon wrote: > > On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > splats as follows: > > > > > > WARNING: suspicious RCU usage > > > ----------------------------- > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > > > other info that might help us debug this: > > > > > > RCU used illegally from offline CPU! > > > rcu_scheduler_active = 1, debug_locks = 1 > > > no locks held by swapper/1/0. > > > > > > Call trace: > > > dump_backtrace+0x0/0x3c8 > > > show_stack+0x14/0x60 > > > dump_stack+0x14c/0x1c4 > > > lockdep_rcu_suspicious+0x134/0x14c > > > __lock_acquire+0x1c30/0x2600 > > > lock_acquire+0x274/0xc48 > > > _raw_spin_lock+0xc8/0x140 > > > vprintk_emit+0x90/0x3d0 > > > vprintk_default+0x34/0x40 > > > vprintk_func+0x378/0x590 > > > printk+0xa8/0xd4 > > > __cpuinfo_store_cpu+0x71c/0x868 > > > cpuinfo_store_cpu+0x2c/0xc8 > > > secondary_start_kernel+0x244/0x318 > > > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > > beginning of the secondary_start_kernel() function. > > > > Hmm, it's not really a move though -- we'll end up calling this thing twice > > afaict. It would be better to make sure we've called notify_cpu_starting() > > early enough. Can we do that instead? > > Paul mentioned that it is fine to call rcu_cpu_starting() multiple times, and > Peter mentioned that CPU bringup is complicated. Thus, I thought about doing > something safe here. > > I tested a bit of patch below which seems fine, but I can't tell for sure if it > is safe. Any suggestion? No, you're right -- this does look dodgy as I think we'll end up kicking the CPU notifiers before things like CPU errata have been figured out. So I'll pick up your original patch with Paul's ack. Thanks! Will ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-29 9:10 ` Will Deacon 2020-10-29 13:17 ` Qian Cai @ 2020-10-29 14:09 ` Paul E. McKenney 1 sibling, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2020-10-29 14:09 UTC (permalink / raw) To: Will Deacon Cc: Qian Cai, Peter Zijlstra, Catalin Marinas, linux-arm-kernel, linux-kernel On Thu, Oct 29, 2020 at 09:10:45AM +0000, Will Deacon wrote: > On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > enough in the CPU-hotplug onlining process, which results in lockdep > > splats as follows: > > > > WARNING: suspicious RCU usage > > ----------------------------- > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > other info that might help us debug this: > > > > RCU used illegally from offline CPU! > > rcu_scheduler_active = 1, debug_locks = 1 > > no locks held by swapper/1/0. > > > > Call trace: > > dump_backtrace+0x0/0x3c8 > > show_stack+0x14/0x60 > > dump_stack+0x14c/0x1c4 > > lockdep_rcu_suspicious+0x134/0x14c > > __lock_acquire+0x1c30/0x2600 > > lock_acquire+0x274/0xc48 > > _raw_spin_lock+0xc8/0x140 > > vprintk_emit+0x90/0x3d0 > > vprintk_default+0x34/0x40 > > vprintk_func+0x378/0x590 > > printk+0xa8/0xd4 > > __cpuinfo_store_cpu+0x71c/0x868 > > cpuinfo_store_cpu+0x2c/0xc8 > > secondary_start_kernel+0x244/0x318 > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > beginning of the secondary_start_kernel() function. > > Hmm, it's not really a move though -- we'll end up calling this thing twice > afaict. It would be better to make sure we've called notify_cpu_starting() > early enough. Can we do that instead? It uses a per-CPU variable so that RCU pays attention only to the first call to rcu_cpu_starting() if there is more than one of them. This is even intentional, due to there being a generic arch-independent call to rcu_cpu_starting() in notify_cpu_starting(). So multiple calls to rcu_cpu_starting() are fine by design. Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-28 18:26 [PATCH] arm64/smp: Move rcu_cpu_starting() earlier Qian Cai 2020-10-28 21:00 ` Paul E. McKenney 2020-10-29 9:10 ` Will Deacon @ 2020-10-30 16:33 ` Will Deacon 2020-11-05 22:22 ` Will Deacon 2 siblings, 1 reply; 14+ messages in thread From: Will Deacon @ 2020-10-30 16:33 UTC (permalink / raw) To: Paul E. McKenney, Qian Cai Cc: catalin.marinas, kernel-team, Will Deacon, Peter Zijlstra, linux-kernel, linux-arm-kernel On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > enough in the CPU-hotplug onlining process, which results in lockdep > splats as follows: > > WARNING: suspicious RCU usage > ----------------------------- > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > [...] Applied to arm64 (for-next/fixes), thanks! [1/1] arm64/smp: Move rcu_cpu_starting() earlier https://git.kernel.org/arm64/c/ce3d31ad3cac Cheers, -- Will https://fixes.arm64.dev https://next.arm64.dev https://will.arm64.dev ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-10-30 16:33 ` Will Deacon @ 2020-11-05 22:22 ` Will Deacon 2020-11-05 23:02 ` Qian Cai 0 siblings, 1 reply; 14+ messages in thread From: Will Deacon @ 2020-11-05 22:22 UTC (permalink / raw) To: Paul E. McKenney, Qian Cai Cc: catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > enough in the CPU-hotplug onlining process, which results in lockdep > > splats as follows: > > > > WARNING: suspicious RCU usage > > ----------------------------- > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > [...] > > Applied to arm64 (for-next/fixes), thanks! > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > https://git.kernel.org/arm64/c/ce3d31ad3cac Hmm, this patch has caused a regression in the case that we fail to online a CPU because it has incompatible CPU features and so we park it in cpu_die_early(). We now get an endless spew of RCU stalls because the core will never come online, but is being tracked by RCU. So I'm tempted to revert this and live with the lockdep warning while we figure out a proper fix. What's the correct say to undo rcu_cpu_starting(), given that we cannot invoke the full hotplug machinery here? Is it correct to call rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the CPU doing cpu_up(), or should we do something else? Will ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-11-05 22:22 ` Will Deacon @ 2020-11-05 23:02 ` Qian Cai 2020-11-05 23:28 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Qian Cai @ 2020-11-05 23:02 UTC (permalink / raw) To: Will Deacon, Paul E. McKenney Cc: catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > splats as follows: > > > > > > WARNING: suspicious RCU usage > > > ----------------------------- > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > > > [...] > > > > Applied to arm64 (for-next/fixes), thanks! > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > Hmm, this patch has caused a regression in the case that we fail to > online a CPU because it has incompatible CPU features and so we park it > in cpu_die_early(). We now get an endless spew of RCU stalls because the > core will never come online, but is being tracked by RCU. So I'm tempted > to revert this and live with the lockdep warning while we figure out a > proper fix. > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > invoke the full hotplug machinery here? Is it correct to call > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > CPU doing cpu_up(), or should we do something else? It looks to me that rcu_report_dead() does the opposite of rcu_cpu_starting(), so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to rewind, Paul? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-11-05 23:02 ` Qian Cai @ 2020-11-05 23:28 ` Paul E. McKenney 2020-11-06 2:15 ` Qian Cai 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2020-11-05 23:28 UTC (permalink / raw) To: Qian Cai Cc: Will Deacon, catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > > splats as follows: > > > > > > > > WARNING: suspicious RCU usage > > > > ----------------------------- > > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > > > > > [...] > > > > > > Applied to arm64 (for-next/fixes), thanks! > > > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > > > Hmm, this patch has caused a regression in the case that we fail to > > online a CPU because it has incompatible CPU features and so we park it > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > core will never come online, but is being tracked by RCU. So I'm tempted > > to revert this and live with the lockdep warning while we figure out a > > proper fix. > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > invoke the full hotplug machinery here? Is it correct to call > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > CPU doing cpu_up(), or should we do something else? > It looks to me that rcu_report_dead() does the opposite of rcu_cpu_starting(), > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to rewind, > Paul? Yes, rcu_report_dead() should do the trick. Presumably the earlier online-time CPU-hotplug notifiers are also unwound? Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-11-05 23:28 ` Paul E. McKenney @ 2020-11-06 2:15 ` Qian Cai 2020-11-06 4:07 ` Paul E. McKenney 2020-11-06 10:37 ` Will Deacon 0 siblings, 2 replies; 14+ messages in thread From: Qian Cai @ 2020-11-06 2:15 UTC (permalink / raw) To: paulmck Cc: Will Deacon, catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote: > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not > > > > > early > > > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > > > splats as follows: > > > > > > > > > > WARNING: suspicious RCU usage > > > > > ----------------------------- > > > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader > > > > > section!! > > > > > > > > > > [...] > > > > > > > > Applied to arm64 (for-next/fixes), thanks! > > > > > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > > > > > Hmm, this patch has caused a regression in the case that we fail to > > > online a CPU because it has incompatible CPU features and so we park it > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > > core will never come online, but is being tracked by RCU. So I'm tempted > > > to revert this and live with the lockdep warning while we figure out a > > > proper fix. > > > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > > invoke the full hotplug machinery here? Is it correct to call > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > > CPU doing cpu_up(), or should we do something else? > > It looks to me that rcu_report_dead() does the opposite of > > rcu_cpu_starting(), > > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to > > rewind, > > Paul? > > Yes, rcu_report_dead() should do the trick. Presumably the earlier > online-time CPU-hotplug notifiers are also unwound? I don't think that is an issue here. cpu_die_early() set CPU_STUCK_IN_KERNEL, and then __cpu_up() will see a timeout waiting for the AP online and then deal with CPU_STUCK_IN_KERNEL according. Thus, something like this? I don't see anything in rcu_report_dead() depends on CONFIG_HOTPLUG_CPU=y. diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 09c96f57818c..10729d2d6084 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -421,6 +421,8 @@ void cpu_die_early(void) update_cpu_boot_status(CPU_STUCK_IN_KERNEL); + rcu_report_dead(cpu); + cpu_park_loop(); } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 2a52f42f64b6..bd04b09b84b3 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu) smp_mb(); /* Ensure RCU read-side usage follows above initialization. */ } -#ifdef CONFIG_HOTPLUG_CPU /* * The outgoing function has no further need of RCU, so remove it from * the rcu_node tree's ->qsmaskinitnext bit masks. @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu) rdp->cpu_started = false; } +#ifdef CONFIG_HOTPLUG_CPU /* * The outgoing CPU has just passed through the dying-idle state, and we * are being invoked from the CPU that was IPIed to continue the offline ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-11-06 2:15 ` Qian Cai @ 2020-11-06 4:07 ` Paul E. McKenney 2020-11-06 10:37 ` Will Deacon 1 sibling, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2020-11-06 4:07 UTC (permalink / raw) To: Qian Cai Cc: Will Deacon, catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Thu, Nov 05, 2020 at 09:15:24PM -0500, Qian Cai wrote: > On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote: > > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not > > > > > > early > > > > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > > > > splats as follows: > > > > > > > > > > > > WARNING: suspicious RCU usage > > > > > > ----------------------------- > > > > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader > > > > > > section!! > > > > > > > > > > > > [...] > > > > > > > > > > Applied to arm64 (for-next/fixes), thanks! > > > > > > > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > > > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > > > > > > > Hmm, this patch has caused a regression in the case that we fail to > > > > online a CPU because it has incompatible CPU features and so we park it > > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > > > core will never come online, but is being tracked by RCU. So I'm tempted > > > > to revert this and live with the lockdep warning while we figure out a > > > > proper fix. > > > > > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > > > invoke the full hotplug machinery here? Is it correct to call > > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > > > CPU doing cpu_up(), or should we do something else? > > > It looks to me that rcu_report_dead() does the opposite of > > > rcu_cpu_starting(), > > > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to > > > rewind, > > > Paul? > > > > Yes, rcu_report_dead() should do the trick. Presumably the earlier > > online-time CPU-hotplug notifiers are also unwound? > I don't think that is an issue here. cpu_die_early() set CPU_STUCK_IN_KERNEL, > and then __cpu_up() will see a timeout waiting for the AP online and then deal > with CPU_STUCK_IN_KERNEL according. Thus, something like this? I don't see > anything in rcu_report_dead() depends on CONFIG_HOTPLUG_CPU=y. If this works for the ARM folks, it seems like a reasonable approach to me. I cannot reasonably test this because not only do I not have an ARM system, I don't have a system on which a kernel can be built with CONFIG_HOTPLUG_CPU=n, so I must rely on others' testing. Thanx, Paul > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 09c96f57818c..10729d2d6084 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -421,6 +421,8 @@ void cpu_die_early(void) > > update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > + rcu_report_dead(cpu); > + > cpu_park_loop(); > } > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 2a52f42f64b6..bd04b09b84b3 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu) > smp_mb(); /* Ensure RCU read-side usage follows above initialization. */ > } > > -#ifdef CONFIG_HOTPLUG_CPU > /* > * The outgoing function has no further need of RCU, so remove it from > * the rcu_node tree's ->qsmaskinitnext bit masks. > @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu) > rdp->cpu_started = false; > } > > +#ifdef CONFIG_HOTPLUG_CPU > /* > * The outgoing CPU has just passed through the dying-idle state, and we > * are being invoked from the CPU that was IPIed to continue the offline > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-11-06 2:15 ` Qian Cai 2020-11-06 4:07 ` Paul E. McKenney @ 2020-11-06 10:37 ` Will Deacon 2020-11-06 12:48 ` Qian Cai 1 sibling, 1 reply; 14+ messages in thread From: Will Deacon @ 2020-11-06 10:37 UTC (permalink / raw) To: Qian Cai Cc: paulmck, catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Thu, Nov 05, 2020 at 09:15:24PM -0500, Qian Cai wrote: > On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote: > > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > > > Hmm, this patch has caused a regression in the case that we fail to > > > > online a CPU because it has incompatible CPU features and so we park it > > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > > > core will never come online, but is being tracked by RCU. So I'm tempted > > > > to revert this and live with the lockdep warning while we figure out a > > > > proper fix. > > > > > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > > > invoke the full hotplug machinery here? Is it correct to call > > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > > > CPU doing cpu_up(), or should we do something else? > > > It looks to me that rcu_report_dead() does the opposite of > > > rcu_cpu_starting(), > > > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to > > > rewind, > > > Paul? > > > > Yes, rcu_report_dead() should do the trick. Presumably the earlier > > online-time CPU-hotplug notifiers are also unwound? > I don't think that is an issue here. cpu_die_early() set CPU_STUCK_IN_KERNEL, > and then __cpu_up() will see a timeout waiting for the AP online and then deal > with CPU_STUCK_IN_KERNEL according. Thus, something like this? I don't see > anything in rcu_report_dead() depends on CONFIG_HOTPLUG_CPU=y. Cheers both for suggesting rcu_report_dead(). > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 09c96f57818c..10729d2d6084 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -421,6 +421,8 @@ void cpu_die_early(void) > > update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > + rcu_report_dead(cpu); I think this is in the wrong place, see: https://lore.kernel.org/r/20201106103602.9849-1-will@kernel.org which seems to fix the problem for me. Will ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier 2020-11-06 10:37 ` Will Deacon @ 2020-11-06 12:48 ` Qian Cai 0 siblings, 0 replies; 14+ messages in thread From: Qian Cai @ 2020-11-06 12:48 UTC (permalink / raw) To: Will Deacon Cc: paulmck, catalin.marinas, kernel-team, Peter Zijlstra, linux-kernel, linux-arm-kernel On Fri, 2020-11-06 at 10:37 +0000, Will Deacon wrote: > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > index 09c96f57818c..10729d2d6084 100644 > > --- a/arch/arm64/kernel/smp.c > > +++ b/arch/arm64/kernel/smp.c > > @@ -421,6 +421,8 @@ void cpu_die_early(void) > > > > update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > > > + rcu_report_dead(cpu); > > I think this is in the wrong place, see: > > https://lore.kernel.org/r/20201106103602.9849-1-will@kernel.org > > which seems to fix the problem for me. Ah, I had not realized that cpu_psci_cpu_die() could no return. Your patchset looks good to me. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-11-06 12:49 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-28 18:26 [PATCH] arm64/smp: Move rcu_cpu_starting() earlier Qian Cai 2020-10-28 21:00 ` Paul E. McKenney 2020-10-29 9:10 ` Will Deacon 2020-10-29 13:17 ` Qian Cai 2020-10-30 8:15 ` Will Deacon 2020-10-29 14:09 ` Paul E. McKenney 2020-10-30 16:33 ` Will Deacon 2020-11-05 22:22 ` Will Deacon 2020-11-05 23:02 ` Qian Cai 2020-11-05 23:28 ` Paul E. McKenney 2020-11-06 2:15 ` Qian Cai 2020-11-06 4:07 ` Paul E. McKenney 2020-11-06 10:37 ` Will Deacon 2020-11-06 12:48 ` Qian Cai
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).