linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cpuidle: psci: Fix suspicious RCU usage
@ 2020-09-04  6:47 Ulf Hansson
  2020-09-04 14:13 ` Paul E. McKenney
  2020-09-21 13:59 ` Rafael J. Wysocki
  0 siblings, 2 replies; 5+ messages in thread
From: Ulf Hansson @ 2020-09-04  6:47 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J . Wysocki, Paul E . McKenney, linux-pm
  Cc: Thomas Gleixner, Steven Rostedt, Sudeep Holla, Lorenzo Pieralisi,
	Daniel Lezcano, Lina Iyer, Vincent Guittot, Stephen Boyd,
	Bjorn Andersson, Ulf Hansson, linux-arm-kernel, Naresh Kamboju

The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
expose us for tracepoints. This lead to the following RCU splat on an ARM64
Qcom board.

[    5.529634] WARNING: suspicious RCU usage
[    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
[    5.541092] 5.9.0-rc3 #86 Not tainted
[    5.541098] -----------------------------
[    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
[    5.541110]
[    5.541110] other info that might help us debug this:
[    5.541110]
[    5.541116]
[    5.541116] rcu_scheduler_active = 2, debug_locks = 1
[    5.541122] RCU used illegally from extended quiescent state!
[    5.541129] no locks held by swapper/0/0.
[    5.541134]
[    5.541134] stack backtrace:
[    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
[    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[    5.541157] Call trace:
[    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
[    5.574186]  dump_backtrace+0x0/0x1c8
[    5.574206]  show_stack+0x14/0x20
[    5.574229]  dump_stack+0xe8/0x154
[    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
[    5.574269]  lock_acquire+0x3f0/0x460
[    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
[    5.574314]  __pm_runtime_suspend+0x4c/0x188
[    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
[    5.574362]  cpuidle_enter_state+0xc0/0x610
[    5.646487]  cpuidle_enter+0x38/0x50
[    5.650651]  call_cpuidle+0x18/0x40
[    5.654467]  do_idle+0x228/0x278
[    5.657678]  cpu_startup_entry+0x24/0x70
[    5.661153]  rest_init+0x1a4/0x278
[    5.665061]  arch_call_rest_init+0xc/0x14
[    5.668272]  start_kernel+0x508/0x540

Following the path in pm_runtime_put_sync_suspend() from
psci_enter_domain_idle_state(), it seems like we end up using the RCU.
Therefore, let's simply silence the splat by informing the RCU about it
with RCU_NONIDLE.

Note that, this is a temporary solution. Instead we should strive to avoid
using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
further down, closer to the arch specific code. However, as the CPU PM
notifiers are also using the RCU, additional rework is needed.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/cpuidle/cpuidle-psci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 74463841805f..d928b37718bd 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 		return -1;
 
 	/* Do runtime PM to manage a hierarchical CPU toplogy. */
-	pm_runtime_put_sync_suspend(pd_dev);
+	RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
 
 	state = psci_get_domain_state();
 	if (!state)
@@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 
 	ret = psci_cpu_suspend_enter(state) ? -1 : idx;
 
-	pm_runtime_get_sync(pd_dev);
+	RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
 
 	cpu_pm_exit();
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] cpuidle: psci: Fix suspicious RCU usage
  2020-09-04  6:47 [PATCH] cpuidle: psci: Fix suspicious RCU usage Ulf Hansson
@ 2020-09-04 14:13 ` Paul E. McKenney
  2020-09-15 21:17   ` Ulf Hansson
  2020-09-21 13:59 ` Rafael J. Wysocki
  1 sibling, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2020-09-04 14:13 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Peter Zijlstra, Rafael J . Wysocki, linux-pm, Thomas Gleixner,
	Steven Rostedt, Sudeep Holla, Lorenzo Pieralisi, Daniel Lezcano,
	Lina Iyer, Vincent Guittot, Stephen Boyd, Bjorn Andersson,
	linux-arm-kernel, Naresh Kamboju

On Fri, Sep 04, 2020 at 08:47:05AM +0200, Ulf Hansson wrote:
> The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
> expose us for tracepoints. This lead to the following RCU splat on an ARM64
> Qcom board.
> 
> [    5.529634] WARNING: suspicious RCU usage
> [    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
> [    5.541092] 5.9.0-rc3 #86 Not tainted
> [    5.541098] -----------------------------
> [    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> [    5.541110]
> [    5.541110] other info that might help us debug this:
> [    5.541110]
> [    5.541116]
> [    5.541116] rcu_scheduler_active = 2, debug_locks = 1
> [    5.541122] RCU used illegally from extended quiescent state!
> [    5.541129] no locks held by swapper/0/0.
> [    5.541134]
> [    5.541134] stack backtrace:
> [    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
> [    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [    5.541157] Call trace:
> [    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
> [    5.574186]  dump_backtrace+0x0/0x1c8
> [    5.574206]  show_stack+0x14/0x20
> [    5.574229]  dump_stack+0xe8/0x154
> [    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
> [    5.574269]  lock_acquire+0x3f0/0x460
> [    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
> [    5.574314]  __pm_runtime_suspend+0x4c/0x188
> [    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
> [    5.574362]  cpuidle_enter_state+0xc0/0x610
> [    5.646487]  cpuidle_enter+0x38/0x50
> [    5.650651]  call_cpuidle+0x18/0x40
> [    5.654467]  do_idle+0x228/0x278
> [    5.657678]  cpu_startup_entry+0x24/0x70
> [    5.661153]  rest_init+0x1a4/0x278
> [    5.665061]  arch_call_rest_init+0xc/0x14
> [    5.668272]  start_kernel+0x508/0x540
> 
> Following the path in pm_runtime_put_sync_suspend() from
> psci_enter_domain_idle_state(), it seems like we end up using the RCU.
> Therefore, let's simply silence the splat by informing the RCU about it
> with RCU_NONIDLE.
> 
> Note that, this is a temporary solution. Instead we should strive to avoid
> using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
> further down, closer to the arch specific code. However, as the CPU PM
> notifiers are also using the RCU, additional rework is needed.
> 
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 74463841805f..d928b37718bd 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>  		return -1;
>  
>  	/* Do runtime PM to manage a hierarchical CPU toplogy. */
> -	pm_runtime_put_sync_suspend(pd_dev);
> +	RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
>  
>  	state = psci_get_domain_state();
>  	if (!state)
> @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>  
>  	ret = psci_cpu_suspend_enter(state) ? -1 : idx;
>  
> -	pm_runtime_get_sync(pd_dev);
> +	RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
>  
>  	cpu_pm_exit();
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cpuidle: psci: Fix suspicious RCU usage
  2020-09-04 14:13 ` Paul E. McKenney
@ 2020-09-15 21:17   ` Ulf Hansson
  2020-09-16  8:36     ` peterz
  0 siblings, 1 reply; 5+ messages in thread
From: Ulf Hansson @ 2020-09-15 21:17 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J . Wysocki
  Cc: Paul E. McKenney, Linux PM, Thomas Gleixner, Steven Rostedt,
	Sudeep Holla, Lorenzo Pieralisi, Daniel Lezcano, Lina Iyer,
	Vincent Guittot, Stephen Boyd, Bjorn Andersson, Linux ARM,
	Naresh Kamboju

On Fri, 4 Sep 2020 at 16:13, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Sep 04, 2020 at 08:47:05AM +0200, Ulf Hansson wrote:
> > The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
> > expose us for tracepoints. This lead to the following RCU splat on an ARM64
> > Qcom board.
> >
> > [    5.529634] WARNING: suspicious RCU usage
> > [    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
> > [    5.541092] 5.9.0-rc3 #86 Not tainted
> > [    5.541098] -----------------------------
> > [    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> > [    5.541110]
> > [    5.541110] other info that might help us debug this:
> > [    5.541110]
> > [    5.541116]
> > [    5.541116] rcu_scheduler_active = 2, debug_locks = 1
> > [    5.541122] RCU used illegally from extended quiescent state!
> > [    5.541129] no locks held by swapper/0/0.
> > [    5.541134]
> > [    5.541134] stack backtrace:
> > [    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
> > [    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > [    5.541157] Call trace:
> > [    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
> > [    5.574186]  dump_backtrace+0x0/0x1c8
> > [    5.574206]  show_stack+0x14/0x20
> > [    5.574229]  dump_stack+0xe8/0x154
> > [    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
> > [    5.574269]  lock_acquire+0x3f0/0x460
> > [    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
> > [    5.574314]  __pm_runtime_suspend+0x4c/0x188
> > [    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
> > [    5.574362]  cpuidle_enter_state+0xc0/0x610
> > [    5.646487]  cpuidle_enter+0x38/0x50
> > [    5.650651]  call_cpuidle+0x18/0x40
> > [    5.654467]  do_idle+0x228/0x278
> > [    5.657678]  cpu_startup_entry+0x24/0x70
> > [    5.661153]  rest_init+0x1a4/0x278
> > [    5.665061]  arch_call_rest_init+0xc/0x14
> > [    5.668272]  start_kernel+0x508/0x540
> >
> > Following the path in pm_runtime_put_sync_suspend() from
> > psci_enter_domain_idle_state(), it seems like we end up using the RCU.
> > Therefore, let's simply silence the splat by informing the RCU about it
> > with RCU_NONIDLE.
> >
> > Note that, this is a temporary solution. Instead we should strive to avoid
> > using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
> > further down, closer to the arch specific code. However, as the CPU PM
> > notifiers are also using the RCU, additional rework is needed.
> >
> > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> Acked-by: Paul E. McKenney <paulmck@kernel.org>

Peter, Rafael,

Is $subject patch okay as the short term solution? If so, I can
continue to look at the CPU PM notifiers, etc.

Kind regards
Uffe

>
> > ---
> >  drivers/cpuidle/cpuidle-psci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > index 74463841805f..d928b37718bd 100644
> > --- a/drivers/cpuidle/cpuidle-psci.c
> > +++ b/drivers/cpuidle/cpuidle-psci.c
> > @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> >               return -1;
> >
> >       /* Do runtime PM to manage a hierarchical CPU toplogy. */
> > -     pm_runtime_put_sync_suspend(pd_dev);
> > +     RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
> >
> >       state = psci_get_domain_state();
> >       if (!state)
> > @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> >
> >       ret = psci_cpu_suspend_enter(state) ? -1 : idx;
> >
> > -     pm_runtime_get_sync(pd_dev);
> > +     RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
> >
> >       cpu_pm_exit();
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cpuidle: psci: Fix suspicious RCU usage
  2020-09-15 21:17   ` Ulf Hansson
@ 2020-09-16  8:36     ` peterz
  0 siblings, 0 replies; 5+ messages in thread
From: peterz @ 2020-09-16  8:36 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, Paul E. McKenney, Linux PM, Thomas Gleixner,
	Steven Rostedt, Sudeep Holla, Lorenzo Pieralisi, Daniel Lezcano,
	Lina Iyer, Vincent Guittot, Stephen Boyd, Bjorn Andersson,
	Linux ARM, Naresh Kamboju

On Tue, Sep 15, 2020 at 11:17:20PM +0200, Ulf Hansson wrote:
> Peter, Rafael,
> 
> Is $subject patch okay as the short term solution? If so, I can
> continue to look at the CPU PM notifiers, etc.

Yes, it's a proper band-aid :-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cpuidle: psci: Fix suspicious RCU usage
  2020-09-04  6:47 [PATCH] cpuidle: psci: Fix suspicious RCU usage Ulf Hansson
  2020-09-04 14:13 ` Paul E. McKenney
@ 2020-09-21 13:59 ` Rafael J. Wysocki
  1 sibling, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2020-09-21 13:59 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Peter Zijlstra, Rafael J . Wysocki, Paul E . McKenney, Linux PM,
	Thomas Gleixner, Steven Rostedt, Sudeep Holla, Lorenzo Pieralisi,
	Daniel Lezcano, Lina Iyer, Vincent Guittot, Stephen Boyd,
	Bjorn Andersson, Linux ARM, Naresh Kamboju

On Fri, Sep 4, 2020 at 8:47 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
> expose us for tracepoints. This lead to the following RCU splat on an ARM64
> Qcom board.
>
> [    5.529634] WARNING: suspicious RCU usage
> [    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
> [    5.541092] 5.9.0-rc3 #86 Not tainted
> [    5.541098] -----------------------------
> [    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> [    5.541110]
> [    5.541110] other info that might help us debug this:
> [    5.541110]
> [    5.541116]
> [    5.541116] rcu_scheduler_active = 2, debug_locks = 1
> [    5.541122] RCU used illegally from extended quiescent state!
> [    5.541129] no locks held by swapper/0/0.
> [    5.541134]
> [    5.541134] stack backtrace:
> [    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
> [    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [    5.541157] Call trace:
> [    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
> [    5.574186]  dump_backtrace+0x0/0x1c8
> [    5.574206]  show_stack+0x14/0x20
> [    5.574229]  dump_stack+0xe8/0x154
> [    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
> [    5.574269]  lock_acquire+0x3f0/0x460
> [    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
> [    5.574314]  __pm_runtime_suspend+0x4c/0x188
> [    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
> [    5.574362]  cpuidle_enter_state+0xc0/0x610
> [    5.646487]  cpuidle_enter+0x38/0x50
> [    5.650651]  call_cpuidle+0x18/0x40
> [    5.654467]  do_idle+0x228/0x278
> [    5.657678]  cpu_startup_entry+0x24/0x70
> [    5.661153]  rest_init+0x1a4/0x278
> [    5.665061]  arch_call_rest_init+0xc/0x14
> [    5.668272]  start_kernel+0x508/0x540
>
> Following the path in pm_runtime_put_sync_suspend() from
> psci_enter_domain_idle_state(), it seems like we end up using the RCU.
> Therefore, let's simply silence the splat by informing the RCU about it
> with RCU_NONIDLE.
>
> Note that, this is a temporary solution. Instead we should strive to avoid
> using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
> further down, closer to the arch specific code. However, as the CPU PM
> notifiers are also using the RCU, additional rework is needed.
>
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

I missed this one before, sorry about that.

Now applied as 5.9-rc6 material with the Paul's ACK, thanks!

> ---
>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 74463841805f..d928b37718bd 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>                 return -1;
>
>         /* Do runtime PM to manage a hierarchical CPU toplogy. */
> -       pm_runtime_put_sync_suspend(pd_dev);
> +       RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
>
>         state = psci_get_domain_state();
>         if (!state)
> @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>
>         ret = psci_cpu_suspend_enter(state) ? -1 : idx;
>
> -       pm_runtime_get_sync(pd_dev);
> +       RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
>
>         cpu_pm_exit();
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-09-21 13:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-04  6:47 [PATCH] cpuidle: psci: Fix suspicious RCU usage Ulf Hansson
2020-09-04 14:13 ` Paul E. McKenney
2020-09-15 21:17   ` Ulf Hansson
2020-09-16  8:36     ` peterz
2020-09-21 13:59 ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).