linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter()
@ 2020-12-22  1:37 Frederic Weisbecker
  2020-12-22  1:37 ` [PATCH 1/4] sched/idle: Fix missing need_resched() check " Frederic Weisbecker
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2020-12-22  1:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Peter Zijlstra,
	Ingo Molnar, Fabio Estevam, stable, Thomas Gleixner,
	Paul E . McKenney, Len Brown, Pengutronix Kernel Team,
	NXP Linux Team, Daniel Lezcano, Shawn Guo, Sascha Hauer

With Paul, we've been thinking that the idle loop wasn't twisted enough
yet to deserve 2020.

rcutorture, after some recent parameter changes, has been complaining
about a hung task.

It appears that rcu_idle_enter() may wake up a NOCB kthread but this
happens after the last generic need_resched() check. Some cpuidle drivers
fix it by chance but many others don't.

Here is a proposed bunch of fixes. I will need to also fix the
rcu_user_enter() case, likely using irq_work, since nohz_full requires
irq work to support self IPI.

Also more generally, this raise the question of local task wake_up()
under disabled interrupts. When a wake up occurs in a preempt disabled
section, it gets handled by the outer preempt_enable() call. There is no
similar mechanism when a wake up occurs with interrupts disabled. I guess
it is assumed to be handled, at worst, in the next tick. But a local irq
work would provide instant preemption once IRQs are re-enabled. Of course
this would only make sense in CONFIG_PREEMPTION, and when the tick is
disabled...

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	sched/idle

HEAD: f2fa6e4a070c1535b9edc9ee097167fd2b15d235

Thanks,
	Frederic
---

Frederic Weisbecker (4):
      sched/idle: Fix missing need_resched() check after rcu_idle_enter()
      cpuidle: Fix missing need_resched() check after rcu_idle_enter()
      ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
      ACPI: processor: Fix missing need_resched() check after rcu_idle_enter()


 arch/arm/mach-imx/cpuidle-imx6q.c |  7 ++++++-
 drivers/acpi/processor_idle.c     | 10 ++++++++--
 drivers/cpuidle/cpuidle.c         | 33 +++++++++++++++++++++++++--------
 kernel/sched/idle.c               | 18 ++++++++++++------
 4 files changed, 51 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] sched/idle: Fix missing need_resched() check after rcu_idle_enter()
  2020-12-22  1:37 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() Frederic Weisbecker
@ 2020-12-22  1:37 ` Frederic Weisbecker
  2020-12-22  4:20   ` Paul E. McKenney
  2020-12-22  1:37 ` [PATCH 2/4] cpuidle: " Frederic Weisbecker
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2020-12-22  1:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Peter Zijlstra,
	Ingo Molnar, Fabio Estevam, stable, Thomas Gleixner,
	Paul E . McKenney, Len Brown, Pengutronix Kernel Team,
	NXP Linux Team, Daniel Lezcano, Shawn Guo, Sascha Hauer

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.

Unfortunately in default_idle_call(), the call to rcu_idle_enter() is
already beyond the last need_resched() check and we may halt the CPU
with a resched request unhandled, leaving the task hanging.

Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar<mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/sched/idle.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 305727ea0677..1af60dc50beb 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -109,15 +109,21 @@ void __cpuidle default_idle_call(void)
 		rcu_idle_enter();
 		lockdep_hardirqs_on(_THIS_IP_);
 
-		arch_cpu_idle();
+		/*
+		 * Last need_resched() check must come after rcu_idle_enter()
+		 * which may wake up RCU internal tasks.
+		 */
+		if (!need_resched()) {
+			arch_cpu_idle();
+			raw_local_irq_disable();
+		}
 
 		/*
-		 * OK, so IRQs are enabled here, but RCU needs them disabled to
-		 * turn itself back on.. funny thing is that disabling IRQs
-		 * will cause tracing, which needs RCU. Jump through hoops to
-		 * make it 'work'.
+		 * OK, so IRQs are enabled after arch_cpu_idle(), but RCU needs
+		 * them disabled to turn itself back on.. funny thing is that
+		 * disabling IRQs will cause tracing, which needs RCU. Jump through
+		 * hoops to make it 'work'.
 		 */
-		raw_local_irq_disable();
 		lockdep_hardirqs_off(_THIS_IP_);
 		rcu_idle_exit();
 		lockdep_hardirqs_on(_THIS_IP_);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] cpuidle: Fix missing need_resched() check after rcu_idle_enter()
  2020-12-22  1:37 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() Frederic Weisbecker
  2020-12-22  1:37 ` [PATCH 1/4] sched/idle: Fix missing need_resched() check " Frederic Weisbecker
@ 2020-12-22  1:37 ` Frederic Weisbecker
  2021-01-05 17:25   ` Dmitry Osipenko
  2020-12-22  1:37 ` [PATCH 3/4] ARM: imx6q: " Frederic Weisbecker
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2020-12-22  1:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Peter Zijlstra,
	Ingo Molnar, Fabio Estevam, stable, Thomas Gleixner,
	Paul E . McKenney, Len Brown, Pengutronix Kernel Team,
	NXP Linux Team, Daniel Lezcano, Shawn Guo, Sascha Hauer

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.

Unfortunately within cpuidle the call to rcu_idle_enter() is already
beyond the last generic need_resched() check. Some drivers may perform
their own checks like with mwait_idle_with_hints() but many others don't
and we may halt the CPU with a resched request unhandled, leaving the
task hanging.

Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 1098582a0f6c (sched,idle,rcu: Push rcu_idle deeper into the idle path)
Cc: stable@vger.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/cpuidle/cpuidle.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..4cc1ba49ce05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -134,8 +134,8 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 }
 
 #ifdef CONFIG_SUSPEND
-static void enter_s2idle_proper(struct cpuidle_driver *drv,
-				struct cpuidle_device *dev, int index)
+static int enter_s2idle_proper(struct cpuidle_driver *drv,
+			       struct cpuidle_device *dev, int index)
 {
 	ktime_t time_start, time_end;
 	struct cpuidle_state *target_state = &drv->states[index];
@@ -151,7 +151,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_enter();
-	target_state->enter_s2idle(dev, drv, index);
+	/*
+	 * Last need_resched() check must come after rcu_idle_enter()
+	 * which may wake up RCU internal tasks.
+	 */
+	if (!need_resched())
+		target_state->enter_s2idle(dev, drv, index);
+	else
+		index = -EBUSY;
 	if (WARN_ON_ONCE(!irqs_disabled()))
 		local_irq_disable();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
@@ -159,10 +166,13 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
 	tick_unfreeze();
 	start_critical_timings();
 
-	time_end = ns_to_ktime(local_clock());
+	if (index > 0) {
+		time_end = ns_to_ktime(local_clock());
+		dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
+		dev->states_usage[index].s2idle_usage++;
+	}
 
-	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
-	dev->states_usage[index].s2idle_usage++;
+	return index;
 }
 
 /**
@@ -184,7 +194,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 */
 	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
 	if (index > 0) {
-		enter_s2idle_proper(drv, dev, index);
+		index = enter_s2idle_proper(drv, dev, index);
 		local_irq_enable();
 	}
 	return index;
@@ -234,7 +244,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_enter();
-	entered_state = target_state->enter(dev, drv, index);
+	/*
+	 * Last need_resched() check must come after rcu_idle_enter()
+	 * which may wake up RCU internal tasks.
+	 */
+	if (!need_resched())
+		entered_state = target_state->enter(dev, drv, index);
+	else
+		entered_state = -EBUSY;
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_exit();
 	start_critical_timings();
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
  2020-12-22  1:37 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() Frederic Weisbecker
  2020-12-22  1:37 ` [PATCH 1/4] sched/idle: Fix missing need_resched() check " Frederic Weisbecker
  2020-12-22  1:37 ` [PATCH 2/4] cpuidle: " Frederic Weisbecker
@ 2020-12-22  1:37 ` Frederic Weisbecker
  2020-12-22  1:37 ` [PATCH 4/4] ACPI: processor: " Frederic Weisbecker
  2020-12-22 16:19 ` [PATCH 0/4] sched/idle: Fix missing need_resched() checks " Rafael J. Wysocki
  4 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2020-12-22  1:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Peter Zijlstra,
	Ingo Molnar, Fabio Estevam, stable, Thomas Gleixner,
	Paul E . McKenney, Len Brown, Pengutronix Kernel Team,
	NXP Linux Team, Daniel Lezcano, Shawn Guo, Sascha Hauer

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.

Unfortunately imx6q_enter_wait() is beyond the last generic
need_resched() check and it performs a wfi right away after the call to
rcu_idle_enter(). We may halt the CPU with a resched request unhandled,
leaving the task hanging.

Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 1a67b9263e06 (ARM: imx6q: Fixup RCU usage for cpuidle)
Cc: stable@vger.kernel.org
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm/mach-imx/cpuidle-imx6q.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-imx/cpuidle-imx6q.c b/arch/arm/mach-imx/cpuidle-imx6q.c
index 094337dc1bc7..31a60d257d3d 100644
--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -25,7 +25,12 @@ static int imx6q_enter_wait(struct cpuidle_device *dev,
 	raw_spin_unlock(&cpuidle_lock);
 
 	rcu_idle_enter();
-	cpu_do_idle();
+	/*
+	 * Last need_resched() check must come after rcu_idle_enter()
+	 * which may wake up RCU internal tasks.
+	 */
+	if (!need_resched())
+		cpu_do_idle();
 	rcu_idle_exit();
 
 	raw_spin_lock(&cpuidle_lock);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ACPI: processor: Fix missing need_resched() check after rcu_idle_enter()
  2020-12-22  1:37 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2020-12-22  1:37 ` [PATCH 3/4] ARM: imx6q: " Frederic Weisbecker
@ 2020-12-22  1:37 ` Frederic Weisbecker
  2020-12-22 16:19 ` [PATCH 0/4] sched/idle: Fix missing need_resched() checks " Rafael J. Wysocki
  4 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2020-12-22  1:37 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Peter Zijlstra,
	Ingo Molnar, Fabio Estevam, stable, Thomas Gleixner,
	Paul E . McKenney, Len Brown, Pengutronix Kernel Team,
	NXP Linux Team, Daniel Lezcano, Shawn Guo, Sascha Hauer

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.

Unfortunately within acpi_idle_enter_bm() the call to rcu_idle_enter()
is already beyond the last generic need_resched() check. The cpu idle
implementation happens to be ok because it ends up calling
mwait_idle_with_hints() or acpi_safe_halt() which both perform their own
need_resched() checks. But the suspend to idle implementation doesn't so
it may suspend the CPU with a resched request unhandled, leaving the
task hanging.

Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 1fecfdbb7acc (ACPI: processor: Take over RCU-idle for C3-BM idle)
Cc: stable@vger.kernel.org
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar<mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/acpi/processor_idle.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index d93e400940a3..c4939c49d972 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -604,8 +604,14 @@ static int acpi_idle_enter_bm(struct cpuidle_driver *drv,
 	}
 
 	rcu_idle_enter();
-
-	acpi_idle_do_entry(cx);
+	/*
+	 * Last need_resched() check must come after rcu_idle_enter()
+	 * which may wake up RCU internal tasks. mwait_idle_with_hints()
+	 * and acpi_safe_halt() have their own checks but s2idle
+	 * implementation doesn't.
+	 */
+	if (!need_resched())
+		acpi_idle_do_entry(cx);
 
 	rcu_idle_exit();
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] sched/idle: Fix missing need_resched() check after rcu_idle_enter()
  2020-12-22  1:37 ` [PATCH 1/4] sched/idle: Fix missing need_resched() check " Frederic Weisbecker
@ 2020-12-22  4:20   ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2020-12-22  4:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Rafael J . Wysocki, Peter Zijlstra, Ingo Molnar,
	Fabio Estevam, stable, Thomas Gleixner, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer

On Tue, Dec 22, 2020 at 02:37:09AM +0100, Frederic Weisbecker wrote:
> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> kthread (rcuog) to be serviced.
> 
> Usually a wake up happening while running the idle task is spotted in
> one of the need_resched() checks carefully placed within the idle loop
> that can break to the scheduler.
> 
> Unfortunately in default_idle_call(), the call to rcu_idle_enter() is
> already beyond the last need_resched() check and we may halt the CPU
> with a resched request unhandled, leaving the task hanging.
> 
> Fix this with performing a last minute need_resched() check after
> calling rcu_idle_enter().
> 
> Reported-by: Paul E. McKenney <paulmck@kernel.org>
> Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
> Cc: stable@vger.kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar<mingo@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Tested-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/sched/idle.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 305727ea0677..1af60dc50beb 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -109,15 +109,21 @@ void __cpuidle default_idle_call(void)
>  		rcu_idle_enter();
>  		lockdep_hardirqs_on(_THIS_IP_);
>  
> -		arch_cpu_idle();
> +		/*
> +		 * Last need_resched() check must come after rcu_idle_enter()
> +		 * which may wake up RCU internal tasks.
> +		 */
> +		if (!need_resched()) {
> +			arch_cpu_idle();
> +			raw_local_irq_disable();
> +		}
>  
>  		/*
> -		 * OK, so IRQs are enabled here, but RCU needs them disabled to
> -		 * turn itself back on.. funny thing is that disabling IRQs
> -		 * will cause tracing, which needs RCU. Jump through hoops to
> -		 * make it 'work'.
> +		 * OK, so IRQs are enabled after arch_cpu_idle(), but RCU needs
> +		 * them disabled to turn itself back on.. funny thing is that
> +		 * disabling IRQs will cause tracing, which needs RCU. Jump through
> +		 * hoops to make it 'work'.
>  		 */
> -		raw_local_irq_disable();
>  		lockdep_hardirqs_off(_THIS_IP_);
>  		rcu_idle_exit();
>  		lockdep_hardirqs_on(_THIS_IP_);
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter()
  2020-12-22  1:37 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2020-12-22  1:37 ` [PATCH 4/4] ACPI: processor: " Frederic Weisbecker
@ 2020-12-22 16:19 ` Rafael J. Wysocki
  2021-01-01 16:05   ` Paul E. McKenney
  4 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2020-12-22 16:19 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Fabio Estevam, stable,
	Thomas Gleixner, Paul E . McKenney, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer, Linux PM

On 12/22/2020 2:37 AM, Frederic Weisbecker wrote:
> With Paul, we've been thinking that the idle loop wasn't twisted enough
> yet to deserve 2020.
>
> rcutorture, after some recent parameter changes, has been complaining
> about a hung task.
>
> It appears that rcu_idle_enter() may wake up a NOCB kthread but this
> happens after the last generic need_resched() check. Some cpuidle drivers
> fix it by chance but many others don't.
>
> Here is a proposed bunch of fixes. I will need to also fix the
> rcu_user_enter() case, likely using irq_work, since nohz_full requires
> irq work to support self IPI.
>
> Also more generally, this raise the question of local task wake_up()
> under disabled interrupts. When a wake up occurs in a preempt disabled
> section, it gets handled by the outer preempt_enable() call. There is no
> similar mechanism when a wake up occurs with interrupts disabled. I guess
> it is assumed to be handled, at worst, in the next tick. But a local irq
> work would provide instant preemption once IRQs are re-enabled. Of course
> this would only make sense in CONFIG_PREEMPTION, and when the tick is
> disabled...
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	sched/idle
>
> HEAD: f2fa6e4a070c1535b9edc9ee097167fd2b15d235
>
> Thanks,
> 	Frederic
> ---
>
> Frederic Weisbecker (4):
>        sched/idle: Fix missing need_resched() check after rcu_idle_enter()
>        cpuidle: Fix missing need_resched() check after rcu_idle_enter()
>        ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
>        ACPI: processor: Fix missing need_resched() check after rcu_idle_enter()
>
>
>   arch/arm/mach-imx/cpuidle-imx6q.c |  7 ++++++-
>   drivers/acpi/processor_idle.c     | 10 ++++++++--
>   drivers/cpuidle/cpuidle.c         | 33 +++++++++++++++++++++++++--------
>   kernel/sched/idle.c               | 18 ++++++++++++------
>   4 files changed, 51 insertions(+), 17 deletions(-)

Please feel free to add

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

to all patches in the series.

Thanks!



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter()
  2020-12-22 16:19 ` [PATCH 0/4] sched/idle: Fix missing need_resched() checks " Rafael J. Wysocki
@ 2021-01-01 16:05   ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2021-01-01 16:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Frederic Weisbecker, LKML, Peter Zijlstra, Ingo Molnar,
	Fabio Estevam, stable, Thomas Gleixner, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer, Linux PM

On Tue, Dec 22, 2020 at 05:19:51PM +0100, Rafael J. Wysocki wrote:
> On 12/22/2020 2:37 AM, Frederic Weisbecker wrote:
> > With Paul, we've been thinking that the idle loop wasn't twisted enough
> > yet to deserve 2020.
> > 
> > rcutorture, after some recent parameter changes, has been complaining
> > about a hung task.
> > 
> > It appears that rcu_idle_enter() may wake up a NOCB kthread but this
> > happens after the last generic need_resched() check. Some cpuidle drivers
> > fix it by chance but many others don't.
> > 
> > Here is a proposed bunch of fixes. I will need to also fix the
> > rcu_user_enter() case, likely using irq_work, since nohz_full requires
> > irq work to support self IPI.
> > 
> > Also more generally, this raise the question of local task wake_up()
> > under disabled interrupts. When a wake up occurs in a preempt disabled
> > section, it gets handled by the outer preempt_enable() call. There is no
> > similar mechanism when a wake up occurs with interrupts disabled. I guess
> > it is assumed to be handled, at worst, in the next tick. But a local irq
> > work would provide instant preemption once IRQs are re-enabled. Of course
> > this would only make sense in CONFIG_PREEMPTION, and when the tick is
> > disabled...
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	sched/idle
> > 
> > HEAD: f2fa6e4a070c1535b9edc9ee097167fd2b15d235
> > 
> > Thanks,
> > 	Frederic
> > ---
> > 
> > Frederic Weisbecker (4):
> >        sched/idle: Fix missing need_resched() check after rcu_idle_enter()
> >        cpuidle: Fix missing need_resched() check after rcu_idle_enter()
> >        ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
> >        ACPI: processor: Fix missing need_resched() check after rcu_idle_enter()
> > 
> > 
> >   arch/arm/mach-imx/cpuidle-imx6q.c |  7 ++++++-
> >   drivers/acpi/processor_idle.c     | 10 ++++++++--
> >   drivers/cpuidle/cpuidle.c         | 33 +++++++++++++++++++++++++--------
> >   kernel/sched/idle.c               | 18 ++++++++++++------
> >   4 files changed, 51 insertions(+), 17 deletions(-)
> 
> Please feel free to add
> 
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> to all patches in the series.

I would guess that they will take some other path to mainline, but I have
queued these to cut down on rcutorture's whining.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] cpuidle: Fix missing need_resched() check after rcu_idle_enter()
  2020-12-22  1:37 ` [PATCH 2/4] cpuidle: " Frederic Weisbecker
@ 2021-01-05 17:25   ` Dmitry Osipenko
  2021-01-05 18:10     ` Dmitry Osipenko
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Osipenko @ 2021-01-05 17:25 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Rafael J . Wysocki, Peter Zijlstra, Ingo Molnar, Fabio Estevam,
	stable, Thomas Gleixner, Paul E . McKenney, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer, linux-tegra

22.12.2020 04:37, Frederic Weisbecker пишет:
> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> kthread (rcuog) to be serviced.
> 
> Usually a wake up happening while running the idle task is spotted in
> one of the need_resched() checks carefully placed within the idle loop
> that can break to the scheduler.
> 
> Unfortunately within cpuidle the call to rcu_idle_enter() is already
> beyond the last generic need_resched() check. Some drivers may perform
> their own checks like with mwait_idle_with_hints() but many others don't
> and we may halt the CPU with a resched request unhandled, leaving the
> task hanging.
> 
> Fix this with performing a last minute need_resched() check after
> calling rcu_idle_enter().
> 
> Reported-by: Paul E. McKenney <paulmck@kernel.org>
> Fixes: 1098582a0f6c (sched,idle,rcu: Push rcu_idle deeper into the idle path)
> Cc: stable@vger.kernel.org
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  drivers/cpuidle/cpuidle.c | 33 +++++++++++++++++++++++++--------
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index ef2ea1b12cd8..4cc1ba49ce05 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -134,8 +134,8 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
>  }
>  
>  #ifdef CONFIG_SUSPEND
> -static void enter_s2idle_proper(struct cpuidle_driver *drv,
> -				struct cpuidle_device *dev, int index)
> +static int enter_s2idle_proper(struct cpuidle_driver *drv,
> +			       struct cpuidle_device *dev, int index)
>  {
>  	ktime_t time_start, time_end;
>  	struct cpuidle_state *target_state = &drv->states[index];
> @@ -151,7 +151,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>  		rcu_idle_enter();
> -	target_state->enter_s2idle(dev, drv, index);
> +	/*
> +	 * Last need_resched() check must come after rcu_idle_enter()
> +	 * which may wake up RCU internal tasks.
> +	 */
> +	if (!need_resched())
> +		target_state->enter_s2idle(dev, drv, index);
> +	else
> +		index = -EBUSY;
>  	if (WARN_ON_ONCE(!irqs_disabled()))
>  		local_irq_disable();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> @@ -159,10 +166,13 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
>  	tick_unfreeze();
>  	start_critical_timings();
>  
> -	time_end = ns_to_ktime(local_clock());
> +	if (index > 0) {

index=0 is valid too

> +		time_end = ns_to_ktime(local_clock());
> +		dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
> +		dev->states_usage[index].s2idle_usage++;
> +	}
>  
> -	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
> -	dev->states_usage[index].s2idle_usage++;
> +	return index;
>  }
>  
>  /**
> @@ -184,7 +194,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	 */
>  	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
>  	if (index > 0) {
> -		enter_s2idle_proper(drv, dev, index);
> +		index = enter_s2idle_proper(drv, dev, index);
>  		local_irq_enable();
>  	}
>  	return index;
> @@ -234,7 +244,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>  		rcu_idle_enter();
> -	entered_state = target_state->enter(dev, drv, index);
> +	/*
> +	 * Last need_resched() check must come after rcu_idle_enter()
> +	 * which may wake up RCU internal tasks.
> +	 */
> +	if (!need_resched())
> +		entered_state = target_state->enter(dev, drv, index);
> +	else
> +		entered_state = -EBUSY;
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>  		rcu_idle_exit();
>  	start_critical_timings();
> 

This patch causes a hardlock on NVIDIA Tegra using today's linux-next.
Disabling coupled idling state helps. Please fix thanks in advance.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] cpuidle: Fix missing need_resched() check after rcu_idle_enter()
  2021-01-05 17:25   ` Dmitry Osipenko
@ 2021-01-05 18:10     ` Dmitry Osipenko
  2021-01-05 19:17       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Osipenko @ 2021-01-05 18:10 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Rafael J . Wysocki, Peter Zijlstra, Ingo Molnar, Fabio Estevam,
	stable, Thomas Gleixner, Paul E . McKenney, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer, linux-tegra

05.01.2021 20:25, Dmitry Osipenko пишет:
> 22.12.2020 04:37, Frederic Weisbecker пишет:
>> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
>> kthread (rcuog) to be serviced.
>>
>> Usually a wake up happening while running the idle task is spotted in
>> one of the need_resched() checks carefully placed within the idle loop
>> that can break to the scheduler.
>>
>> Unfortunately within cpuidle the call to rcu_idle_enter() is already
>> beyond the last generic need_resched() check. Some drivers may perform
>> their own checks like with mwait_idle_with_hints() but many others don't
>> and we may halt the CPU with a resched request unhandled, leaving the
>> task hanging.
>>
>> Fix this with performing a last minute need_resched() check after
>> calling rcu_idle_enter().
>>
>> Reported-by: Paul E. McKenney <paulmck@kernel.org>
>> Fixes: 1098582a0f6c (sched,idle,rcu: Push rcu_idle deeper into the idle path)
>> Cc: stable@vger.kernel.org
>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>> ---
>>  drivers/cpuidle/cpuidle.c | 33 +++++++++++++++++++++++++--------
>>  1 file changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index ef2ea1b12cd8..4cc1ba49ce05 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -134,8 +134,8 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
>>  }
>>  
>>  #ifdef CONFIG_SUSPEND
>> -static void enter_s2idle_proper(struct cpuidle_driver *drv,
>> -				struct cpuidle_device *dev, int index)
>> +static int enter_s2idle_proper(struct cpuidle_driver *drv,
>> +			       struct cpuidle_device *dev, int index)
>>  {
>>  	ktime_t time_start, time_end;
>>  	struct cpuidle_state *target_state = &drv->states[index];
>> @@ -151,7 +151,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
>>  	stop_critical_timings();
>>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>>  		rcu_idle_enter();
>> -	target_state->enter_s2idle(dev, drv, index);
>> +	/*
>> +	 * Last need_resched() check must come after rcu_idle_enter()
>> +	 * which may wake up RCU internal tasks.
>> +	 */
>> +	if (!need_resched())
>> +		target_state->enter_s2idle(dev, drv, index);
>> +	else
>> +		index = -EBUSY;
>>  	if (WARN_ON_ONCE(!irqs_disabled()))
>>  		local_irq_disable();
>>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>> @@ -159,10 +166,13 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
>>  	tick_unfreeze();
>>  	start_critical_timings();
>>  
>> -	time_end = ns_to_ktime(local_clock());
>> +	if (index > 0) {
> 
> index=0 is valid too
> 
>> +		time_end = ns_to_ktime(local_clock());
>> +		dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
>> +		dev->states_usage[index].s2idle_usage++;
>> +	}
>>  
>> -	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
>> -	dev->states_usage[index].s2idle_usage++;
>> +	return index;
>>  }
>>  
>>  /**
>> @@ -184,7 +194,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  	 */
>>  	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
>>  	if (index > 0) {
>> -		enter_s2idle_proper(drv, dev, index);
>> +		index = enter_s2idle_proper(drv, dev, index);
>>  		local_irq_enable();
>>  	}
>>  	return index;
>> @@ -234,7 +244,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>  	stop_critical_timings();
>>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>>  		rcu_idle_enter();
>> -	entered_state = target_state->enter(dev, drv, index);
>> +	/*
>> +	 * Last need_resched() check must come after rcu_idle_enter()
>> +	 * which may wake up RCU internal tasks.
>> +	 */
>> +	if (!need_resched())
>> +		entered_state = target_state->enter(dev, drv, index);
>> +	else
>> +		entered_state = -EBUSY;
>>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>>  		rcu_idle_exit();
>>  	start_critical_timings();
>>
> 
> This patch causes a hardlock on NVIDIA Tegra using today's linux-next.
> Disabling coupled idling state helps. Please fix thanks in advance.
> 

This isn't a proper fix, but it works:

diff --git a/drivers/cpuidle/cpuidle-tegra.c
b/drivers/cpuidle/cpuidle-tegra.c
index 191966dc8d02..ecc5d9b31553 100644
--- a/drivers/cpuidle/cpuidle-tegra.c
+++ b/drivers/cpuidle/cpuidle-tegra.c
@@ -148,7 +148,7 @@ static int tegra_cpuidle_c7_enter(void)

 static int tegra_cpuidle_coupled_barrier(struct cpuidle_device *dev)
 {
-	if (tegra_pending_sgi()) {
+	if (tegra_pending_sgi() || need_resched()) {
 		/*
 		 * CPU got local interrupt that will be lost after GIC's
 		 * shutdown because GIC driver doesn't save/restore the
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 4cc1ba49ce05..2bc52ccc339b 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -248,7 +248,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
 	 * Last need_resched() check must come after rcu_idle_enter()
 	 * which may wake up RCU internal tasks.
 	 */
-	if (!need_resched())
+	if ((target_state->flags & CPUIDLE_FLAG_COUPLED) || !need_resched())
 		entered_state = target_state->enter(dev, drv, index);
 	else
 		entered_state = -EBUSY;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] cpuidle: Fix missing need_resched() check after rcu_idle_enter()
  2021-01-05 18:10     ` Dmitry Osipenko
@ 2021-01-05 19:17       ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2021-01-05 19:17 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Frederic Weisbecker, LKML, Rafael J . Wysocki, Peter Zijlstra,
	Ingo Molnar, Fabio Estevam, stable, Thomas Gleixner, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer, linux-tegra

On Tue, Jan 05, 2021 at 09:10:30PM +0300, Dmitry Osipenko wrote:
> 05.01.2021 20:25, Dmitry Osipenko пишет:
> > 22.12.2020 04:37, Frederic Weisbecker пишет:
> >> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> >> kthread (rcuog) to be serviced.
> >>
> >> Usually a wake up happening while running the idle task is spotted in
> >> one of the need_resched() checks carefully placed within the idle loop
> >> that can break to the scheduler.
> >>
> >> Unfortunately within cpuidle the call to rcu_idle_enter() is already
> >> beyond the last generic need_resched() check. Some drivers may perform
> >> their own checks like with mwait_idle_with_hints() but many others don't
> >> and we may halt the CPU with a resched request unhandled, leaving the
> >> task hanging.
> >>
> >> Fix this with performing a last minute need_resched() check after
> >> calling rcu_idle_enter().
> >>
> >> Reported-by: Paul E. McKenney <paulmck@kernel.org>
> >> Fixes: 1098582a0f6c (sched,idle,rcu: Push rcu_idle deeper into the idle path)
> >> Cc: stable@vger.kernel.org
> >> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@kernel.org>
> >> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> >> ---
> >>  drivers/cpuidle/cpuidle.c | 33 +++++++++++++++++++++++++--------
> >>  1 file changed, 25 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> >> index ef2ea1b12cd8..4cc1ba49ce05 100644
> >> --- a/drivers/cpuidle/cpuidle.c
> >> +++ b/drivers/cpuidle/cpuidle.c
> >> @@ -134,8 +134,8 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
> >>  }
> >>  
> >>  #ifdef CONFIG_SUSPEND
> >> -static void enter_s2idle_proper(struct cpuidle_driver *drv,
> >> -				struct cpuidle_device *dev, int index)
> >> +static int enter_s2idle_proper(struct cpuidle_driver *drv,
> >> +			       struct cpuidle_device *dev, int index)
> >>  {
> >>  	ktime_t time_start, time_end;
> >>  	struct cpuidle_state *target_state = &drv->states[index];
> >> @@ -151,7 +151,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
> >>  	stop_critical_timings();
> >>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> >>  		rcu_idle_enter();
> >> -	target_state->enter_s2idle(dev, drv, index);
> >> +	/*
> >> +	 * Last need_resched() check must come after rcu_idle_enter()
> >> +	 * which may wake up RCU internal tasks.
> >> +	 */
> >> +	if (!need_resched())
> >> +		target_state->enter_s2idle(dev, drv, index);
> >> +	else
> >> +		index = -EBUSY;
> >>  	if (WARN_ON_ONCE(!irqs_disabled()))
> >>  		local_irq_disable();
> >>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> >> @@ -159,10 +166,13 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
> >>  	tick_unfreeze();
> >>  	start_critical_timings();
> >>  
> >> -	time_end = ns_to_ktime(local_clock());
> >> +	if (index > 0) {
> > 
> > index=0 is valid too
> > 
> >> +		time_end = ns_to_ktime(local_clock());
> >> +		dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
> >> +		dev->states_usage[index].s2idle_usage++;
> >> +	}
> >>  
> >> -	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
> >> -	dev->states_usage[index].s2idle_usage++;
> >> +	return index;
> >>  }
> >>  
> >>  /**
> >> @@ -184,7 +194,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> >>  	 */
> >>  	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> >>  	if (index > 0) {
> >> -		enter_s2idle_proper(drv, dev, index);
> >> +		index = enter_s2idle_proper(drv, dev, index);
> >>  		local_irq_enable();
> >>  	}
> >>  	return index;
> >> @@ -234,7 +244,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> >>  	stop_critical_timings();
> >>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> >>  		rcu_idle_enter();
> >> -	entered_state = target_state->enter(dev, drv, index);
> >> +	/*
> >> +	 * Last need_resched() check must come after rcu_idle_enter()
> >> +	 * which may wake up RCU internal tasks.
> >> +	 */
> >> +	if (!need_resched())
> >> +		entered_state = target_state->enter(dev, drv, index);
> >> +	else
> >> +		entered_state = -EBUSY;
> >>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> >>  		rcu_idle_exit();
> >>  	start_critical_timings();
> >>
> > 
> > This patch causes a hardlock on NVIDIA Tegra using today's linux-next.
> > Disabling coupled idling state helps. Please fix thanks in advance.
> 
> This isn't a proper fix, but it works:

There is some ongoing discussion about what an overall proper fix might
look like, so in the meantime I am folding you changes below into
Frederic's original.  ;-)

							Thanx, Paul

> diff --git a/drivers/cpuidle/cpuidle-tegra.c
> b/drivers/cpuidle/cpuidle-tegra.c
> index 191966dc8d02..ecc5d9b31553 100644
> --- a/drivers/cpuidle/cpuidle-tegra.c
> +++ b/drivers/cpuidle/cpuidle-tegra.c
> @@ -148,7 +148,7 @@ static int tegra_cpuidle_c7_enter(void)
> 
>  static int tegra_cpuidle_coupled_barrier(struct cpuidle_device *dev)
>  {
> -	if (tegra_pending_sgi()) {
> +	if (tegra_pending_sgi() || need_resched()) {
>  		/*
>  		 * CPU got local interrupt that will be lost after GIC's
>  		 * shutdown because GIC driver doesn't save/restore the
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 4cc1ba49ce05..2bc52ccc339b 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -248,7 +248,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
> struct cpuidle_driver *drv,
>  	 * Last need_resched() check must come after rcu_idle_enter()
>  	 * which may wake up RCU internal tasks.
>  	 */
> -	if (!need_resched())
> +	if ((target_state->flags & CPUIDLE_FLAG_COUPLED) || !need_resched())
>  		entered_state = target_state->enter(dev, drv, index);
>  	else
>  		entered_state = -EBUSY;
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
  2021-01-04 15:20 ` [PATCH 3/4] ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter() Frederic Weisbecker
@ 2021-01-04 15:57   ` Fabio Estevam
  0 siblings, 0 replies; 13+ messages in thread
From: Fabio Estevam @ 2021-01-04 15:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Rafael J . Wysocki, Ingo Molnar, stable,
	Thomas Gleixner, Paul E . McKenney, Len Brown,
	Pengutronix Kernel Team, NXP Linux Team, Daniel Lezcano,
	Shawn Guo, Sascha Hauer

Hi Frederic,

On Mon, Jan 4, 2021 at 12:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> kthread (rcuog) to be serviced.
>
> Usually a wake up happening while running the idle task is spotted in
> one of the need_resched() checks carefully placed within the idle loop
> that can break to the scheduler.
>
> Unfortunately imx6q_enter_wait() is beyond the last generic
> need_resched() check and it performs a wfi right away after the call to
> rcu_idle_enter(). We may halt the CPU with a resched request unhandled,
> leaving the task hanging.
>
> Fix this with performing a last minute need_resched() check after
> calling rcu_idle_enter().

Shouldn't tif_need_resched() be used instead of need_resched() in the
commit log?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
  2021-01-04 15:20 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() v2 Frederic Weisbecker
@ 2021-01-04 15:20 ` Frederic Weisbecker
  2021-01-04 15:57   ` Fabio Estevam
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2021-01-04 15:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Rafael J . Wysocki, Ingo Molnar,
	Fabio Estevam, stable, Thomas Gleixner, Paul E . McKenney,
	Len Brown, Pengutronix Kernel Team, NXP Linux Team,
	Daniel Lezcano, Shawn Guo, Sascha Hauer

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.

Unfortunately imx6q_enter_wait() is beyond the last generic
need_resched() check and it performs a wfi right away after the call to
rcu_idle_enter(). We may halt the CPU with a resched request unhandled,
leaving the task hanging.

Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes: 1a67b9263e06 (ARM: imx6q: Fixup RCU usage for cpuidle)
Cc: stable@vger.kernel.org
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm/mach-imx/cpuidle-imx6q.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-imx/cpuidle-imx6q.c b/arch/arm/mach-imx/cpuidle-imx6q.c
index 094337dc1bc7..1115f4dc6d1d 100644
--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -5,6 +5,7 @@
 
 #include <linux/cpuidle.h>
 #include <linux/module.h>
+#include <linux/thread_info.h>
 #include <asm/cpuidle.h>
 
 #include <soc/imx/cpuidle.h>
@@ -25,7 +26,12 @@ static int imx6q_enter_wait(struct cpuidle_device *dev,
 	raw_spin_unlock(&cpuidle_lock);
 
 	rcu_idle_enter();
-	cpu_do_idle();
+	/*
+	 * Last need_resched() check must come after rcu_idle_enter()
+	 * which may wake up RCU internal tasks.
+	 */
+	if (!tif_need_resched())
+		cpu_do_idle();
 	rcu_idle_exit();
 
 	raw_spin_lock(&cpuidle_lock);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-01-05 19:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-22  1:37 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() Frederic Weisbecker
2020-12-22  1:37 ` [PATCH 1/4] sched/idle: Fix missing need_resched() check " Frederic Weisbecker
2020-12-22  4:20   ` Paul E. McKenney
2020-12-22  1:37 ` [PATCH 2/4] cpuidle: " Frederic Weisbecker
2021-01-05 17:25   ` Dmitry Osipenko
2021-01-05 18:10     ` Dmitry Osipenko
2021-01-05 19:17       ` Paul E. McKenney
2020-12-22  1:37 ` [PATCH 3/4] ARM: imx6q: " Frederic Weisbecker
2020-12-22  1:37 ` [PATCH 4/4] ACPI: processor: " Frederic Weisbecker
2020-12-22 16:19 ` [PATCH 0/4] sched/idle: Fix missing need_resched() checks " Rafael J. Wysocki
2021-01-01 16:05   ` Paul E. McKenney
2021-01-04 15:20 [PATCH 0/4] sched/idle: Fix missing need_resched() checks after rcu_idle_enter() v2 Frederic Weisbecker
2021-01-04 15:20 ` [PATCH 3/4] ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter() Frederic Weisbecker
2021-01-04 15:57   ` Fabio Estevam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).