All of lore.kernel.org
 help / color / mirror / Atom feed
* Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
@ 2022-03-10 21:56 Paul E. McKenney
  2022-03-10 22:41 ` Frederic Weisbecker
  2022-03-11 11:08 ` Frederic Weisbecker
  0 siblings, 2 replies; 12+ messages in thread
From: Paul E. McKenney @ 2022-03-10 21:56 UTC (permalink / raw)
  To: frederic; +Cc: linux-kernel

Hello, Frederic,

I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
then am getting roughly one RCU CPU stall warning (or silent hang)
per few tens of hours of rcutorture testing on dual-socket systems.
The stall warnings feature starvation of RCU grace-period kthread.

Any advice on debugging this?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-10 21:56 Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n? Paul E. McKenney
@ 2022-03-10 22:41 ` Frederic Weisbecker
  2022-03-10 22:52   ` Paul E. McKenney
  2022-03-11 11:08 ` Frederic Weisbecker
  1 sibling, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2022-03-10 22:41 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> Hello, Frederic,
> 
> I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> then am getting roughly one RCU CPU stall warning (or silent hang)
> per few tens of hours of rcutorture testing on dual-socket systems.
> The stall warnings feature starvation of RCU grace-period kthread.
> 
> Any advice on debugging this?

Oh, I'm testing that!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-10 22:41 ` Frederic Weisbecker
@ 2022-03-10 22:52   ` Paul E. McKenney
  2022-03-11 11:32     ` Frederic Weisbecker
  2022-03-11 13:07     ` Frederic Weisbecker
  0 siblings, 2 replies; 12+ messages in thread
From: Paul E. McKenney @ 2022-03-10 22:52 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel

On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > Hello, Frederic,
> > 
> > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > then am getting roughly one RCU CPU stall warning (or silent hang)
> > per few tens of hours of rcutorture testing on dual-socket systems.
> > The stall warnings feature starvation of RCU grace-period kthread.
> > 
> > Any advice on debugging this?
> 
> Oh, I'm testing that!

Even better, thank you!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-10 21:56 Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n? Paul E. McKenney
  2022-03-10 22:41 ` Frederic Weisbecker
@ 2022-03-11 11:08 ` Frederic Weisbecker
  1 sibling, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2022-03-11 11:08 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> Hello, Frederic,
> 
> I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> then am getting roughly one RCU CPU stall warning (or silent hang)
> per few tens of hours of rcutorture testing on dual-socket systems.
> The stall warnings feature starvation of RCU grace-period kthread.
> 
> Any advice on debugging this?
> 
> 							Thanx, Paul

Ok it reproduces easily but I have no clue about the origin. I'm starting a
bisection.

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-10 22:52   ` Paul E. McKenney
@ 2022-03-11 11:32     ` Frederic Weisbecker
  2022-03-11 13:07     ` Frederic Weisbecker
  1 sibling, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2022-03-11 11:32 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > Hello, Frederic,
> > > 
> > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > The stall warnings feature starvation of RCU grace-period kthread.
> > > 
> > > Any advice on debugging this?
> > 
> > Oh, I'm testing that!
> 
> Even better, thank you!  ;-)

One possibly interesting detail: the stalling CPU is stuck on
sync_rcu_do_polled_gp(), which is launched by the recent start_poll_synchronize_rcu_expedited():

[  463.518410]  <IRQ>
[  463.518691]  dump_stack_lvl+0x33/0x42
[  463.519182]  nmi_cpu_backtrace+0xc0/0xe0
[  463.519706]  ? lapic_can_unplug_cpu+0x90/0x90
[  463.522188]  nmi_trigger_cpumask_backtrace+0x82/0xc0
[  463.522863]  rcu_dump_cpu_stacks+0xc0/0xf0
[  463.523410]  rcu_sched_clock_irq+0x6e3/0xa30
[  463.523982]  ? tick_sched_handle.isra.21+0x40/0x40
[  463.524628]  update_process_times+0x87/0xb0
[  463.525183]  tick_sched_handle.isra.21+0x2b/0x40
[  463.525795]  tick_sched_timer+0x5e/0x70
[  463.526306]  __hrtimer_run_queues+0x108/0x240
[  463.526886]  hrtimer_interrupt+0xe0/0x240
[  463.527419]  __sysvec_apic_timer_interrupt+0x55/0xf0
[  463.528436]  sysvec_apic_timer_interrupt+0x43/0x80
[  463.529074]  </IRQ>
[  463.529366]  <TASK>
[  463.529656]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  463.530333] RIP: 0010:synchronize_rcu_expedited+0x0/0x3f0
[  463.531042] Code: 89 65 48 48 89 ef e8 8f 03 af 00 48 8b 85 80 00 00 00 48 85 c0 0f 84 70 ff ff ff 4c 8b 65 68 48 89 c5 eb b6 66 0f 1f 44 00 00 <41> 54 55 53 48 81 ec 90 00 00 00 44 8b 25 3a 10 7a 01 65 48 8b 04
[  463.533454] RSP: 0018:ffffb2e94045be88 EFLAGS: 00000287
[  463.534141] RAX: 000000000001300c RBX: 0000000000013010 RCX: ffff98941f2298e8
[  463.535078] RDX: 0000000000013011 RSI: 0000000000000286 RDI: ffffffff9bb3ba9c
[  463.536007] RBP: ffffffff9bb3baa8 R08: 000070675f756372 R09: 8080808080808080
[  463.536948] R10: ffffb2e940077d48 R11: fefefefefefefeff R12: ffffffff9bb3ba9c
[  463.537878] R13: 0000000000000000 R14: ffff98941f2298c0 R15: ffff98941f22e005
[  463.538810]  sync_rcu_do_polled_gp+0x39/0xc0
[  463.539382]  process_one_work+0x1ec/0x3b0
[  463.539917]  worker_thread+0x25/0x390
[  463.548472]  ? process_one_work+0x3b0/0x3b0
[  463.549193]  kthread+0xbd/0xe0
[  463.549730]  ? kthread_complete_and_exit+0x20/0x20
[  463.550548]  ret_from_fork+0x22/0x30
[  463.551167]  </TASK>


Could be this:

	while (!sync_exp_work_done(s))
		synchronize_rcu_expedited();

And if synchronize_rcu_expedited() -> rcu_blocking_is_gp() is true, then we
may run in a long loop withing the workqueue without a chance to report a QS,
expecially if we are running a non-preemptible kernel. This could be the cause
of the stalling grace period kthread.

rcu_blocking_is_gp() could be true if all other CPUs but the current one are
offline.

It's just a potential scenario, lemme check...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-10 22:52   ` Paul E. McKenney
  2022-03-11 11:32     ` Frederic Weisbecker
@ 2022-03-11 13:07     ` Frederic Weisbecker
  2022-03-11 15:21       ` Paul E. McKenney
  1 sibling, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2022-03-11 13:07 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > Hello, Frederic,
> > > 
> > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > The stall warnings feature starvation of RCU grace-period kthread.
> > > 
> > > Any advice on debugging this?
> > 
> > Oh, I'm testing that!
> 
> Even better, thank you!  ;-)

Here is what I'm testing below. If it happens to work though, it's still not
the most optimized way of dealing with the UP on SMP situation as we still start
an exp grace period when we could early return. But since we have a cookie
to pass to poll_state_synchronize_rcu_expedited()...

Oh but if we were to early check a positive rcu_blocking_is_gp() from
start_poll_synchronize_rcu_expedited(),
we could simply return the current value of rcu_state.expedited_sequence without
doing an rcu_exp_gp_seq_snap() and then pass that to
poll_state_synchronize_rcu_expedited() which should then immediately return.

Now even if we do that, we still need the below in case the CPUs went offline
in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
fixes the issue. I'm running the test).

---
From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic@kernel.org>
Date: Fri, 11 Mar 2022 13:30:02 +0100
Subject: [PATCH] rcu: Fix rcu exp polling

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index d5f30085b0cf..69c4dcc9159a 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/**
- * synchronize_rcu_expedited - Brute-force RCU grace period
- *
- * Wait for an RCU grace period, but expedite it.  The basic idea is to
- * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
- * the CPU is in an RCU critical section, and if so, it sets a flag that
- * causes the outermost rcu_read_unlock() to report the quiescent state
- * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
- * other hand, if the CPU is not in an RCU read-side critical section,
- * the IPI handler reports the quiescent state immediately.
- *
- * Although this is a great improvement over previous expedited
- * implementations, it is still unfriendly to real-time workloads, so is
- * thus not recommended for any sort of common-case code.  In fact, if
- * you are using synchronize_rcu_expedited() in a loop, please restructure
- * your code to batch your updates, and then use a single synchronize_rcu()
- * instead.
- *
- * This has the same semantics as (but is more brutal than) synchronize_rcu().
- */
-void synchronize_rcu_expedited(void)
+static void __synchronize_rcu_expedited(bool polling)
 {
 	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
 	struct rcu_exp_work rew;
@@ -827,7 +807,7 @@ void synchronize_rcu_expedited(void)
 			 "Illegal synchronize_rcu_expedited() in RCU read-side critical section");
 
 	/* Is the state is such that the call is a grace period? */
-	if (rcu_blocking_is_gp())
+	if (rcu_blocking_is_gp() && !polling)
 		return;
 
 	/* If expedited grace periods are prohibited, fall back to normal. */
@@ -863,6 +843,32 @@ void synchronize_rcu_expedited(void)
 
 	if (likely(!boottime))
 		destroy_work_on_stack(&rew.rew_work);
+
+}
+
+/**
+ * synchronize_rcu_expedited - Brute-force RCU grace period
+ *
+ * Wait for an RCU grace period, but expedite it.  The basic idea is to
+ * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
+ * the CPU is in an RCU critical section, and if so, it sets a flag that
+ * causes the outermost rcu_read_unlock() to report the quiescent state
+ * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
+ * other hand, if the CPU is not in an RCU read-side critical section,
+ * the IPI handler reports the quiescent state immediately.
+ *
+ * Although this is a great improvement over previous expedited
+ * implementations, it is still unfriendly to real-time workloads, so is
+ * thus not recommended for any sort of common-case code.  In fact, if
+ * you are using synchronize_rcu_expedited() in a loop, please restructure
+ * your code to batch your updates, and then use a single synchronize_rcu()
+ * instead.
+ *
+ * This has the same semantics as (but is more brutal than) synchronize_rcu().
+ */
+void synchronize_rcu_expedited(void)
+{
+	__synchronize_rcu_expedited(false);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
@@ -903,7 +909,7 @@ static void sync_rcu_do_polled_gp(struct work_struct *wp)
 	if (s & 0x1)
 		return;
 	while (!sync_exp_work_done(s))
-		synchronize_rcu_expedited();
+		__synchronize_rcu_expedited(true);
 	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
 	s = rnp->exp_seq_poll_rq;
 	if (!(s & 0x1) && !sync_exp_work_done(s))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-11 13:07     ` Frederic Weisbecker
@ 2022-03-11 15:21       ` Paul E. McKenney
  2022-03-11 15:46         ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-03-11 15:21 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel

On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > Hello, Frederic,
> > > > 
> > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > > The stall warnings feature starvation of RCU grace-period kthread.
> > > > 
> > > > Any advice on debugging this?
> > > 
> > > Oh, I'm testing that!
> > 
> > Even better, thank you!  ;-)
> 
> Here is what I'm testing below. If it happens to work though, it's still not
> the most optimized way of dealing with the UP on SMP situation as we still start
> an exp grace period when we could early return. But since we have a cookie
> to pass to poll_state_synchronize_rcu_expedited()...
> 
> Oh but if we were to early check a positive rcu_blocking_is_gp() from
> start_poll_synchronize_rcu_expedited(),
> we could simply return the current value of rcu_state.expedited_sequence without
> doing an rcu_exp_gp_seq_snap() and then pass that to
> poll_state_synchronize_rcu_expedited() which should then immediately return.
> 
> Now even if we do that, we still need the below in case the CPUs went offline
> in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
> fixes the issue. I'm running the test).

Color me slow and stupid!!!

So the reason that this works for CONFIG_PREEMPT_DYNAMIC=y is that
the rcu_blocking_is_gp() was never updated to account for this.

The first "if" in rcu_blocking_is_gp() needs to become something like
this:

	if (!preemption_boot_enabled())

Where:

	bool preemption_boot_enabled(void)
	{
	#ifdef CONFIG_PREEMPT_DYNAMIC
		return preempt_dynamic_mode == preempt_dynamic_full;
	#else
		return IS_ENABLED(CONFIG_PREEMPTION);
	#endif
	}

Or am I missing something here?

> ---
> >From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <frederic@kernel.org>
> Date: Fri, 11 Mar 2022 13:30:02 +0100
> Subject: [PATCH] rcu: Fix rcu exp polling
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
>  1 file changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index d5f30085b0cf..69c4dcc9159a 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>  
>  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
>  
> -/**
> - * synchronize_rcu_expedited - Brute-force RCU grace period
> - *
> - * Wait for an RCU grace period, but expedite it.  The basic idea is to
> - * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> - * the CPU is in an RCU critical section, and if so, it sets a flag that
> - * causes the outermost rcu_read_unlock() to report the quiescent state
> - * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> - * other hand, if the CPU is not in an RCU read-side critical section,
> - * the IPI handler reports the quiescent state immediately.
> - *
> - * Although this is a great improvement over previous expedited
> - * implementations, it is still unfriendly to real-time workloads, so is
> - * thus not recommended for any sort of common-case code.  In fact, if
> - * you are using synchronize_rcu_expedited() in a loop, please restructure
> - * your code to batch your updates, and then use a single synchronize_rcu()
> - * instead.
> - *
> - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> - */
> -void synchronize_rcu_expedited(void)

We should have a header comment here.  Given that I missed the need for
this, for example.  ;-)

But feel free to send a formal patch without it, and I can add it.

Otherwise, it looks good.

							Thanx, Paul

> +static void __synchronize_rcu_expedited(bool polling)
>  {
>  	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>  	struct rcu_exp_work rew;
> @@ -827,7 +807,7 @@ void synchronize_rcu_expedited(void)
>  			 "Illegal synchronize_rcu_expedited() in RCU read-side critical section");
>  
>  	/* Is the state is such that the call is a grace period? */
> -	if (rcu_blocking_is_gp())
> +	if (rcu_blocking_is_gp() && !polling)
>  		return;
>  
>  	/* If expedited grace periods are prohibited, fall back to normal. */
> @@ -863,6 +843,32 @@ void synchronize_rcu_expedited(void)
>  
>  	if (likely(!boottime))
>  		destroy_work_on_stack(&rew.rew_work);
> +
> +}
> +
> +/**
> + * synchronize_rcu_expedited - Brute-force RCU grace period
> + *
> + * Wait for an RCU grace period, but expedite it.  The basic idea is to
> + * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> + * the CPU is in an RCU critical section, and if so, it sets a flag that
> + * causes the outermost rcu_read_unlock() to report the quiescent state
> + * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> + * other hand, if the CPU is not in an RCU read-side critical section,
> + * the IPI handler reports the quiescent state immediately.
> + *
> + * Although this is a great improvement over previous expedited
> + * implementations, it is still unfriendly to real-time workloads, so is
> + * thus not recommended for any sort of common-case code.  In fact, if
> + * you are using synchronize_rcu_expedited() in a loop, please restructure
> + * your code to batch your updates, and then use a single synchronize_rcu()
> + * instead.
> + *
> + * This has the same semantics as (but is more brutal than) synchronize_rcu().
> + */
> +void synchronize_rcu_expedited(void)
> +{
> +	__synchronize_rcu_expedited(false);
>  }
>  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
>  
> @@ -903,7 +909,7 @@ static void sync_rcu_do_polled_gp(struct work_struct *wp)
>  	if (s & 0x1)
>  		return;
>  	while (!sync_exp_work_done(s))
> -		synchronize_rcu_expedited();
> +		__synchronize_rcu_expedited(true);
>  	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
>  	s = rnp->exp_seq_poll_rq;
>  	if (!(s & 0x1) && !sync_exp_work_done(s))
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-11 15:21       ` Paul E. McKenney
@ 2022-03-11 15:46         ` Frederic Weisbecker
  2022-03-11 16:06           ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2022-03-11 15:46 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On Fri, Mar 11, 2022 at 07:21:48AM -0800, Paul E. McKenney wrote:
> On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> > On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > > Hello, Frederic,
> > > > > 
> > > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > > > The stall warnings feature starvation of RCU grace-period kthread.
> > > > > 
> > > > > Any advice on debugging this?
> > > > 
> > > > Oh, I'm testing that!
> > > 
> > > Even better, thank you!  ;-)
> > 
> > Here is what I'm testing below. If it happens to work though, it's still not
> > the most optimized way of dealing with the UP on SMP situation as we still start
> > an exp grace period when we could early return. But since we have a cookie
> > to pass to poll_state_synchronize_rcu_expedited()...
> > 
> > Oh but if we were to early check a positive rcu_blocking_is_gp() from
> > start_poll_synchronize_rcu_expedited(),
> > we could simply return the current value of rcu_state.expedited_sequence without
> > doing an rcu_exp_gp_seq_snap() and then pass that to
> > poll_state_synchronize_rcu_expedited() which should then immediately return.
> > 
> > Now even if we do that, we still need the below in case the CPUs went offline
> > in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
> > fixes the issue. I'm running the test).
> 
> Color me slow and stupid!!!
> 
> So the reason that this works for CONFIG_PREEMPT_DYNAMIC=y is that
> the rcu_blocking_is_gp() was never updated to account for this.
> 
> The first "if" in rcu_blocking_is_gp() needs to become something like
> this:
> 
> 	if (!preemption_boot_enabled())
> 
> Where:
> 
> 	bool preemption_boot_enabled(void)
> 	{
> 	#ifdef CONFIG_PREEMPT_DYNAMIC
> 		return preempt_dynamic_mode == preempt_dynamic_full;
> 	#else
> 		return IS_ENABLED(CONFIG_PREEMPTION);
> 	#endif
> 	}
> 
> Or am I missing something here?

Oh right there is that too!

I think we need to apply this patch:
https://lore.kernel.org/lkml/20211110202448.4054153-3-valentin.schneider@arm.com/
and then your above change. I can cook a series with the below.

> 
> > ---
> > >From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> > From: Frederic Weisbecker <frederic@kernel.org>
> > Date: Fri, 11 Mar 2022 13:30:02 +0100
> > Subject: [PATCH] rcu: Fix rcu exp polling
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
> >  1 file changed, 29 insertions(+), 23 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index d5f30085b0cf..69c4dcc9159a 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> >  
> >  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> >  
> > -/**
> > - * synchronize_rcu_expedited - Brute-force RCU grace period
> > - *
> > - * Wait for an RCU grace period, but expedite it.  The basic idea is to
> > - * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> > - * the CPU is in an RCU critical section, and if so, it sets a flag that
> > - * causes the outermost rcu_read_unlock() to report the quiescent state
> > - * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> > - * other hand, if the CPU is not in an RCU read-side critical section,
> > - * the IPI handler reports the quiescent state immediately.
> > - *
> > - * Although this is a great improvement over previous expedited
> > - * implementations, it is still unfriendly to real-time workloads, so is
> > - * thus not recommended for any sort of common-case code.  In fact, if
> > - * you are using synchronize_rcu_expedited() in a loop, please restructure
> > - * your code to batch your updates, and then use a single synchronize_rcu()
> > - * instead.
> > - *
> > - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> > - */
> > -void synchronize_rcu_expedited(void)
> 
> We should have a header comment here.  Given that I missed the need for
> this, for example.  ;-)
> 
> But feel free to send a formal patch without it, and I can add it.
> 
> Otherwise, it looks good.

Ok, preparing this.

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-11 15:46         ` Frederic Weisbecker
@ 2022-03-11 16:06           ` Paul E. McKenney
  2022-03-11 16:47             ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-03-11 16:06 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel

On Fri, Mar 11, 2022 at 04:46:03PM +0100, Frederic Weisbecker wrote:
> On Fri, Mar 11, 2022 at 07:21:48AM -0800, Paul E. McKenney wrote:
> > On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > > > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > > > Hello, Frederic,
> > > > > > 
> > > > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > > > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > > > > The stall warnings feature starvation of RCU grace-period kthread.
> > > > > > 
> > > > > > Any advice on debugging this?
> > > > > 
> > > > > Oh, I'm testing that!
> > > > 
> > > > Even better, thank you!  ;-)
> > > 
> > > Here is what I'm testing below. If it happens to work though, it's still not
> > > the most optimized way of dealing with the UP on SMP situation as we still start
> > > an exp grace period when we could early return. But since we have a cookie
> > > to pass to poll_state_synchronize_rcu_expedited()...
> > > 
> > > Oh but if we were to early check a positive rcu_blocking_is_gp() from
> > > start_poll_synchronize_rcu_expedited(),
> > > we could simply return the current value of rcu_state.expedited_sequence without
> > > doing an rcu_exp_gp_seq_snap() and then pass that to
> > > poll_state_synchronize_rcu_expedited() which should then immediately return.
> > > 
> > > Now even if we do that, we still need the below in case the CPUs went offline
> > > in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
> > > fixes the issue. I'm running the test).
> > 
> > Color me slow and stupid!!!
> > 
> > So the reason that this works for CONFIG_PREEMPT_DYNAMIC=y is that
> > the rcu_blocking_is_gp() was never updated to account for this.
> > 
> > The first "if" in rcu_blocking_is_gp() needs to become something like
> > this:
> > 
> > 	if (!preemption_boot_enabled())
> > 
> > Where:
> > 
> > 	bool preemption_boot_enabled(void)
> > 	{
> > 	#ifdef CONFIG_PREEMPT_DYNAMIC
> > 		return preempt_dynamic_mode == preempt_dynamic_full;
> > 	#else
> > 		return IS_ENABLED(CONFIG_PREEMPTION);
> > 	#endif
> > 	}
> > 
> > Or am I missing something here?
> 
> Oh right there is that too!

I am testing with that fastpath completely commented out, just to make
sure that we were seeing the same failure.

> I think we need to apply this patch:
> https://lore.kernel.org/lkml/20211110202448.4054153-3-valentin.schneider@arm.com/
> and then your above change. I can cook a series with the below.

Agreed, Valentin's approach is better than my preemption_boot_enabled().
But when will CONFIG_PREEMPT_RT be boot-time selectable?  (/me runs!)

Looking forward to your series!

							Thanx, Paul

> > > ---
> > > >From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> > > From: Frederic Weisbecker <frederic@kernel.org>
> > > Date: Fri, 11 Mar 2022 13:30:02 +0100
> > > Subject: [PATCH] rcu: Fix rcu exp polling
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
> > >  1 file changed, 29 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index d5f30085b0cf..69c4dcc9159a 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > >  
> > >  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> > >  
> > > -/**
> > > - * synchronize_rcu_expedited - Brute-force RCU grace period
> > > - *
> > > - * Wait for an RCU grace period, but expedite it.  The basic idea is to
> > > - * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> > > - * the CPU is in an RCU critical section, and if so, it sets a flag that
> > > - * causes the outermost rcu_read_unlock() to report the quiescent state
> > > - * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> > > - * other hand, if the CPU is not in an RCU read-side critical section,
> > > - * the IPI handler reports the quiescent state immediately.
> > > - *
> > > - * Although this is a great improvement over previous expedited
> > > - * implementations, it is still unfriendly to real-time workloads, so is
> > > - * thus not recommended for any sort of common-case code.  In fact, if
> > > - * you are using synchronize_rcu_expedited() in a loop, please restructure
> > > - * your code to batch your updates, and then use a single synchronize_rcu()
> > > - * instead.
> > > - *
> > > - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> > > - */
> > > -void synchronize_rcu_expedited(void)
> > 
> > We should have a header comment here.  Given that I missed the need for
> > this, for example.  ;-)
> > 
> > But feel free to send a formal patch without it, and I can add it.
> > 
> > Otherwise, it looks good.
> 
> Ok, preparing this.
> 
> Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-11 16:06           ` Paul E. McKenney
@ 2022-03-11 16:47             ` Paul E. McKenney
  2022-03-11 17:26               ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-03-11 16:47 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel

On Fri, Mar 11, 2022 at 08:06:19AM -0800, Paul E. McKenney wrote:
> On Fri, Mar 11, 2022 at 04:46:03PM +0100, Frederic Weisbecker wrote:
> > On Fri, Mar 11, 2022 at 07:21:48AM -0800, Paul E. McKenney wrote:
> > > On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> > > > On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > > > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > > > > Hello, Frederic,
> > > > > > > 
> > > > > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > > > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > > > > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > > > > > The stall warnings feature starvation of RCU grace-period kthread.
> > > > > > > 
> > > > > > > Any advice on debugging this?
> > > > > > 
> > > > > > Oh, I'm testing that!
> > > > > 
> > > > > Even better, thank you!  ;-)
> > > > 
> > > > Here is what I'm testing below. If it happens to work though, it's still not
> > > > the most optimized way of dealing with the UP on SMP situation as we still start
> > > > an exp grace period when we could early return. But since we have a cookie
> > > > to pass to poll_state_synchronize_rcu_expedited()...
> > > > 
> > > > Oh but if we were to early check a positive rcu_blocking_is_gp() from
> > > > start_poll_synchronize_rcu_expedited(),
> > > > we could simply return the current value of rcu_state.expedited_sequence without
> > > > doing an rcu_exp_gp_seq_snap() and then pass that to
> > > > poll_state_synchronize_rcu_expedited() which should then immediately return.
> > > > 
> > > > Now even if we do that, we still need the below in case the CPUs went offline
> > > > in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
> > > > fixes the issue. I'm running the test).
> > > 
> > > Color me slow and stupid!!!
> > > 
> > > So the reason that this works for CONFIG_PREEMPT_DYNAMIC=y is that
> > > the rcu_blocking_is_gp() was never updated to account for this.
> > > 
> > > The first "if" in rcu_blocking_is_gp() needs to become something like
> > > this:
> > > 
> > > 	if (!preemption_boot_enabled())
> > > 
> > > Where:
> > > 
> > > 	bool preemption_boot_enabled(void)
> > > 	{
> > > 	#ifdef CONFIG_PREEMPT_DYNAMIC
> > > 		return preempt_dynamic_mode == preempt_dynamic_full;
> > > 	#else
> > > 		return IS_ENABLED(CONFIG_PREEMPTION);
> > > 	#endif
> > > 	}
> > > 
> > > Or am I missing something here?
> > 
> > Oh right there is that too!
> 
> I am testing with that fastpath completely commented out, just to make
> sure that we were seeing the same failure.
> 
> > I think we need to apply this patch:
> > https://lore.kernel.org/lkml/20211110202448.4054153-3-valentin.schneider@arm.com/
> > and then your above change. I can cook a series with the below.
> 
> Agreed, Valentin's approach is better than my preemption_boot_enabled().
> But when will CONFIG_PREEMPT_RT be boot-time selectable?  (/me runs!)
> 
> Looking forward to your series!

And there is one more issue with this code.  Someone invoking
get_state_synchronize_rcu_expedited() in one task might naively expect
that calls to synchronize_rcu_expedited() in some other task would cause
a later poll_state_synchronize_rcu_expedited() would return true.

Except that if CONFIG_PREEMPT_NONE=y and there is only one CPU, those
calls to synchronize_rcu_expedited() won't be helping at all.

I could imagine poll_state_synchronize_rcu_expedited() setting a
global flag if there is only one CPU, which could be checked by
__synchronize_rcu_expedited() and reset.

Is there a better way?

							Thanx, Paul

> > > > ---
> > > > >From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> > > > From: Frederic Weisbecker <frederic@kernel.org>
> > > > Date: Fri, 11 Mar 2022 13:30:02 +0100
> > > > Subject: [PATCH] rcu: Fix rcu exp polling
> > > > 
> > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > ---
> > > >  kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
> > > >  1 file changed, 29 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index d5f30085b0cf..69c4dcc9159a 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > > >  
> > > >  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> > > >  
> > > > -/**
> > > > - * synchronize_rcu_expedited - Brute-force RCU grace period
> > > > - *
> > > > - * Wait for an RCU grace period, but expedite it.  The basic idea is to
> > > > - * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> > > > - * the CPU is in an RCU critical section, and if so, it sets a flag that
> > > > - * causes the outermost rcu_read_unlock() to report the quiescent state
> > > > - * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> > > > - * other hand, if the CPU is not in an RCU read-side critical section,
> > > > - * the IPI handler reports the quiescent state immediately.
> > > > - *
> > > > - * Although this is a great improvement over previous expedited
> > > > - * implementations, it is still unfriendly to real-time workloads, so is
> > > > - * thus not recommended for any sort of common-case code.  In fact, if
> > > > - * you are using synchronize_rcu_expedited() in a loop, please restructure
> > > > - * your code to batch your updates, and then use a single synchronize_rcu()
> > > > - * instead.
> > > > - *
> > > > - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> > > > - */
> > > > -void synchronize_rcu_expedited(void)
> > > 
> > > We should have a header comment here.  Given that I missed the need for
> > > this, for example.  ;-)
> > > 
> > > But feel free to send a formal patch without it, and I can add it.
> > > 
> > > Otherwise, it looks good.
> > 
> > Ok, preparing this.
> > 
> > Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-11 16:47             ` Paul E. McKenney
@ 2022-03-11 17:26               ` Frederic Weisbecker
  2022-03-11 17:33                 ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2022-03-11 17:26 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On Fri, Mar 11, 2022 at 08:47:58AM -0800, Paul E. McKenney wrote:
> And there is one more issue with this code.  Someone invoking
> get_state_synchronize_rcu_expedited() in one task might naively expect
> that calls to synchronize_rcu_expedited() in some other task would cause
> a later poll_state_synchronize_rcu_expedited() would return true.
> 
> Except that if CONFIG_PREEMPT_NONE=y and there is only one CPU, those
> calls to synchronize_rcu_expedited() won't be helping at all.
> 
> I could imagine poll_state_synchronize_rcu_expedited() setting a
> global flag if there is only one CPU, which could be checked by
> __synchronize_rcu_expedited() and reset.
> 
> Is there a better way?

I would tend to think that in this case, it's the responsibility of the
caller to make sure that the task supposed to start the exp GP has a chance
to run (cond_resched(), etc...).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
  2022-03-11 17:26               ` Frederic Weisbecker
@ 2022-03-11 17:33                 ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2022-03-11 17:33 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel

On Fri, Mar 11, 2022 at 06:26:20PM +0100, Frederic Weisbecker wrote:
> On Fri, Mar 11, 2022 at 08:47:58AM -0800, Paul E. McKenney wrote:
> > And there is one more issue with this code.  Someone invoking
> > get_state_synchronize_rcu_expedited() in one task might naively expect
> > that calls to synchronize_rcu_expedited() in some other task would cause
> > a later poll_state_synchronize_rcu_expedited() would return true.
> > 
> > Except that if CONFIG_PREEMPT_NONE=y and there is only one CPU, those
> > calls to synchronize_rcu_expedited() won't be helping at all.
> > 
> > I could imagine poll_state_synchronize_rcu_expedited() setting a
> > global flag if there is only one CPU, which could be checked by
> > __synchronize_rcu_expedited() and reset.
> > 
> > Is there a better way?
> 
> I would tend to think that in this case, it's the responsibility of the
> caller to make sure that the task supposed to start the exp GP has a chance
> to run (cond_resched(), etc...).

Hahahahahahaha!

The same problem arises for poll_state_synchronize_rcu() and friends
on a single-CPU CONFIG_PREEMPT_NONE=y system.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-03-11 17:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10 21:56 Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n? Paul E. McKenney
2022-03-10 22:41 ` Frederic Weisbecker
2022-03-10 22:52   ` Paul E. McKenney
2022-03-11 11:32     ` Frederic Weisbecker
2022-03-11 13:07     ` Frederic Weisbecker
2022-03-11 15:21       ` Paul E. McKenney
2022-03-11 15:46         ` Frederic Weisbecker
2022-03-11 16:06           ` Paul E. McKenney
2022-03-11 16:47             ` Paul E. McKenney
2022-03-11 17:26               ` Frederic Weisbecker
2022-03-11 17:33                 ` Paul E. McKenney
2022-03-11 11:08 ` Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.