[PATCHv2 0/3] rcu: Enhance the capability to cope with concurrent cpu offlining/onlining

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCHv2 0/3] rcu: Enhance the capability to cope with concurrent cpu offlining/onlining
@ 2022-09-15  5:58 Pingfan Liu
  2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-15  5:58 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

[1/3], It was V1's [3/3]
[2/3], reimplement V1's [1/3], and fix the issue in
       rcutree_offline_cpu() instead of rcutree_dead_cpu()
[3/3], handling TICK_DEP_BIT_RCU when cpu offlining

V1: https://lore.kernel.org/all/20220910203622.GI4315@paulmck-ThinkPad-P17-Gen-1/T/

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org

Pingfan Liu (3):
  rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  rcu: Resort to cpu_dying_mask for affinity when offlining
  rcu: coordinate tick dependency during concurrent offlining

 kernel/rcu/tree.c        | 21 ++++++++++++++-------
 kernel/rcu/tree.h        |  1 +
 kernel/rcu/tree_plugin.h | 13 ++++++++++++-
 3 files changed, 27 insertions(+), 8 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-15  5:58 [PATCHv2 0/3] rcu: Enhance the capability to cope with concurrent cpu offlining/onlining Pingfan Liu
@ 2022-09-15  5:58 ` Pingfan Liu
  2022-09-15  6:11   ` Pingfan Liu
                     ` (2 more replies)
  2022-09-15  5:58 ` [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining Pingfan Liu
  2022-09-15  5:58 ` [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining Pingfan Liu
  2 siblings, 3 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-15  5:58 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

rcutree_online_cpu() is concurrent, so there is the following race
scene:

        CPU 1                               CPU2
mask_old = rcu_rnp_online_cpus(rnp);
...

                                   mask_new = rcu_rnp_online_cpus(rnp);
                                   ...
                                   set_cpus_allowed_ptr(t, cm);

set_cpus_allowed_ptr(t, cm);

Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.

Since there is a mutex ->boost_kthread_mutex, using it to build an
order, then the latest ->qsmaskinitnext will be fetched for updating cpus_ptr.

But for concurrent offlining, ->qsmaskinitnext is not reilable when
rcutree_offline_cpu(). That is another story and comes in the following
patch.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree_plugin.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 438ecae6bd7e..ef6d3ae239b9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 {
 	struct task_struct *t = rnp->boost_kthread_task;
-	unsigned long mask = rcu_rnp_online_cpus(rnp);
+	unsigned long mask;
 	cpumask_var_t cm;
 	int cpu;
 
@@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
 	mutex_lock(&rnp->boost_kthread_mutex);
+	/*
+	 * Relying on the lock to serialize, so when onlining, the latest
+	 * qsmaskinitnext is for cpus_ptr.
+	 */
+	mask = rcu_rnp_online_cpus(rnp);
 	for_each_leaf_node_possible_cpu(rnp, cpu)
 		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
 		    cpu != outgoingcpu)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-15  5:58 [PATCHv2 0/3] rcu: Enhance the capability to cope with concurrent cpu offlining/onlining Pingfan Liu
  2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
@ 2022-09-15  5:58 ` Pingfan Liu
  2022-09-16 14:23   ` Frederic Weisbecker
  2022-09-20  9:38   ` Frederic Weisbecker
  2022-09-15  5:58 ` [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining Pingfan Liu
  2 siblings, 2 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-15  5:58 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

During offlining, the concurrent rcutree_offline_cpu() can not be aware
of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
information at that point and can be utilized.

Besides, a trivial change which removes the redudant call to
rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
rcutree_offline_cpu() can fully serve that purpose.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c        | 2 --
 kernel/rcu/tree_plugin.h | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..8a829b64f5b2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
 		return 0;
 
 	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-	/* Adjust any no-longer-needed kthreads. */
-	rcu_boost_kthread_setaffinity(rnp, -1);
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
 	return 0;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index ef6d3ae239b9..e5afc63bd97f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		    cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
 	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
+	/*
+	 * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
+	 * So resort to cpu_dying_mask, whose changes has already been visible.
+	 */
+	if (outgoingcpu != -1)
+		cpumask_andnot(cm, cm, cpu_dying_mask);
 	if (cpumask_empty(cm))
 		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
 	set_cpus_allowed_ptr(t, cm);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-15  5:58 [PATCHv2 0/3] rcu: Enhance the capability to cope with concurrent cpu offlining/onlining Pingfan Liu
  2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
  2022-09-15  5:58 ` [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining Pingfan Liu
@ 2022-09-15  5:58 ` Pingfan Liu
  2022-09-16 13:42   ` Frederic Weisbecker
  2022-09-26 16:13   ` Joel Fernandes
  2 siblings, 2 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-15  5:58 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
atomic operations, but the problem is that if there are multiple
nohz_full CPUs going offline concurrently, the first CPU to invoke
rcutree_dead_cpu() will turn the tick off.  This might require an
atomically manipulated counter to mediate the calls to
rcutree_dead_cpu(). "

This patch introduces a new member ->dying to rcu_node, which reflects
the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
the first entrance and cleared by the last.

Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
no lock, no extra locking order is introduced.

Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c | 19 ++++++++++++++-----
 kernel/rcu/tree.h |  1 +
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8a829b64f5b2..f8bd0fc5fd2f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
 {
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
+	unsigned long flags;
+	u8 dying;
 
 	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
 		return 0;
 
 	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-	// Stop-machine done, so allow nohz_full to disable tick.
-	tick_dep_clear(TICK_DEP_BIT_RCU);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	dying = --rnp->dying;
+	if (!dying)
+		// Stop-machine done, so allow nohz_full to disable tick.
+		tick_dep_clear(TICK_DEP_BIT_RCU);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	return 0;
 }
 
@@ -4020,17 +4026,20 @@ int rcutree_offline_cpu(unsigned int cpu)
 	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
+	u8 dying;
 
 	rdp = per_cpu_ptr(&rcu_data, cpu);
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rnp->ffmask &= ~rdp->grpmask;
+	/* Let rcutree_dead_cpu() know a new offlining. */
+	dying = rnp->dying++;
+	if (!dying)
+		// nohz_full CPUs need the tick for stop-machine to work quickly
+		tick_dep_set(TICK_DEP_BIT_RCU);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 	rcutree_affinity_setting(cpu, cpu);
-
-	// nohz_full CPUs need the tick for stop-machine to work quickly
-	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c..b508a12ac953 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -81,6 +81,7 @@ struct rcu_node {
 	int	grphi;		/* highest-numbered CPU here. */
 	u8	grpnum;		/* group number for next level up. */
 	u8	level;		/* root is at level 0. */
+	u8	dying;		/* num of concurrent rdp offlining */
 	bool	wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
 				/*  exit RCU read-side critical sections */
 				/*  before propagating offline up the */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
@ 2022-09-15  6:11   ` Pingfan Liu
  2022-09-16 14:52   ` Frederic Weisbecker
  2022-09-20 10:31   ` Frederic Weisbecker
  2 siblings, 0 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-15  6:11 UTC (permalink / raw)
  To: rcu
  Cc: Paul E. McKenney, David Woodhouse, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

Hi Paul,

If you pick up this one from my V1, could you drop it? Since I have
rephrased the commit log and comment in the code.  Sorry for the
inconvenience.

On Thu, Sep 15, 2022 at 1:58 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> rcutree_online_cpu() is concurrent, so there is the following race
> scene:
>
>         CPU 1                               CPU2
> mask_old = rcu_rnp_online_cpus(rnp);
> ...
>
>                                    mask_new = rcu_rnp_online_cpus(rnp);
>                                    ...
>                                    set_cpus_allowed_ptr(t, cm);
>
> set_cpus_allowed_ptr(t, cm);
>
> Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.
>
> Since there is a mutex ->boost_kthread_mutex, using it to build an
> order, then the latest ->qsmaskinitnext will be fetched for updating cpus_ptr.
>
> But for concurrent offlining, ->qsmaskinitnext is not reilable when
> rcutree_offline_cpu(). That is another story and comes in the following
> patch.
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree_plugin.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 438ecae6bd7e..ef6d3ae239b9 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  {
>         struct task_struct *t = rnp->boost_kthread_task;
> -       unsigned long mask = rcu_rnp_online_cpus(rnp);
> +       unsigned long mask;
>         cpumask_var_t cm;
>         int cpu;
>
> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>         if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>                 return;
>         mutex_lock(&rnp->boost_kthread_mutex);
> +       /*
> +        * Relying on the lock to serialize, so when onlining, the latest
> +        * qsmaskinitnext is for cpus_ptr.
> +        */

In the comment, I put accent on 'onlining' which is different from V1

Thanks,

Pingfan

> +       mask = rcu_rnp_online_cpus(rnp);
>         for_each_leaf_node_possible_cpu(rnp, cpu)
>                 if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>                     cpu != outgoingcpu)
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-15  5:58 ` [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining Pingfan Liu
@ 2022-09-16 13:42   ` Frederic Weisbecker
  2022-09-20  7:26     ` Pingfan Liu
  2022-09-26 16:13   ` Joel Fernandes
  1 sibling, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-16 13:42 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> atomic operations, but the problem is that if there are multiple
> nohz_full CPUs going offline concurrently, the first CPU to invoke
> rcutree_dead_cpu() will turn the tick off.  This might require an
> atomically manipulated counter to mediate the calls to
> rcutree_dead_cpu(). "
> 
> This patch introduces a new member ->dying to rcu_node, which reflects
> the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> the first entrance and cleared by the last.
> 
> Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> no lock, no extra locking order is introduced.
> 
> Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 19 ++++++++++++++-----
>  kernel/rcu/tree.h |  1 +
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8a829b64f5b2..f8bd0fc5fd2f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
>  {
>  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> +	unsigned long flags;
> +	u8 dying;
>  
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	// Stop-machine done, so allow nohz_full to disable tick.
> -	tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	dying = --rnp->dying;
> +	if (!dying)
> +		// Stop-machine done, so allow nohz_full to disable tick.
> +		tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);

Note this is only locking the rdp's node, not the root node.
Therefore if CPU 0 and CPU 256 are going off at the same time and they
don't belong to the same node, the above won't protect against concurrent
TICK_DEP_BIT_RCU set/clear.

My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
anymore. I believe it was there because of issues that were fixed with:

	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
and:

	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)

It's unfortunately just suspicion because the reason for that tick dependency
is unclear but I believe it should be safe to remove now.

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-15  5:58 ` [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining Pingfan Liu
@ 2022-09-16 14:23   ` Frederic Weisbecker
  2022-09-19  4:33     ` Pingfan Liu
  2022-09-20  9:38   ` Frederic Weisbecker
  1 sibling, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-16 14:23 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> During offlining, the concurrent rcutree_offline_cpu() can not be aware
> of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> information at that point and can be utilized.
> 
> Besides, a trivial change which removes the redudant call to
> rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> rcutree_offline_cpu() can fully serve that purpose.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c        | 2 --
>  kernel/rcu/tree_plugin.h | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..8a829b64f5b2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	/* Adjust any no-longer-needed kthreads. */
> -	rcu_boost_kthread_setaffinity(rnp, -1);
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
>  	return 0;

I would suggest to make this a separate change, for bisectability and
readability.

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index ef6d3ae239b9..e5afc63bd97f 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  		    cpu != outgoingcpu)
>  			cpumask_set_cpu(cpu, cm);
>  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> +	/*
> +	 * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> +	 * So resort to cpu_dying_mask, whose changes has already been visible.
> +	 */
> +	if (outgoingcpu != -1)
> +		cpumask_andnot(cm, cm, cpu_dying_mask);

I'm not sure how the infrastructure changes in your concurrent down patchset
but can the cpu_dying_mask concurrently change at this stage?

Thanks.

>  	if (cpumask_empty(cm))
>  		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
>  	set_cpus_allowed_ptr(t, cm);
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
  2022-09-15  6:11   ` Pingfan Liu
@ 2022-09-16 14:52   ` Frederic Weisbecker
  2022-09-19 10:24     ` Pingfan Liu
  2022-09-20 10:31   ` Frederic Weisbecker
  2 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-16 14:52 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 15, 2022 at 01:58:23PM +0800, Pingfan Liu wrote:
> rcutree_online_cpu() is concurrent, so there is the following race
> scene:
> 
>         CPU 1                               CPU2
> mask_old = rcu_rnp_online_cpus(rnp);
> ...
> 
>                                    mask_new = rcu_rnp_online_cpus(rnp);
>                                    ...
>                                    set_cpus_allowed_ptr(t, cm);
> 
> set_cpus_allowed_ptr(t, cm);
> 
> Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.
> 
> Since there is a mutex ->boost_kthread_mutex, using it to build an
> order, then the latest ->qsmaskinitnext will be fetched for updating cpus_ptr.
> 
> But for concurrent offlining, ->qsmaskinitnext is not reilable when
> rcutree_offline_cpu(). That is another story and comes in the following
> patch.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree_plugin.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 438ecae6bd7e..ef6d3ae239b9 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  {
>  	struct task_struct *t = rnp->boost_kthread_task;
> -	unsigned long mask = rcu_rnp_online_cpus(rnp);
> +	unsigned long mask;
>  	cpumask_var_t cm;
>  	int cpu;
>  
> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>  		return;
>  	mutex_lock(&rnp->boost_kthread_mutex);
> +	/*
> +	 * Relying on the lock to serialize, so when onlining, the latest
> +	 * qsmaskinitnext is for cpus_ptr.
> +	 */
> +	mask = rcu_rnp_online_cpus(rnp);
>  	for_each_leaf_node_possible_cpu(rnp, cpu)
>  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>  		    cpu != outgoingcpu)

Right but you still race against concurrent rcu_report_dead() doing:

      WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)

Thanks.

> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-16 14:23   ` Frederic Weisbecker
@ 2022-09-19  4:33     ` Pingfan Liu
  2022-09-19 10:34       ` Frederic Weisbecker
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-19  4:33 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> > During offlining, the concurrent rcutree_offline_cpu() can not be aware
> > of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> > information at that point and can be utilized.
> >
> > Besides, a trivial change which removes the redudant call to
> > rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> > rcutree_offline_cpu() can fully serve that purpose.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c        | 2 --
> >  kernel/rcu/tree_plugin.h | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..8a829b64f5b2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >               return 0;
> >
> >       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -     /* Adjust any no-longer-needed kthreads. */
> > -     rcu_boost_kthread_setaffinity(rnp, -1);
> >       // Stop-machine done, so allow nohz_full to disable tick.
> >       tick_dep_clear(TICK_DEP_BIT_RCU);
> >       return 0;
>
> I would suggest to make this a separate change, for bisectability and
> readability.
>

OK, I will.

> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index ef6d3ae239b9..e5afc63bd97f 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >                   cpu != outgoingcpu)
> >                       cpumask_set_cpu(cpu, cm);
> >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > +     /*
> > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > +      */
> > +     if (outgoingcpu != -1)
> > +             cpumask_andnot(cm, cm, cpu_dying_mask);
>
> I'm not sure how the infrastructure changes in your concurrent down patchset
> but can the cpu_dying_mask concurrently change at this stage?
>

For the concurrent down patchset [1], it extends the cpu_down()
capability to let an initiator to tear down several cpus in a batch
and in parallel.

At the first step, all cpus to be torn down should experience
cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
need to fix it by using another loop to call
cpuhp_kick_ap_work_async(cpu);)

At the outmost, the pair
cpu_maps_update_begin()/cpu_maps_update_done() still prevent any new
initiator from launching another concurrent hot-add/remove event. So
cpu_dying_mask can be stable during the batched and concurrent cpus'
teardown.

[1]: https://lore.kernel.org/all/20220822021520.6996-1-kernelfans@gmail.com/
[2]: https://lore.kernel.org/all/20220822021520.6996-4-kernelfans@gmail.com/

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-16 14:52   ` Frederic Weisbecker
@ 2022-09-19 10:24     ` Pingfan Liu
  2022-09-19 10:51       ` Frederic Weisbecker
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-19 10:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:23PM +0800, Pingfan Liu wrote:
> > rcutree_online_cpu() is concurrent, so there is the following race
> > scene:
> >
> >         CPU 1                               CPU2
> > mask_old = rcu_rnp_online_cpus(rnp);
> > ...
> >
> >                                    mask_new = rcu_rnp_online_cpus(rnp);
> >                                    ...
> >                                    set_cpus_allowed_ptr(t, cm);
> >
> > set_cpus_allowed_ptr(t, cm);
> >
> > Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.
> >
> > Since there is a mutex ->boost_kthread_mutex, using it to build an
> > order, then the latest ->qsmaskinitnext will be fetched for updating cpus_ptr.
> >
> > But for concurrent offlining, ->qsmaskinitnext is not reilable when
> > rcutree_offline_cpu(). That is another story and comes in the following
> > patch.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree_plugin.h | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 438ecae6bd7e..ef6d3ae239b9 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >  {
> >       struct task_struct *t = rnp->boost_kthread_task;
> > -     unsigned long mask = rcu_rnp_online_cpus(rnp);
> > +     unsigned long mask;
> >       cpumask_var_t cm;
> >       int cpu;
> >
> > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> >               return;
> >       mutex_lock(&rnp->boost_kthread_mutex);
> > +     /*
> > +      * Relying on the lock to serialize, so when onlining, the latest
> > +      * qsmaskinitnext is for cpus_ptr.
> > +      */
> > +     mask = rcu_rnp_online_cpus(rnp);
> >       for_each_leaf_node_possible_cpu(rnp, cpu)
> >               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> >                   cpu != outgoingcpu)
>
> Right but you still race against concurrent rcu_report_dead() doing:
>
>       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>

Ah. Indeed, my commit log is not precisely described.

In fact, either speedup smp_init [1] or fast kexec reboot [2] still
uses the model: one hotplug initiator and several reactors.  The
initiators are still excluded from each other by the pair
cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
cpu-up and cpu-down event at the same time.

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-19  4:33     ` Pingfan Liu
@ 2022-09-19 10:34       ` Frederic Weisbecker
  2022-09-20  3:16         ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-19 10:34 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > >                   cpu != outgoingcpu)
> > >                       cpumask_set_cpu(cpu, cm);
> > >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > +     /*
> > > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > > +      */
> > > +     if (outgoingcpu != -1)
> > > +             cpumask_andnot(cm, cm, cpu_dying_mask);
> >
> > I'm not sure how the infrastructure changes in your concurrent down patchset
> > but can the cpu_dying_mask concurrently change at this stage?
> >
> 
> For the concurrent down patchset [1], it extends the cpu_down()
> capability to let an initiator to tear down several cpus in a batch
> and in parallel.
> 
> At the first step, all cpus to be torn down should experience
> cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
> set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
> each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
> need to fix it by using another loop to call
> cpuhp_kick_ap_work_async(cpu);)

So if I understand correctly there is a synchronization point for all
CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE ?

And how about rollbacks through cpuhp_reset_state() ?

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-19 10:24     ` Pingfan Liu
@ 2022-09-19 10:51       ` Frederic Weisbecker
  2022-09-20  3:45         ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-19 10:51 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
> On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
> <frederic@kernel.org> wrote:
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 438ecae6bd7e..ef6d3ae239b9 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > >  {
> > >       struct task_struct *t = rnp->boost_kthread_task;
> > > -     unsigned long mask = rcu_rnp_online_cpus(rnp);
> > > +     unsigned long mask;
> > >       cpumask_var_t cm;
> > >       int cpu;
> > >
> > > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > >       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> > >               return;
> > >       mutex_lock(&rnp->boost_kthread_mutex);
> > > +     /*
> > > +      * Relying on the lock to serialize, so when onlining, the latest
> > > +      * qsmaskinitnext is for cpus_ptr.
> > > +      */
> > > +     mask = rcu_rnp_online_cpus(rnp);
> > >       for_each_leaf_node_possible_cpu(rnp, cpu)
> > >               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> > >                   cpu != outgoingcpu)
> >
> > Right but you still race against concurrent rcu_report_dead() doing:
> >
> >       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
> >
> 
> Ah. Indeed, my commit log is not precisely described.
> 
> In fact, either speedup smp_init [1] or fast kexec reboot [2] still
> uses the model: one hotplug initiator and several reactors.  The
> initiators are still excluded from each other by the pair
> cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
> cpu-up and cpu-down event at the same time.

Yes but two downing CPUs may race right?

So isn't it possible to have rcu_report_dead() racing against
rcutree_offline_cpu()?

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-19 10:34       ` Frederic Weisbecker
@ 2022-09-20  3:16         ` Pingfan Liu
  2022-09-20  9:00           ` Frederic Weisbecker
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-20  3:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 19, 2022 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > >                   cpu != outgoingcpu)
> > > >                       cpumask_set_cpu(cpu, cm);
> > > >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > > +     /*
> > > > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > > > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > > > +      */
> > > > +     if (outgoingcpu != -1)
> > > > +             cpumask_andnot(cm, cm, cpu_dying_mask);
> > >
> > > I'm not sure how the infrastructure changes in your concurrent down patchset
> > > but can the cpu_dying_mask concurrently change at this stage?
> > >
> > 
> > For the concurrent down patchset [1], it extends the cpu_down()
> > capability to let an initiator to tear down several cpus in a batch
> > and in parallel.
> > 
> > At the first step, all cpus to be torn down should experience
> > cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
> > set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
> > each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
> > need to fix it by using another loop to call
> > cpuhp_kick_ap_work_async(cpu);)
> 
> So if I understand correctly there is a synchronization point for all
> CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE ?
> 

Yes, your understanding is right.

> And how about rollbacks through cpuhp_reset_state() ?
> 

Originally, cpuhp_reset_state() is not considered in my fast kexec
reboot series since at that point, all devices have been shutdown and
have no way to back. The rebooting just adventures to move on.

But yes as you point out, cpuhp_reset_state() throws a challenge to keep
the stability of cpu_dying_mask.

Considering we have the following order.
1.
  set_cpu_dying(true)
  rcutree_offline_cpu()
2. when rollback
  set_cpu_dying(false)
  rcutree_online_cpu()


The dying mask is stable before rcu routines, and
rnp->boost_kthread_mutex can be used to build a order to access the
latest cpu_dying_mask as in [1/3].


Thanks,

	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-19 10:51       ` Frederic Weisbecker
@ 2022-09-20  3:45         ` Pingfan Liu
  2022-09-20  9:23           ` Frederic Weisbecker
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-20  3:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 19, 2022 at 12:51:49PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
> > <frederic@kernel.org> wrote:
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 438ecae6bd7e..ef6d3ae239b9 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > > >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > >  {
> > > >       struct task_struct *t = rnp->boost_kthread_task;
> > > > -     unsigned long mask = rcu_rnp_online_cpus(rnp);
> > > > +     unsigned long mask;
> > > >       cpumask_var_t cm;
> > > >       int cpu;
> > > >
> > > > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > >       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> > > >               return;
> > > >       mutex_lock(&rnp->boost_kthread_mutex);
> > > > +     /*
> > > > +      * Relying on the lock to serialize, so when onlining, the latest
> > > > +      * qsmaskinitnext is for cpus_ptr.
> > > > +      */
> > > > +     mask = rcu_rnp_online_cpus(rnp);
> > > >       for_each_leaf_node_possible_cpu(rnp, cpu)
> > > >               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> > > >                   cpu != outgoingcpu)
> > >
> > > Right but you still race against concurrent rcu_report_dead() doing:
> > >
> > >       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
> > >
> > 
> > Ah. Indeed, my commit log is not precisely described.
> > 
> > In fact, either speedup smp_init [1] or fast kexec reboot [2] still
> > uses the model: one hotplug initiator and several reactors.  The
> > initiators are still excluded from each other by the pair
> > cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
> > cpu-up and cpu-down event at the same time.
> 
> Yes but two downing CPUs may race right?
> 
> So isn't it possible to have rcu_report_dead() racing against
> rcutree_offline_cpu()?
> 

Yes, two downing cpus can be in a batch.

There is a nuance between onlining and offlining:
-1. onlining relies on qsmaskinitnext, which is stable at the point
rcutree_online_cpu(). And it is what this patch aims for.
-2. offlining relies on cpu_dying_mask [2/3], instead of qsmaskinitnext
which is not cleared yet at the point rcutree_offline_cpu().

It is asymmetric owning to the point of updating qsmaskinitnext.

Now considering the racing between rcu_report_dead() and
rcutree_offline_cpu():

No matter rcutree_offline_cpu() reads out either stale or fresh value of
qsmaskinitnext, it should intersect with cpu_dying_mask.

Thanks,

	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-16 13:42   ` Frederic Weisbecker
@ 2022-09-20  7:26     ` Pingfan Liu
  2022-09-20  9:46       ` Frederic Weisbecker
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-20  7:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> > As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> > atomic operations, but the problem is that if there are multiple
> > nohz_full CPUs going offline concurrently, the first CPU to invoke
> > rcutree_dead_cpu() will turn the tick off.  This might require an
> > atomically manipulated counter to mediate the calls to
> > rcutree_dead_cpu(). "
> > 
> > This patch introduces a new member ->dying to rcu_node, which reflects
> > the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> > the first entrance and cleared by the last.
> > 
> > Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> > no lock, no extra locking order is introduced.
> > 
> > Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 19 ++++++++++++++-----
> >  kernel/rcu/tree.h |  1 +
> >  2 files changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8a829b64f5b2..f8bd0fc5fd2f 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> >  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +	unsigned long flags;
> > +	u8 dying;
> >  
> >  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >  		return 0;
> >  
> >  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -	// Stop-machine done, so allow nohz_full to disable tick.
> > -	tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +	dying = --rnp->dying;
> > +	if (!dying)
> > +		// Stop-machine done, so allow nohz_full to disable tick.
> > +		tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 
> Note this is only locking the rdp's node, not the root node.
> Therefore if CPU 0 and CPU 256 are going off at the same time and they
> don't belong to the same node, the above won't protect against concurrent
> TICK_DEP_BIT_RCU set/clear.
> 

Nice, thanks for the careful thoughts. How about moving the counting
place to the root node?

> My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> anymore. I believe it was there because of issues that were fixed with:
> 
> 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> and:
> 
> 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> 
> It's unfortunately just suspicion because the reason for that tick dependency
> is unclear but I believe it should be safe to remove now.
> 

I have gone through this tick dependency again, but got less.

I think at least from the RCU's viewpoint, it is useless since
multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
requirement for tick interrupt.


Is there a way to have a convincing test so that these code can be removed?
Or this code will be got along with?


Thanks,

	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-20  3:16         ` Pingfan Liu
@ 2022-09-20  9:00           ` Frederic Weisbecker
  0 siblings, 0 replies; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-20  9:00 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 11:16:09AM +0800, Pingfan Liu wrote:
> On Mon, Sep 19, 2022 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >                   cpu != outgoingcpu)
> > > > >                       cpumask_set_cpu(cpu, cm);
> > > > >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > > > +     /*
> > > > > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > > > > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > > > > +      */
> > > > > +     if (outgoingcpu != -1)
> > > > > +             cpumask_andnot(cm, cm, cpu_dying_mask);
> > > >
> > > > I'm not sure how the infrastructure changes in your concurrent down patchset
> > > > but can the cpu_dying_mask concurrently change at this stage?
> > > >
> > > 
> > > For the concurrent down patchset [1], it extends the cpu_down()
> > > capability to let an initiator to tear down several cpus in a batch
> > > and in parallel.
> > > 
> > > At the first step, all cpus to be torn down should experience
> > > cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
> > > set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
> > > each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
> > > need to fix it by using another loop to call
> > > cpuhp_kick_ap_work_async(cpu);)
> > 
> > So if I understand correctly there is a synchronization point for all
> > CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE ?
> > 
> 
> Yes, your understanding is right.
> 
> > And how about rollbacks through cpuhp_reset_state() ?
> > 
> 
> Originally, cpuhp_reset_state() is not considered in my fast kexec
> reboot series since at that point, all devices have been shutdown and
> have no way to back. The rebooting just adventures to move on.
> 
> But yes as you point out, cpuhp_reset_state() throws a challenge to keep
> the stability of cpu_dying_mask.
> 
> Considering we have the following order.
> 1.
>   set_cpu_dying(true)
>   rcutree_offline_cpu()
> 2. when rollback
>   set_cpu_dying(false)
>   rcutree_online_cpu()
> 
> 
> The dying mask is stable before rcu routines, and
> rnp->boost_kthread_mutex can be used to build a order to access the
> latest cpu_dying_mask as in [1/3].

Ok thanks for the clarification!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-20  3:45         ` Pingfan Liu
@ 2022-09-20  9:23           ` Frederic Weisbecker
  2022-10-01  2:26             ` Joel Fernandes
  0 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-20  9:23 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
> On Mon, Sep 19, 2022 at 12:51:49PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
> > > <frederic@kernel.org> wrote:
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index 438ecae6bd7e..ef6d3ae239b9 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > > > >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >  {
> > > > >       struct task_struct *t = rnp->boost_kthread_task;
> > > > > -     unsigned long mask = rcu_rnp_online_cpus(rnp);
> > > > > +     unsigned long mask;
> > > > >       cpumask_var_t cm;
> > > > >       int cpu;
> > > > >
> > > > > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> > > > >               return;
> > > > >       mutex_lock(&rnp->boost_kthread_mutex);
> > > > > +     /*
> > > > > +      * Relying on the lock to serialize, so when onlining, the latest
> > > > > +      * qsmaskinitnext is for cpus_ptr.
> > > > > +      */
> > > > > +     mask = rcu_rnp_online_cpus(rnp);
> > > > >       for_each_leaf_node_possible_cpu(rnp, cpu)
> > > > >               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> > > > >                   cpu != outgoingcpu)
> > > >
> > > > Right but you still race against concurrent rcu_report_dead() doing:
> > > >
> > > >       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
> > > >
> > > 
> > > Ah. Indeed, my commit log is not precisely described.
> > > 
> > > In fact, either speedup smp_init [1] or fast kexec reboot [2] still
> > > uses the model: one hotplug initiator and several reactors.  The
> > > initiators are still excluded from each other by the pair
> > > cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
> > > cpu-up and cpu-down event at the same time.
> > 
> > Yes but two downing CPUs may race right?
> > 
> > So isn't it possible to have rcu_report_dead() racing against
> > rcutree_offline_cpu()?
> > 
> 
> Yes, two downing cpus can be in a batch.
> 
> There is a nuance between onlining and offlining:
> -1. onlining relies on qsmaskinitnext, which is stable at the point
> rcutree_online_cpu(). And it is what this patch aims for.
> -2. offlining relies on cpu_dying_mask [2/3], instead of qsmaskinitnext
> which is not cleared yet at the point rcutree_offline_cpu().
> 
> It is asymmetric owning to the point of updating qsmaskinitnext.
> 
> Now considering the racing between rcu_report_dead() and
> rcutree_offline_cpu():
> 
> No matter rcutree_offline_cpu() reads out either stale or fresh value of
> qsmaskinitnext, it should intersect with cpu_dying_mask.

Ah I see now, so the race can happen but it is then cancelled by the
stable READ on cpu_dying_mask. Ok.

> 
> Thanks,
> 
> 	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-15  5:58 ` [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining Pingfan Liu
  2022-09-16 14:23   ` Frederic Weisbecker
@ 2022-09-20  9:38   ` Frederic Weisbecker
  2022-09-21 11:48     ` Pingfan Liu
  1 sibling, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-20  9:38 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> During offlining, the concurrent rcutree_offline_cpu() can not be aware
> of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> information at that point and can be utilized.
> 
> Besides, a trivial change which removes the redudant call to
> rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> rcutree_offline_cpu() can fully serve that purpose.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c        | 2 --
>  kernel/rcu/tree_plugin.h | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..8a829b64f5b2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	/* Adjust any no-longer-needed kthreads. */
> -	rcu_boost_kthread_setaffinity(rnp, -1);
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
>  	return 0;
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index ef6d3ae239b9..e5afc63bd97f 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  		    cpu != outgoingcpu)
>  			cpumask_set_cpu(cpu, cm);
>  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> +	/*
> +	 * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.

For clarification, the comment could be:

           While concurrently offlining, rcu_report_dead() can race, making
           ->qsmaskinitnext unstable. So rely on cpu_dying_mask which is stable
           and already contains all the currently offlining CPUs.

Thanks!

> +	 * So resort to cpu_dying_mask, whose changes has already been visible.
> +	 */
> +	if (outgoingcpu != -1)
> +		cpumask_andnot(cm, cm, cpu_dying_mask);
>  	if (cpumask_empty(cm))
>  		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
>  	set_cpus_allowed_ptr(t, cm);
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-20  7:26     ` Pingfan Liu
@ 2022-09-20  9:46       ` Frederic Weisbecker
  2022-09-20 19:13         ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-20  9:46 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > Note this is only locking the rdp's node, not the root node.
> > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > don't belong to the same node, the above won't protect against concurrent
> > TICK_DEP_BIT_RCU set/clear.
> > 
> 
> Nice, thanks for the careful thoughts. How about moving the counting
> place to the root node?

You could but then you'd need to lock the root node.

> 
> > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > anymore. I believe it was there because of issues that were fixed with:
> > 
> > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > and:
> > 
> > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > 
> > It's unfortunately just suspicion because the reason for that tick dependency
> > is unclear but I believe it should be safe to remove now.
> > 
> 
> I have gone through this tick dependency again, but got less.
> 
> I think at least from the RCU's viewpoint, it is useless since
> multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> requirement for tick interrupt.

Partly yes.

> Is there a way to have a convincing test so that these code can be removed?
> Or this code will be got along with?

Hmm, Paul might remember which rcutorture scenario would trigger it?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
  2022-09-15  6:11   ` Pingfan Liu
  2022-09-16 14:52   ` Frederic Weisbecker
@ 2022-09-20 10:31   ` Frederic Weisbecker
  2022-09-21 11:56     ` Pingfan Liu
  2 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-20 10:31 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 15, 2022 at 01:58:23PM +0800, Pingfan Liu wrote:
> rcutree_online_cpu() is concurrent, so there is the following race
> scene:
> 
>         CPU 1                               CPU2
> mask_old = rcu_rnp_online_cpus(rnp);
> ...
> 
>                                    mask_new = rcu_rnp_online_cpus(rnp);
>                                    ...
>                                    set_cpus_allowed_ptr(t, cm);
> 
> set_cpus_allowed_ptr(t, cm);

Just for clarity, may I suggest:


          CPU 0                                    CPU 1
          -----                                    -----
       rcutree_online_cpu()
           rcu_boost_kthread_setaffinity()
               mask = rcu_rnp_online_cpus(rnp);
                                                rcu_cpu_starting()
                                                    WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
                                                rcutree_online_cpu()
                                                    rcu_boost_kthread_setaffinity()
                                                        mask = rcu_rnp_online_cpus(rnp);
                                                        set_cpus_allowed_ptr(t, cm);
               set_cpus_allowed_ptr(t, cm);


> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>  		return;
>  	mutex_lock(&rnp->boost_kthread_mutex);
> +	/*
> +	 * Relying on the lock to serialize, so when onlining, the latest
> +	 * qsmaskinitnext is for cpus_ptr.
> +	 */

Given how tricky the situation is, I suggest to also add more details to the
comment:

        /*
         * Rely on the boost mutex to serialize ->qsmaskinitnext and
         * set_cpus_allowed_ptr(), so when concurrently onlining, racing against
	 * rcu_cpu_starting() makes sure that the latest update eventually
	 * wins with the right affinity setting.
         */

Thanks!

> +	mask = rcu_rnp_online_cpus(rnp);
>  	for_each_leaf_node_possible_cpu(rnp, cpu)
>  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>  		    cpu != outgoingcpu)
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-20  9:46       ` Frederic Weisbecker
@ 2022-09-20 19:13         ` Paul E. McKenney
  2022-09-22  9:29           ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-09-20 19:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Pingfan Liu, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 11:46:45AM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > > Note this is only locking the rdp's node, not the root node.
> > > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > > don't belong to the same node, the above won't protect against concurrent
> > > TICK_DEP_BIT_RCU set/clear.
> > > 
> > 
> > Nice, thanks for the careful thoughts. How about moving the counting
> > place to the root node?
> 
> You could but then you'd need to lock the root node.
> 
> > > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > > anymore. I believe it was there because of issues that were fixed with:
> > > 
> > > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > > and:
> > > 
> > > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > > 
> > > It's unfortunately just suspicion because the reason for that tick dependency
> > > is unclear but I believe it should be safe to remove now.
> > > 
> > 
> > I have gone through this tick dependency again, but got less.
> > 
> > I think at least from the RCU's viewpoint, it is useless since
> > multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> > requirement for tick interrupt.
> 
> Partly yes.
> 
> > Is there a way to have a convincing test so that these code can be removed?
> > Or this code will be got along with?
> 
> Hmm, Paul might remember which rcutorture scenario would trigger it?

TREE04 on multisocket systems, preferably with faster CPU-hotplug
operations.  This can be accomplished by adding this to the kvm.sh
command line:

	rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30

It does take some time to run.  I did 4,000 hours worth of TREE04
to confirm lack of bug.  But an 80-CPU dual-socket system can run
10 concurrent instances of TREE04, which gets things down to a more
manageable 400 hours.  Please let me know if you don't have access
to a few such systems.

I will let Frederic identify which commit(s) should be reverted in
order to test the test.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining
  2022-09-20  9:38   ` Frederic Weisbecker
@ 2022-09-21 11:48     ` Pingfan Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-21 11:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 5:39 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> > During offlining, the concurrent rcutree_offline_cpu() can not be aware
> > of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> > information at that point and can be utilized.
> >
> > Besides, a trivial change which removes the redudant call to
> > rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> > rcutree_offline_cpu() can fully serve that purpose.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c        | 2 --
> >  kernel/rcu/tree_plugin.h | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..8a829b64f5b2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >               return 0;
> >
> >       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -     /* Adjust any no-longer-needed kthreads. */
> > -     rcu_boost_kthread_setaffinity(rnp, -1);
> >       // Stop-machine done, so allow nohz_full to disable tick.
> >       tick_dep_clear(TICK_DEP_BIT_RCU);
> >       return 0;
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index ef6d3ae239b9..e5afc63bd97f 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >                   cpu != outgoingcpu)
> >                       cpumask_set_cpu(cpu, cm);
> >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > +     /*
> > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
>
> For clarification, the comment could be:
>
>            While concurrently offlining, rcu_report_dead() can race, making
>            ->qsmaskinitnext unstable. So rely on cpu_dying_mask which is stable
>            and already contains all the currently offlining CPUs.
>

It  is a neat description.

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-20 10:31   ` Frederic Weisbecker
@ 2022-09-21 11:56     ` Pingfan Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-21 11:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 12:31:43PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 15, 2022 at 01:58:23PM +0800, Pingfan Liu wrote:
> > rcutree_online_cpu() is concurrent, so there is the following race
> > scene:
> > 
> >         CPU 1                               CPU2
> > mask_old = rcu_rnp_online_cpus(rnp);
> > ...
> > 
> >                                    mask_new = rcu_rnp_online_cpus(rnp);
> >                                    ...
> >                                    set_cpus_allowed_ptr(t, cm);
> > 
> > set_cpus_allowed_ptr(t, cm);
> 
> Just for clarity, may I suggest:
> 
> 
>           CPU 0                                    CPU 1
>           -----                                    -----
>        rcutree_online_cpu()
>            rcu_boost_kthread_setaffinity()
>                mask = rcu_rnp_online_cpus(rnp);
>                                                 rcu_cpu_starting()
>                                                     WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
>                                                 rcutree_online_cpu()
>                                                     rcu_boost_kthread_setaffinity()
>                                                         mask = rcu_rnp_online_cpus(rnp);
>                                                         set_cpus_allowed_ptr(t, cm);
>                set_cpus_allowed_ptr(t, cm);
> 
> 

OK.

> > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> >  		return;
> >  	mutex_lock(&rnp->boost_kthread_mutex);
> > +	/*
> > +	 * Relying on the lock to serialize, so when onlining, the latest
> > +	 * qsmaskinitnext is for cpus_ptr.
> > +	 */
> 
> Given how tricky the situation is, I suggest to also add more details to the
> comment:
> 
>         /*
>          * Rely on the boost mutex to serialize ->qsmaskinitnext and
>          * set_cpus_allowed_ptr(), so when concurrently onlining, racing against
> 	 * rcu_cpu_starting() makes sure that the latest update eventually
> 	 * wins with the right affinity setting.
>          */
> 

Appreciate for your re-phrase. I will use it in the next version


Thanks,
	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-20 19:13         ` Paul E. McKenney
@ 2022-09-22  9:29           ` Pingfan Liu
  2022-09-22 13:54             ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-22  9:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 20, 2022 at 12:13:39PM -0700, Paul E. McKenney wrote:
> On Tue, Sep 20, 2022 at 11:46:45AM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > > > Note this is only locking the rdp's node, not the root node.
> > > > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > > > don't belong to the same node, the above won't protect against concurrent
> > > > TICK_DEP_BIT_RCU set/clear.
> > > > 
> > > 
> > > Nice, thanks for the careful thoughts. How about moving the counting
> > > place to the root node?
> > 
> > You could but then you'd need to lock the root node.
> > 
> > > > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > > > anymore. I believe it was there because of issues that were fixed with:
> > > > 
> > > > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > > > and:
> > > > 
> > > > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > > > 
> > > > It's unfortunately just suspicion because the reason for that tick dependency
> > > > is unclear but I believe it should be safe to remove now.
> > > > 
> > > 
> > > I have gone through this tick dependency again, but got less.
> > > 
> > > I think at least from the RCU's viewpoint, it is useless since
> > > multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> > > requirement for tick interrupt.
> > 
> > Partly yes.
> > 
> > > Is there a way to have a convincing test so that these code can be removed?
> > > Or this code will be got along with?
> > 
> > Hmm, Paul might remember which rcutorture scenario would trigger it?
> 
> TREE04 on multisocket systems, preferably with faster CPU-hotplug
> operations.  This can be accomplished by adding this to the kvm.sh
> command line:
> 
> 	rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> 

Is it ok with "sh kvm.sh  --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs TREE04"

> It does take some time to run.  I did 4,000 hours worth of TREE04
                                        ^^^ '--duration=4000h' can serve this purpose?
Is it related with the cpu's freq?

> to confirm lack of bug.  But an 80-CPU dual-socket system can run
> 10 concurrent instances of TREE04, which gets things down to a more

The total demanded hours H = 4000/(system_cpu_num/8)?

> manageable 400 hours.  Please let me know if you don't have access
> to a few such systems.
>

I am happy to have a try if needed. I will try to get a powerful
machine, which can shrink the test time.

> I will let Frederic identify which commit(s) should be reverted in
> order to test the test.
> 

My understanding is after removing the tick dep by
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..cbfc884f04a4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
        WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
        /* Adjust any no-longer-needed kthreads. */
        rcu_boost_kthread_setaffinity(rnp, -1);
-       // Stop-machine done, so allow nohz_full to disable tick.
-       tick_dep_clear(TICK_DEP_BIT_RCU);
        return 0;
 }

@@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
        sync_sched_exp_online_cleanup(cpu);
        rcutree_affinity_setting(cpu, -1);

-       // Stop-machine done, so allow nohz_full to disable tick.
-       tick_dep_clear(TICK_DEP_BIT_RCU);
        return 0;
 }

@@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)

        rcutree_affinity_setting(cpu, cpu);

-       // nohz_full CPUs need the tick for stop-machine to work quickly
-       tick_dep_set(TICK_DEP_BIT_RCU);
        return 0;
 }

If the TREE04 can success, then move on to revert the commit(s)
identified by Frederic, and do test again.

At this time, a TREE04 failure is expected.

If the above two results are observed, TICK_DEP_BIT_RCU can be
removed.

Is my understanding right?


Thanks,

	Pingfan

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-22  9:29           ` Pingfan Liu
@ 2022-09-22 13:54             ` Paul E. McKenney
  2022-09-23 22:13               ` Frederic Weisbecker
  2022-09-26  6:34               ` Pingfan Liu
  0 siblings, 2 replies; 61+ messages in thread
From: Paul E. McKenney @ 2022-09-22 13:54 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 22, 2022 at 05:29:32PM +0800, Pingfan Liu wrote:
> On Tue, Sep 20, 2022 at 12:13:39PM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 20, 2022 at 11:46:45AM +0200, Frederic Weisbecker wrote:
> > > On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> > > > On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > > > > Note this is only locking the rdp's node, not the root node.
> > > > > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > > > > don't belong to the same node, the above won't protect against concurrent
> > > > > TICK_DEP_BIT_RCU set/clear.
> > > > > 
> > > > 
> > > > Nice, thanks for the careful thoughts. How about moving the counting
> > > > place to the root node?
> > > 
> > > You could but then you'd need to lock the root node.
> > > 
> > > > > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > > > > anymore. I believe it was there because of issues that were fixed with:
> > > > > 
> > > > > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > > > > and:
> > > > > 
> > > > > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > > > > 
> > > > > It's unfortunately just suspicion because the reason for that tick dependency
> > > > > is unclear but I believe it should be safe to remove now.
> > > > > 
> > > > 
> > > > I have gone through this tick dependency again, but got less.
> > > > 
> > > > I think at least from the RCU's viewpoint, it is useless since
> > > > multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> > > > requirement for tick interrupt.
> > > 
> > > Partly yes.
> > > 
> > > > Is there a way to have a convincing test so that these code can be removed?
> > > > Or this code will be got along with?
> > > 
> > > Hmm, Paul might remember which rcutorture scenario would trigger it?
> > 
> > TREE04 on multisocket systems, preferably with faster CPU-hotplug
> > operations.  This can be accomplished by adding this to the kvm.sh
> > command line:
> > 
> > 	rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> > 
> 
> Is it ok with "sh kvm.sh  --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs TREE04"

If you have tools/.../rcutorture/bin on your path, yes.  This would default
to a 30-minute run.  If you have at least 16 CPUs, you should add
"--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
do this:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"

This would run four concurrent instances of the TREE04 scenario, each for
10 hours, for a total of 40 hours of test time.

> > It does take some time to run.  I did 4,000 hours worth of TREE04
>                                         ^^^ '--duration=4000h' can serve this purpose?

You could, at least if you replace the "=" with a space character, but
that really would run a six-month test, which is probably not what you
want to do.  There being 8,760 hours in a year and all that.

> Is it related with the cpu's freq?

Not at all.  '--duration 10h' would run ten hours of wall-clock time
regardless of the CPU frequencies.

> > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > 10 concurrent instances of TREE04, which gets things down to a more
> 
> The total demanded hours H = 4000/(system_cpu_num/8)?

Yes.  You can also use multiple systems, which is what kvm-remote.sh is
intended for, again assuming 80 CPUs per system to keep the arithmetic
simple:

tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"

Here "sys1" is the name of the system, on which you must have an account
so that "ssh sys1 date" runs the date command on the first remote system.
You really do have to use the "--cpus 80" because kvm-remote.sh does not
assume that the system that it is running on is one of the test systems.

> > manageable 400 hours.  Please let me know if you don't have access
> > to a few such systems.
> 
> I am happy to have a try if needed. I will try to get a powerful
> machine, which can shrink the test time.

Larger numbers of little systems work, also, but in my experience you need
a dual-socket system to have a reasonable chance of reproducing this bug.
Each socket can be small, though, if that helps.

If you work for a cloud provider or some such, you can probably get a
large number of systems.  If you can only get a few, you can do initial
testing, and then we can work out what to do about heavier-duty testing.

> > I will let Frederic identify which commit(s) should be reverted in
> > order to test the test.
> > 
> 
> My understanding is after removing the tick dep by
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..cbfc884f04a4 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
>         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
>         /* Adjust any no-longer-needed kthreads. */
>         rcu_boost_kthread_setaffinity(rnp, -1);
> -       // Stop-machine done, so allow nohz_full to disable tick.
> -       tick_dep_clear(TICK_DEP_BIT_RCU);
>         return 0;
>  }
> 
> @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
>         sync_sched_exp_online_cleanup(cpu);
>         rcutree_affinity_setting(cpu, -1);
> 
> -       // Stop-machine done, so allow nohz_full to disable tick.
> -       tick_dep_clear(TICK_DEP_BIT_RCU);
>         return 0;
>  }
> 
> @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> 
>         rcutree_affinity_setting(cpu, cpu);
> 
> -       // nohz_full CPUs need the tick for stop-machine to work quickly
> -       tick_dep_set(TICK_DEP_BIT_RCU);
>         return 0;
>  }
> 
> If the TREE04 can success, then move on to revert the commit(s)
> identified by Frederic, and do test again.
> 
> At this time, a TREE04 failure is expected.
> 
> If the above two results are observed, TICK_DEP_BIT_RCU can be
> removed.
> 
> Is my understanding right?

Seems plausible to me, but I again defer to Frederic.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-22 13:54             ` Paul E. McKenney
@ 2022-09-23 22:13               ` Frederic Weisbecker
  2022-09-26  6:34               ` Pingfan Liu
  1 sibling, 0 replies; 61+ messages in thread
From: Frederic Weisbecker @ 2022-09-23 22:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Pingfan Liu, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 22, 2022 at 05:29:32PM +0800, Pingfan Liu wrote:
> > My understanding is after removing the tick dep by
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..cbfc884f04a4 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> >         /* Adjust any no-longer-needed kthreads. */
> >         rcu_boost_kthread_setaffinity(rnp, -1);
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
> >         sync_sched_exp_online_cleanup(cpu);
> >         rcutree_affinity_setting(cpu, -1);
> > 
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > 
> >         rcutree_affinity_setting(cpu, cpu);
> > 
> > -       // nohz_full CPUs need the tick for stop-machine to work quickly
> > -       tick_dep_set(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > If the TREE04 can success, then move on to revert the commit(s)
> > identified by Frederic, and do test again.
> > 
> > At this time, a TREE04 failure is expected.
> > 
> > If the above two results are observed, TICK_DEP_BIT_RCU can be
> > removed.
> > 
> > Is my understanding right?
> 
> Seems plausible to me, but I again defer to Frederic.

Right, more precisely the commit to revert is:

     96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")

which is the diff Pingfan Liu proposed above. The other uses of TICK_DEP_BIT_RCU
will still be needed though.

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-22 13:54             ` Paul E. McKenney
  2022-09-23 22:13               ` Frederic Weisbecker
@ 2022-09-26  6:34               ` Pingfan Liu
  2022-09-26 22:23                 ` Paul E. McKenney
  1 sibling, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-26  6:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

Sorry to reply late. I just realize this e-mail misses in my gmail.

On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
[...]
> 
> If you have tools/.../rcutorture/bin on your path, yes.  This would default
> to a 30-minute run.  If you have at least 16 CPUs, you should add
                                            ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8

> "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> do this:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> 

I have tried to find a two socket system with 128 cpus and run
  sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04

Where 250*16=4000


> This would run four concurrent instances of the TREE04 scenario, each for
> 10 hours, for a total of 40 hours of test time.
> 
> > > It does take some time to run.  I did 4,000 hours worth of TREE04
> >                                         ^^^ '--duration=4000h' can serve this purpose?
> 
> You could, at least if you replace the "=" with a space character, but
> that really would run a six-month test, which is probably not what you
> want to do.  There being 8,760 hours in a year and all that.
> 
> > Is it related with the cpu's freq?
> 
> Not at all.  '--duration 10h' would run ten hours of wall-clock time
> regardless of the CPU frequencies.
> 
> > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > 10 concurrent instances of TREE04, which gets things down to a more
> > 
> > The total demanded hours H = 4000/(system_cpu_num/8)?
> 
> Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> intended for, again assuming 80 CPUs per system to keep the arithmetic
> simple:
> 
> tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> 

That is appealing.

I will see if any opportunity to grasp a batch of machines to run the
test.


Thanks,

	Pingfan
> Here "sys1" is the name of the system, on which you must have an account
> so that "ssh sys1 date" runs the date command on the first remote system.
> You really do have to use the "--cpus 80" because kvm-remote.sh does not
> assume that the system that it is running on is one of the test systems.
> 
> > > manageable 400 hours.  Please let me know if you don't have access
> > > to a few such systems.
> > 
> > I am happy to have a try if needed. I will try to get a powerful
> > machine, which can shrink the test time.
> 
> Larger numbers of little systems work, also, but in my experience you need
> a dual-socket system to have a reasonable chance of reproducing this bug.
> Each socket can be small, though, if that helps.
> 
> If you work for a cloud provider or some such, you can probably get a
> large number of systems.  If you can only get a few, you can do initial
> testing, and then we can work out what to do about heavier-duty testing.
> 
> > > I will let Frederic identify which commit(s) should be reverted in
> > > order to test the test.
> > > 
> > 
> > My understanding is after removing the tick dep by
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..cbfc884f04a4 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> >         /* Adjust any no-longer-needed kthreads. */
> >         rcu_boost_kthread_setaffinity(rnp, -1);
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
> >         sync_sched_exp_online_cleanup(cpu);
> >         rcutree_affinity_setting(cpu, -1);
> > 
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > 
> >         rcutree_affinity_setting(cpu, cpu);
> > 
> > -       // nohz_full CPUs need the tick for stop-machine to work quickly
> > -       tick_dep_set(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > If the TREE04 can success, then move on to revert the commit(s)
> > identified by Frederic, and do test again.
> > 
> > At this time, a TREE04 failure is expected.
> > 
> > If the above two results are observed, TICK_DEP_BIT_RCU can be
> > removed.
> > 
> > Is my understanding right?
> 
> Seems plausible to me, but I again defer to Frederic.
> 
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-15  5:58 ` [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining Pingfan Liu
  2022-09-16 13:42   ` Frederic Weisbecker
@ 2022-09-26 16:13   ` Joel Fernandes
  2022-09-27  9:42     ` Pingfan Liu
  1 sibling, 1 reply; 61+ messages in thread
From: Joel Fernandes @ 2022-09-26 16:13 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> atomic operations, but the problem is that if there are multiple
> nohz_full CPUs going offline concurrently, the first CPU to invoke
> rcutree_dead_cpu() will turn the tick off.  This might require an
> atomically manipulated counter to mediate the calls to
> rcutree_dead_cpu(). "
> 
> This patch introduces a new member ->dying to rcu_node, which reflects
> the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> the first entrance and cleared by the last.
> 
> Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> no lock, no extra locking order is introduced.
> 
> Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 19 ++++++++++++++-----
>  kernel/rcu/tree.h |  1 +
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8a829b64f5b2..f8bd0fc5fd2f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
>  {
>  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> +	unsigned long flags;
> +	u8 dying;
>  
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	// Stop-machine done, so allow nohz_full to disable tick.
> -	tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	dying = --rnp->dying;
> +	if (!dying)
> +		// Stop-machine done, so allow nohz_full to disable tick.
> +		tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	return 0;
>  }
>  
> @@ -4020,17 +4026,20 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	unsigned long flags;
>  	struct rcu_data *rdp;
>  	struct rcu_node *rnp;
> +	u8 dying;
>  
>  	rdp = per_cpu_ptr(&rcu_data, cpu);
>  	rnp = rdp->mynode;
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	rnp->ffmask &= ~rdp->grpmask;

Just to ensure the first increment sets the tick dep and the last decrement
resets it, it would be nice to add a check here:

WARN_ON_ONCE(!rnp->dying && tick_dep_test(TICK_DEP_BIT_RCU));

And correpondingly on the tick decrement:
WARN_ON_ONCE(rnp->dying > 0 && !tick_dep_test(TICK_DEP_BIT_RCU));

Of course that will require adding a new API: tick_dep_test, but might be
worth it.

(I think this should catch concurrency bugs such as involving the rnp lock
that Frederic pointed out).

thanks,

 - Joel



> +	/* Let rcutree_dead_cpu() know a new offlining. */
> +	dying = rnp->dying++;
> +	if (!dying)
> +		// nohz_full CPUs need the tick for stop-machine to work quickly
> +		tick_dep_set(TICK_DEP_BIT_RCU);



>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  
>  	rcutree_affinity_setting(cpu, cpu);
> -
> -	// nohz_full CPUs need the tick for stop-machine to work quickly
> -	tick_dep_set(TICK_DEP_BIT_RCU);
>  	return 0;
>  }
>  
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d4a97e40ea9c..b508a12ac953 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -81,6 +81,7 @@ struct rcu_node {
>  	int	grphi;		/* highest-numbered CPU here. */
>  	u8	grpnum;		/* group number for next level up. */
>  	u8	level;		/* root is at level 0. */
> +	u8	dying;		/* num of concurrent rdp offlining */
>  	bool	wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
>  				/*  exit RCU read-side critical sections */
>  				/*  before propagating offline up the */
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-26  6:34               ` Pingfan Liu
@ 2022-09-26 22:23                 ` Paul E. McKenney
  2022-09-27  9:59                   ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-09-26 22:23 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> Sorry to reply late. I just realize this e-mail misses in my gmail.
> 
> On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> [...]
> > 
> > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > to a 30-minute run.  If you have at least 16 CPUs, you should add
>                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8

Yes, you will get some benefit from --allcpus on systems with from 9-15
CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.

> > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > do this:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > 
> 
> I have tried to find a two socket system with 128 cpus and run
>   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> 
> Where 250*16=4000

That would work.

> > This would run four concurrent instances of the TREE04 scenario, each for
> > 10 hours, for a total of 40 hours of test time.
> > 
> > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > 
> > You could, at least if you replace the "=" with a space character, but
> > that really would run a six-month test, which is probably not what you
> > want to do.  There being 8,760 hours in a year and all that.
> > 
> > > Is it related with the cpu's freq?
> > 
> > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > regardless of the CPU frequencies.
> > 
> > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > 
> > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > 
> > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > simple:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > 
> 
> That is appealing.
> 
> I will see if any opportunity to grasp a batch of machines to run the
> test.

Initial tests with smaller numbers of CPUs are also useful, for example,
in case reversion causes some bug due to bad interaction with a later
commit.

Please let me know how it goes!

							Thanx, Paul

> Thanks,
> 
> 	Pingfan
> > Here "sys1" is the name of the system, on which you must have an account
> > so that "ssh sys1 date" runs the date command on the first remote system.
> > You really do have to use the "--cpus 80" because kvm-remote.sh does not
> > assume that the system that it is running on is one of the test systems.
> > 
> > > > manageable 400 hours.  Please let me know if you don't have access
> > > > to a few such systems.
> > > 
> > > I am happy to have a try if needed. I will try to get a powerful
> > > machine, which can shrink the test time.
> > 
> > Larger numbers of little systems work, also, but in my experience you need
> > a dual-socket system to have a reasonable chance of reproducing this bug.
> > Each socket can be small, though, if that helps.
> > 
> > If you work for a cloud provider or some such, you can probably get a
> > large number of systems.  If you can only get a few, you can do initial
> > testing, and then we can work out what to do about heavier-duty testing.
> > 
> > > > I will let Frederic identify which commit(s) should be reverted in
> > > > order to test the test.
> > > > 
> > > 
> > > My understanding is after removing the tick dep by
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..cbfc884f04a4 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > >         /* Adjust any no-longer-needed kthreads. */
> > >         rcu_boost_kthread_setaffinity(rnp, -1);
> > > -       // Stop-machine done, so allow nohz_full to disable tick.
> > > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> > >         return 0;
> > >  }
> > > 
> > > @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
> > >         sync_sched_exp_online_cleanup(cpu);
> > >         rcutree_affinity_setting(cpu, -1);
> > > 
> > > -       // Stop-machine done, so allow nohz_full to disable tick.
> > > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> > >         return 0;
> > >  }
> > > 
> > > @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > 
> > >         rcutree_affinity_setting(cpu, cpu);
> > > 
> > > -       // nohz_full CPUs need the tick for stop-machine to work quickly
> > > -       tick_dep_set(TICK_DEP_BIT_RCU);
> > >         return 0;
> > >  }
> > > 
> > > If the TREE04 can success, then move on to revert the commit(s)
> > > identified by Frederic, and do test again.
> > > 
> > > At this time, a TREE04 failure is expected.
> > > 
> > > If the above two results are observed, TICK_DEP_BIT_RCU can be
> > > removed.
> > > 
> > > Is my understanding right?
> > 
> > Seems plausible to me, but I again defer to Frederic.
> > 
> > 							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-26 16:13   ` Joel Fernandes
@ 2022-09-27  9:42     ` Pingfan Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-27  9:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, Paul E. McKenney, David Woodhouse, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Mon, Sep 26, 2022 at 04:13:42PM +0000, Joel Fernandes wrote:
> On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> > As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> > atomic operations, but the problem is that if there are multiple
> > nohz_full CPUs going offline concurrently, the first CPU to invoke
> > rcutree_dead_cpu() will turn the tick off.  This might require an
> > atomically manipulated counter to mediate the calls to
> > rcutree_dead_cpu(). "
> > 
> > This patch introduces a new member ->dying to rcu_node, which reflects
> > the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> > the first entrance and cleared by the last.
> > 
> > Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> > no lock, no extra locking order is introduced.
> > 
> > Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 19 ++++++++++++++-----
> >  kernel/rcu/tree.h |  1 +
> >  2 files changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8a829b64f5b2..f8bd0fc5fd2f 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> >  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +	unsigned long flags;
> > +	u8 dying;
> >  
> >  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >  		return 0;
> >  
> >  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -	// Stop-machine done, so allow nohz_full to disable tick.
> > -	tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +	dying = --rnp->dying;
> > +	if (!dying)
> > +		// Stop-machine done, so allow nohz_full to disable tick.
> > +		tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >  	return 0;
> >  }
> >  
> > @@ -4020,17 +4026,20 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  	unsigned long flags;
> >  	struct rcu_data *rdp;
> >  	struct rcu_node *rnp;
> > +	u8 dying;
> >  
> >  	rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	rnp = rdp->mynode;
> >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >  	rnp->ffmask &= ~rdp->grpmask;
> 
> Just to ensure the first increment sets the tick dep and the last decrement
> resets it, it would be nice to add a check here:
> 
> WARN_ON_ONCE(!rnp->dying && tick_dep_test(TICK_DEP_BIT_RCU));
> 
> And correpondingly on the tick decrement:
> WARN_ON_ONCE(rnp->dying > 0 && !tick_dep_test(TICK_DEP_BIT_RCU));
> 
> Of course that will require adding a new API: tick_dep_test, but might be
> worth it.
> 
> (I think this should catch concurrency bugs such as involving the rnp lock
> that Frederic pointed out).
> 

Thank for your suggestion. But let us see the agressive method firstly,
i.e. removing the tick dep completely. At present, it seems promising.


Thanks,

	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-26 22:23                 ` Paul E. McKenney
@ 2022-09-27  9:59                   ` Pingfan Liu
  2022-09-29  8:19                     ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-27  9:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > Sorry to reply late. I just realize this e-mail misses in my gmail.
> > 
> > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > [...]
> > > 
> > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8
> 
> Yes, you will get some benefit from --allcpus on systems with from 9-15
> CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.
> 
> > > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > > do this:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > 
> > 
> > I have tried to find a two socket system with 128 cpus and run
> >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > 
> > Where 250*16=4000
> 
> That would work.
> 

This job has successfully run 24+ hours. (But maybe I can only keep it
about 180 hours)

> > > This would run four concurrent instances of the TREE04 scenario, each for
> > > 10 hours, for a total of 40 hours of test time.
> > > 
> > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > > 
> > > You could, at least if you replace the "=" with a space character, but
> > > that really would run a six-month test, which is probably not what you
> > > want to do.  There being 8,760 hours in a year and all that.
> > > 
> > > > Is it related with the cpu's freq?
> > > 
> > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > regardless of the CPU frequencies.
> > > 
> > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > 
> > > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > > 
> > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > simple:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > 
> > 
> > That is appealing.
> > 
> > I will see if any opportunity to grasp a batch of machines to run the
> > test.
> 
> Initial tests with smaller numbers of CPUs are also useful, for example,
> in case reversion causes some bug due to bad interaction with a later
> commit.
> 
> Please let me know how it goes!
> 

I have managed to grasp three two-socket machine, each has 256 cpus.
The test has run about 7 hours till now without any problem by the following command:
tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
--duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"

It seems promising.


Thanks,

	Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-27  9:59                   ` Pingfan Liu
@ 2022-09-29  8:19                     ` Pingfan Liu
  2022-09-29  8:20                       ` Pingfan Liu
  2022-09-30 15:44                       ` Paul E. McKenney
  0 siblings, 2 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-09-29  8:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

[-- Attachment #1: Type: text/plain, Size: 7758 bytes --]

On Tue, Sep 27, 2022 at 5:59 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > > Sorry to reply late. I just realize this e-mail misses in my gmail.
> > >
> > > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > > [...]
> > > >
> > > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> > >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8
> >
> > Yes, you will get some benefit from --allcpus on systems with from 9-15
> > CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.
> >
> > > > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > > > do this:
> > > >
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > >
> > >
> > > I have tried to find a two socket system with 128 cpus and run
> > >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > >
> > > Where 250*16=4000
> >
> > That would work.
> >
>
> This job has successfully run 24+ hours. (But maybe I can only keep it
> about 180 hours)
>
> > > > This would run four concurrent instances of the TREE04 scenario, each for
> > > > 10 hours, for a total of 40 hours of test time.
> > > >
> > > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > > >
> > > > You could, at least if you replace the "=" with a space character, but
> > > > that really would run a six-month test, which is probably not what you
> > > > want to do.  There being 8,760 hours in a year and all that.
> > > >
> > > > > Is it related with the cpu's freq?
> > > >
> > > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > > regardless of the CPU frequencies.
> > > >
> > > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > >
> > > > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > > >
> > > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > > simple:
> > > >
> > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > >
> > >
> > > That is appealing.
> > >
> > > I will see if any opportunity to grasp a batch of machines to run the
> > > test.
> >
> > Initial tests with smaller numbers of CPUs are also useful, for example,
> > in case reversion causes some bug due to bad interaction with a later
> > commit.
> >
> > Please let me know how it goes!
> >
>
> I have managed to grasp three two-socket machine, each has 256 cpus.
> The test has run about 7 hours till now without any problem by the following command:
> tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
>
> It seems promising.
>

The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
Make CPU-hotplug removal operations enable tick") reverted. It is
close to the end, but unfortunately it fails.
Quote from remote-log
"
TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
total-gps=9011687] n_max_cbs: 4111392
TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
total-gps=8907816] n_max_cbs: 2411104
TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
QEMU killed
TREE04.59 no success message, 10547 successful version messages
^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
TREE04.59
^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
total-gps=8835575] n_max_cbs: 2329432
...
...
TREE04.91 ------- 4895716 GPs (30.2205/s) [rcu: g39322065 f0x0
total-gps=9830808] n_max_cbs: 2208839
TREE04.92 ------- 4902696 GPs (30.2636/s) [rcu: g39113441 f0x0
total-gps=9778652] n_max_cbs: 1412377
TREE04.93 ------- 4891393 GPs (30.1938/s) [rcu: g39244749 f0x0
total-gps=9811481] n_max_cbs: 1772653
TREE04.94 ------- 4921510 GPs (30.3797/s) [rcu: g39187349 f0x0
total-gps=9797129] n_max_cbs: 1120534
TREE04.95 ------- 4885795 GPs (30.1592/s) [rcu: g39020985 f0x0
total-gps=9755538] n_max_cbs: 1178416
TREE04.96 ------- 4889097 GPs (30.1796/s) [rcu: g39097057 f0x0
total-gps=9774556] n_max_cbs: 1861434
1 runs with runtime errors.
 --- Done at Wed Sep 28 08:40:31 PM EDT 2022 (1d 21:06:57) exitcode 2
"

Quote from  console.log of TREE04.59
"
.....
[162001.696486] rcu-torture: rcu_torture_barrier_cbs is stopping
[162001.697004] rcu-torture: Stopping rcu_torture_fwd_prog task
[162001.697662] rcu_torture_fwd_prog n_max_cbs: 0
[162001.698195] rcu_torture_fwd_prog: Starting forward-progress test 0
[162001.698782] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
[162001.707571] rcu_torture_fwd_prog_cr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.738504] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
[162002.746491] rcu_torture_fwd_prog_nr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.850483] rcu_torture_fwd_prog: tested 2105 tested_tries 2107
[162002.851008] rcu-torture: rcu_torture_fwd_prog is stopping
[162002.851542] rcu-torture: Stopping rcu_torture_writer task
[162004.530463] rcu-torture: rtc: 00000000ac003c99 ver: 800516 tfle: 0
rta: 800517 rtaf: 0 rtf: 800507 rtmbe: 0 rtmbkf: 0/142699 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 205710931 onoff:
185194/185194:185196/185196 1,1860:1,3263 25610601:47601063 (HZ=1000)
barrier: 773783/773783:0 read-exits: 184960 nocb-toggles: 0:0
[162004.532583] rcu-torture: Reader Pipe:  343007605654 1359216 0 0 0
0 0 0 0 0 0
[162004.533113] rcu-torture: Reader Batch:  342996212546 12752324 0 0
0 0 0 0 0 0 0
[162004.533648] rcu-torture: Free-Block Circulation:  800516 800515
800514 800513 800512 800511 800510 800509 800508 800507 0
[162004.534442] ??? Writer stall state RTWS_EXP_SYNC(4) g30755544 f0x0
->state 0x2 cpu 0
[162004.535057] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
->state: 0x402 ->rt_priority 0 delta ->gp_start 1674 ->gp_activity
1670 ->gp_req_activity 1674 ->gp_wake_time 1674 ->gp_wake_seq 30755540
->gp_seq 30755544 ->gp_seq_needed 30755544 ->gp_max 989 ->gp_flags 0x0
[162004.536805] rcu:    CB 1^0->2 KbclSW F2838 L2838 C0 ..... q0 S CPU 0
[162004.537277] rcu:    CB 2^0->3 KbclSW F2911 L2911 C7 ..... q0 S CPU 0
[162004.537742] rcu:    CB 3^0->-1 KbclSW F1686 L1686 C2 ..... q0 S CPU 0
[162004.538217] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2176869 S CPU 0
[162004.538729] rcu:    CB 4^4->5 KbclSW F2912 L2912 C7 ..... q0 S CPU 0
[162004.539202] rcu:    CB 5^4->6 KbclSW F2871 L2872 C1 ..... q0 S CPU 0
[162004.539667] rcu:    CB 6^4->7 KbclSW F4060 L4060 C0 ..... q0 S CPU 0
[162004.540136] rcu:    CB 7^4->-1 KbclSW F5763 L5763 C1 ..... q0 S CPU 0
[162004.540653] rcu: RCU callbacks invoked since boot: 1431149091
[162004.541076] rcu-torture: rcu_torture_stats is stopping
"

I have no idea whether this is related to the reverted commit.


Thanks,

Pingfan

[-- Attachment #2: remote-log --]
[-- Type: application/octet-stream, Size: 60443 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-29  8:19                     ` Pingfan Liu
@ 2022-09-29  8:20                       ` Pingfan Liu
  2022-09-30 13:04                         ` Joel Fernandes
  2022-09-30 15:44                       ` Paul E. McKenney
  1 sibling, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-09-29  8:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
[...]
> "
>
> I have no idea whether this is related to the reverted commit.
>

I have started another test against clean v6.0-rc7 to see whether this
is an issue with the mainline.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-29  8:20                       ` Pingfan Liu
@ 2022-09-30 13:04                         ` Joel Fernandes
  2022-10-02 14:06                           ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Joel Fernandes @ 2022-09-30 13:04 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Paul E. McKenney, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> [...]
> > "
> >
> > I have no idea whether this is related to the reverted commit.
> >
>
> I have started another test against clean v6.0-rc7 to see whether this
> is an issue with the mainline.

I am not sure what exactly you are reverting (you could clarify that),
but if you are just removing the entire TICK_DEP_BIT_RCU, I do
remember (and mentioned on IRC to others recently) that without this
NOHZ_FULL has a hard time ending grace-periods because the forcing of
tick is needed for this configuration if we are spinning in the kernel
with the tick turned off. That seems to align with your TREE04
(NOHZ_FULL) configuration.

Also, the commit Frederic suggested to revert seems to be a cosmetic
optimization in the interrupt-entry path. That should not change
functionality I believe. So I did not fully follow why reverting that
is relevant (maybe Frederic can clarify?).

Or, are you trying to just remove its use from the hotplug path?

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-29  8:19                     ` Pingfan Liu
  2022-09-29  8:20                       ` Pingfan Liu
@ 2022-09-30 15:44                       ` Paul E. McKenney
  2022-10-02 13:29                         ` Pingfan Liu
  1 sibling, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-09-30 15:44 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Sep 29, 2022 at 04:19:28PM +0800, Pingfan Liu wrote:
> On Tue, Sep 27, 2022 at 5:59 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > > > Sorry to reply late. I just realize this e-mail misses in my gmail.
> > > >
> > > > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > > > [...]
> > > > >
> > > > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> > > >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8
> > >
> > > Yes, you will get some benefit from --allcpus on systems with from 9-15
> > > CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.
> > >
> > > > > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > > > > do this:
> > > > >
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > > >
> > > >
> > > > I have tried to find a two socket system with 128 cpus and run
> > > >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > > >
> > > > Where 250*16=4000
> > >
> > > That would work.
> > >
> >
> > This job has successfully run 24+ hours. (But maybe I can only keep it
> > about 180 hours)
> >
> > > > > This would run four concurrent instances of the TREE04 scenario, each for
> > > > > 10 hours, for a total of 40 hours of test time.
> > > > >
> > > > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > > > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > > > >
> > > > > You could, at least if you replace the "=" with a space character, but
> > > > > that really would run a six-month test, which is probably not what you
> > > > > want to do.  There being 8,760 hours in a year and all that.
> > > > >
> > > > > > Is it related with the cpu's freq?
> > > > >
> > > > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > > > regardless of the CPU frequencies.
> > > > >
> > > > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > > >
> > > > > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > > > >
> > > > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > > > simple:
> > > > >
> > > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > > >
> > > >
> > > > That is appealing.
> > > >
> > > > I will see if any opportunity to grasp a batch of machines to run the
> > > > test.
> > >
> > > Initial tests with smaller numbers of CPUs are also useful, for example,
> > > in case reversion causes some bug due to bad interaction with a later
> > > commit.
> > >
> > > Please let me know how it goes!
> > >
> >
> > I have managed to grasp three two-socket machine, each has 256 cpus.
> > The test has run about 7 hours till now without any problem by the following command:
> > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> >
> > It seems promising.
> >
> 
> The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> Make CPU-hotplug removal operations enable tick") reverted. It is
> close to the end, but unfortunately it fails.
> Quote from remote-log
> "
> TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> total-gps=9011687] n_max_cbs: 4111392
> TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> total-gps=8907816] n_max_cbs: 2411104
> TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> QEMU killed
> TREE04.59 no success message, 10547 successful version messages
> ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> ^[[033mWARNING: ^[[mAssertion failure in
> /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> TREE04.59
> ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> total-gps=8835575] n_max_cbs: 2329432

First, thank you for running this!

This is not the typical failure that we were seeing, which would show
up as a 2.199.0-second RCU CPU stall during which time there would be
no console messages.

But please do let me know how continuing tests go!

							Thanx, Paul

> ...
> ...
> TREE04.91 ------- 4895716 GPs (30.2205/s) [rcu: g39322065 f0x0
> total-gps=9830808] n_max_cbs: 2208839
> TREE04.92 ------- 4902696 GPs (30.2636/s) [rcu: g39113441 f0x0
> total-gps=9778652] n_max_cbs: 1412377
> TREE04.93 ------- 4891393 GPs (30.1938/s) [rcu: g39244749 f0x0
> total-gps=9811481] n_max_cbs: 1772653
> TREE04.94 ------- 4921510 GPs (30.3797/s) [rcu: g39187349 f0x0
> total-gps=9797129] n_max_cbs: 1120534
> TREE04.95 ------- 4885795 GPs (30.1592/s) [rcu: g39020985 f0x0
> total-gps=9755538] n_max_cbs: 1178416
> TREE04.96 ------- 4889097 GPs (30.1796/s) [rcu: g39097057 f0x0
> total-gps=9774556] n_max_cbs: 1861434
> 1 runs with runtime errors.
>  --- Done at Wed Sep 28 08:40:31 PM EDT 2022 (1d 21:06:57) exitcode 2
> "
> 
> Quote from  console.log of TREE04.59
> "
> .....
> [162001.696486] rcu-torture: rcu_torture_barrier_cbs is stopping
> [162001.697004] rcu-torture: Stopping rcu_torture_fwd_prog task
> [162001.697662] rcu_torture_fwd_prog n_max_cbs: 0
> [162001.698195] rcu_torture_fwd_prog: Starting forward-progress test 0
> [162001.698782] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
> [162001.707571] rcu_torture_fwd_prog_cr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.738504] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
> [162002.746491] rcu_torture_fwd_prog_nr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.850483] rcu_torture_fwd_prog: tested 2105 tested_tries 2107
> [162002.851008] rcu-torture: rcu_torture_fwd_prog is stopping
> [162002.851542] rcu-torture: Stopping rcu_torture_writer task
> [162004.530463] rcu-torture: rtc: 00000000ac003c99 ver: 800516 tfle: 0
> rta: 800517 rtaf: 0 rtf: 800507 rtmbe: 0 rtmbkf: 0/142699 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 205710931 onoff:
> 185194/185194:185196/185196 1,1860:1,3263 25610601:47601063 (HZ=1000)
> barrier: 773783/773783:0 read-exits: 184960 nocb-toggles: 0:0
> [162004.532583] rcu-torture: Reader Pipe:  343007605654 1359216 0 0 0
> 0 0 0 0 0 0
> [162004.533113] rcu-torture: Reader Batch:  342996212546 12752324 0 0
> 0 0 0 0 0 0 0
> [162004.533648] rcu-torture: Free-Block Circulation:  800516 800515
> 800514 800513 800512 800511 800510 800509 800508 800507 0
> [162004.534442] ??? Writer stall state RTWS_EXP_SYNC(4) g30755544 f0x0
> ->state 0x2 cpu 0
> [162004.535057] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
> ->state: 0x402 ->rt_priority 0 delta ->gp_start 1674 ->gp_activity
> 1670 ->gp_req_activity 1674 ->gp_wake_time 1674 ->gp_wake_seq 30755540
> ->gp_seq 30755544 ->gp_seq_needed 30755544 ->gp_max 989 ->gp_flags 0x0
> [162004.536805] rcu:    CB 1^0->2 KbclSW F2838 L2838 C0 ..... q0 S CPU 0
> [162004.537277] rcu:    CB 2^0->3 KbclSW F2911 L2911 C7 ..... q0 S CPU 0
> [162004.537742] rcu:    CB 3^0->-1 KbclSW F1686 L1686 C2 ..... q0 S CPU 0
> [162004.538217] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2176869 S CPU 0
> [162004.538729] rcu:    CB 4^4->5 KbclSW F2912 L2912 C7 ..... q0 S CPU 0
> [162004.539202] rcu:    CB 5^4->6 KbclSW F2871 L2872 C1 ..... q0 S CPU 0
> [162004.539667] rcu:    CB 6^4->7 KbclSW F4060 L4060 C0 ..... q0 S CPU 0
> [162004.540136] rcu:    CB 7^4->-1 KbclSW F5763 L5763 C1 ..... q0 S CPU 0
> [162004.540653] rcu: RCU callbacks invoked since boot: 1431149091
> [162004.541076] rcu-torture: rcu_torture_stats is stopping
> "
> 
> I have no idea whether this is related to the reverted commit.
> 
> 
> Thanks,
> 
> Pingfan



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-09-20  9:23           ` Frederic Weisbecker
@ 2022-10-01  2:26             ` Joel Fernandes
  2022-10-02 12:34               ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Joel Fernandes @ 2022-10-01  2:26 UTC (permalink / raw)
  To: Frederic Weisbecker, Pingfan Liu
  Cc: rcu, Paul E. McKenney, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Jason A. Donenfeld



On 9/20/22 05:23, Frederic Weisbecker wrote:
> On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
>> On Mon, Sep 19, 2022 at 12:51:49PM +0200, Frederic Weisbecker wrote:
>>> On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
>>>> On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
>>>> <frederic@kernel.org> wrote:
>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>> index 438ecae6bd7e..ef6d3ae239b9 100644
>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>>>>>>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>  {
>>>>>>       struct task_struct *t = rnp->boost_kthread_task;
>>>>>> -     unsigned long mask = rcu_rnp_online_cpus(rnp);
>>>>>> +     unsigned long mask;
>>>>>>       cpumask_var_t cm;
>>>>>>       int cpu;
>>>>>>
>>>>>> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>>>>>>               return;
>>>>>>       mutex_lock(&rnp->boost_kthread_mutex);
>>>>>> +     /*
>>>>>> +      * Relying on the lock to serialize, so when onlining, the latest
>>>>>> +      * qsmaskinitnext is for cpus_ptr.
>>>>>> +      */
>>>>>> +     mask = rcu_rnp_online_cpus(rnp);
>>>>>>       for_each_leaf_node_possible_cpu(rnp, cpu)
>>>>>>               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>>>>>>                   cpu != outgoingcpu)
>>>>>
>>>>> Right but you still race against concurrent rcu_report_dead() doing:
>>>>>
>>>>>       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>>>>>
>>>>
>>>> Ah. Indeed, my commit log is not precisely described.
>>>>
>>>> In fact, either speedup smp_init [1] or fast kexec reboot [2] still
>>>> uses the model: one hotplug initiator and several reactors.  The
>>>> initiators are still excluded from each other by the pair
>>>> cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
>>>> cpu-up and cpu-down event at the same time.
>>>
>>> Yes but two downing CPUs may race right?
>>>
>>> So isn't it possible to have rcu_report_dead() racing against
>>> rcutree_offline_cpu()?
>>>
>>
>> Yes, two downing cpus can be in a batch.
>>
>> There is a nuance between onlining and offlining:
>> -1. onlining relies on qsmaskinitnext, which is stable at the point
>> rcutree_online_cpu(). And it is what this patch aims for.
>> -2. offlining relies on cpu_dying_mask [2/3], instead of qsmaskinitnext
>> which is not cleared yet at the point rcutree_offline_cpu().
>>
>> It is asymmetric owning to the point of updating qsmaskinitnext.
>>
>> Now considering the racing between rcu_report_dead() and
>> rcutree_offline_cpu():
>>
>> No matter rcutree_offline_cpu() reads out either stale or fresh value of
>> qsmaskinitnext, it should intersect with cpu_dying_mask.
> 
> Ah I see now, so the race can happen but it is then cancelled by the
> stable READ on cpu_dying_mask. Ok.
> 

Maybe I am missing something but, concurrent rcu_report_dead() on CPU's
of the same node can corrupt ->qsmaskinitnext as Frederick showed. That
may not be a problem for affinities (due to the cpu_dying_mask), but is
that not an issue for RCU operation itself?

i.e. due to the corruption, ->qsmaskinitnext is in an inconsistent state
causing *RCU* issues.

Or is there a reason that that is not an issue?

thanks,

 - Joel






^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-10-01  2:26             ` Joel Fernandes
@ 2022-10-02 12:34               ` Pingfan Liu
  2022-10-02 15:52                 ` Joel Fernandes
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-10-02 12:34 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, rcu, Paul E. McKenney, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Sat, Oct 1, 2022 at 10:26 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
>
>
> On 9/20/22 05:23, Frederic Weisbecker wrote:
> > On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
> >> On Mon, Sep 19, 2022 at 12:51:49PM +0200, Frederic Weisbecker wrote:
> >>> On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
> >>>> On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
> >>>> <frederic@kernel.org> wrote:
> >>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>>>>> index 438ecae6bd7e..ef6d3ae239b9 100644
> >>>>>> --- a/kernel/rcu/tree_plugin.h
> >>>>>> +++ b/kernel/rcu/tree_plugin.h
> >>>>>> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> >>>>>>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >>>>>>  {
> >>>>>>       struct task_struct *t = rnp->boost_kthread_task;
> >>>>>> -     unsigned long mask = rcu_rnp_online_cpus(rnp);
> >>>>>> +     unsigned long mask;
> >>>>>>       cpumask_var_t cm;
> >>>>>>       int cpu;
> >>>>>>
> >>>>>> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >>>>>>       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> >>>>>>               return;
> >>>>>>       mutex_lock(&rnp->boost_kthread_mutex);
> >>>>>> +     /*
> >>>>>> +      * Relying on the lock to serialize, so when onlining, the latest
> >>>>>> +      * qsmaskinitnext is for cpus_ptr.
> >>>>>> +      */
> >>>>>> +     mask = rcu_rnp_online_cpus(rnp);
> >>>>>>       for_each_leaf_node_possible_cpu(rnp, cpu)
> >>>>>>               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> >>>>>>                   cpu != outgoingcpu)
> >>>>>
> >>>>> Right but you still race against concurrent rcu_report_dead() doing:
> >>>>>
> >>>>>       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
> >>>>>
> >>>>
> >>>> Ah. Indeed, my commit log is not precisely described.
> >>>>
> >>>> In fact, either speedup smp_init [1] or fast kexec reboot [2] still
> >>>> uses the model: one hotplug initiator and several reactors.  The
> >>>> initiators are still excluded from each other by the pair
> >>>> cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
> >>>> cpu-up and cpu-down event at the same time.
> >>>
> >>> Yes but two downing CPUs may race right?
> >>>
> >>> So isn't it possible to have rcu_report_dead() racing against
> >>> rcutree_offline_cpu()?
> >>>
> >>
> >> Yes, two downing cpus can be in a batch.
> >>
> >> There is a nuance between onlining and offlining:
> >> -1. onlining relies on qsmaskinitnext, which is stable at the point
> >> rcutree_online_cpu(). And it is what this patch aims for.
> >> -2. offlining relies on cpu_dying_mask [2/3], instead of qsmaskinitnext
> >> which is not cleared yet at the point rcutree_offline_cpu().
> >>
> >> It is asymmetric owning to the point of updating qsmaskinitnext.
> >>
> >> Now considering the racing between rcu_report_dead() and
> >> rcutree_offline_cpu():
> >>
> >> No matter rcutree_offline_cpu() reads out either stale or fresh value of
> >> qsmaskinitnext, it should intersect with cpu_dying_mask.
> >
> > Ah I see now, so the race can happen but it is then cancelled by the
> > stable READ on cpu_dying_mask. Ok.
> >
>
> Maybe I am missing something but, concurrent rcu_report_dead() on CPU's
> of the same node can corrupt ->qsmaskinitnext as Frederick showed. That

There is a pair of lock ops around WRITE_ONCE(rnp->qsmaskinitnext,
rnp->qsmaskinitnext & ~mask), i.e. raw_spin_lock_irqsave_rcu_node(rnp,
flags) and raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

So it should not be a problem, right?

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-30 15:44                       ` Paul E. McKenney
@ 2022-10-02 13:29                         ` Pingfan Liu
  2022-10-02 15:08                           ` Frederic Weisbecker
  2022-10-02 16:20                           ` Paul E. McKenney
  0 siblings, 2 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-10-02 13:29 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
[...]
> > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > The test has run about 7 hours till now without any problem by the following command:
> > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > >
> > > It seems promising.
> > >
> >
> > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > Make CPU-hotplug removal operations enable tick") reverted. It is
> > close to the end, but unfortunately it fails.
> > Quote from remote-log
> > "
> > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > total-gps=9011687] n_max_cbs: 4111392
> > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > total-gps=8907816] n_max_cbs: 2411104
> > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > QEMU killed
> > TREE04.59 no success message, 10547 successful version messages
> > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > ^[[033mWARNING: ^[[mAssertion failure in
> > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > TREE04.59
> > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > total-gps=8835575] n_max_cbs: 2329432
>
> First, thank you for running this!
>
> This is not the typical failure that we were seeing, which would show
> up as a 2.199.0-second RCU CPU stall during which time there would be
> no console messages.
>
> But please do let me know how continuing tests go!
>

This time, the same test environment except against v6.0-rc7 mainline,
also encountered the not typical failure.

Quote from remote-log:
"
TREE04.11 ------- 555429 GPs (3.42857/s) n_max_cbs: 1907355
QEMU killed
TREE04.11 no success message, 10547 successful version messages
^[[033mWARNING: ^[[mTREE04.11 GP HANG at 555429 torture stat 1330
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.11/console.log
TREE04.11
^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 9213
TREE04.12 ------- 4379523 GPs (27.0341/s) [rcu: g35803501 f0x0
total-gps=8951168] n_max_cbs: 2585955
...
TREE04.42 ------- 4383911 GPs (27.0612/s) [rcu: g35706861 f0x0
total-gps=8927008] n_max_cbs: 2544071
TREE04.43 ------- 4370806 GPs (26.9803/s) [rcu: g35793529 f0x0
total-gps=8948675] n_max_cbs: 2490251
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.43/console.log
TREE04.43
^[[033mWARNING: ^[[mSummary: Warnings: 1 Call Traces: 1
TREE04.44 ------- 4379942 GPs (27.0367/s) [rcu: g35806165 f0x0
total-gps=8951834] n_max_cbs: 1473528
...
TREE04.95 ------- 4850446 GPs (29.941/s) [rcu: g38867685 f0x0
total-gps=9717212] n_max_cbs: 2917447
TREE04.96 ------- 4880570 GPs (30.127/s) [rcu: g39023821 f0x0
total-gps=9756247] n_max_cbs: 1267969
2 runs with runtime errors.
 --- Done at Fri Sep 30 09:35:31 PM EDT 2022 (1d 21:07:02) exitcode 2
"

Quote from TREE04.11 console.log
"
...
[162001.301549] rcu_torture_fwd_prog n_max_cbs: 281053
[162001.301994] rcu_torture_fwd_prog: Starting forward-progress test 0
[162001.302531] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
[162001.308449] rcu_torture_fwd_prog_cr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.352445] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
[162002.360429] rcu_torture_fwd_prog_nr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.464428] rcu_torture_fwd_prog: tested 2104 tested_tries 2105
[162002.464989] rcu-torture: rcu_torture_fwd_prog is stopping
[162002.465518] rcu-torture: Stopping rcu_torture_writer task
[162005.040419] rcu-torture: rtc: 0000000089bf6f56 ver: 555429 tfle: 0
rta: 555430 rtaf: 0 rtf: 555420 rtmbe: 0 rtmbkf: 0/234319 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206274892 onoff:
184783/184783:184787/184787 1,2001:2,3303 25759026:47497333 (HZ=1000)
barrier: 773866/773866:0 read-exits: 185056 nocb-toggles: 0:0
[162005.042858] rcu-torture: Reader Pipe:  344206401977 936546 0 0 0 0 0 0 0 0 0
[162005.043504] rcu-torture: Reader Batch:  344194661258 12677265 0 0
0 0 0 0 0 0 0
[162005.044047] rcu-torture: Free-Block Circulation:  555429 555428
555427 555426 555425 555424 555423 555422 555421 555420 0
[162005.044857] ??? Writer stall state RTWS_COND_SYNC_EXP(8) g30399832
f0x0 ->state 0x2 cpu 1
[162005.045474] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
->state: 0x402 ->rt_priority 0 delta ->gp_start 2575 ->gp_activity
2571 ->gp_req_activity 2575 ->gp_wake_time 2575 ->gp_wake_seq 30399828
->gp_seq 30399832 ->gp_seq_needed 30399832 ->gp_max 1118 ->gp_flags
0x0
[162005.047265] rcu:    CB 1^0->2 KbclSW F3852 L3852 C5 ..... q0 S CPU 0
[162005.047856] rcu:    CB 2^0->3 KbclSW F3747 L3747 C2 ..... q0 S CPU 0
[162005.048415] rcu:    CB 3^0->-1 KbclSW F2583 L2583 C2 ..... q0 S CPU 0
[162005.048992] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2042607 S CPU 0
[162005.049569] rcu:    CB 4^4->5 KbclSW F3839 L3839 C6 ..... q0 S CPU 0
[162005.050040] rcu:    CB 5^4->6 KbclSW F4449 L4449 C0 ..... q0 S CPU 0
[162005.050515] rcu:    CB 6^4->7 KbclSW F3784 L3785 C1 ..... q0 S CPU 0
[162005.050984] rcu:    CB 7^4->-1 KbclSW F3837 L3837 C7 ..... q0 S CPU 0
[162005.051478] rcu: RCU callbacks invoked since boot: 1422160737
[162005.051908] rcu-torture: rcu_torture_stats is stopping
"

Quote from TREE04.43 console.log
"
...
[162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
[162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
0 0 0 0 0 0
[162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
0 0 0 0 0 0 0
[162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
[162003.266980] rcu-torture: rcu_torture_stats is stopping
[162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
[162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
[162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
= ffffffffa676d9a0
[162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
[162003.281563] mem_dump_obj(NULL): NULL pointer
[162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
[162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
ffff965e01a58000 pointer offset 0 allocated at
rcu_torture_cleanup+0x432/0x4e0
[162003.284138]     kmem_cache_alloc+0x19d/0x1b0
[162003.284522]     rcu_torture_cleanup+0x432/0x4e0
[162003.284918]     torture_shutdown+0xce/0x1c0
[162003.285285]     kthread+0xb7/0xe0
[162003.285587]     ret_from_fork+0x22/0x30
[162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
ffff965e01a58000 pointer offset 8 allocated at
rcu_torture_cleanup+0x432/0x4e0
[162003.287045]     kmem_cache_alloc+0x19d/0x1b0
[162003.287436]     rcu_torture_cleanup+0x432/0x4e0
[162003.287835]     torture_shutdown+0xce/0x1c0
[162003.288202]     kthread+0xb7/0xe0
[162003.288510]     ret_from_fork+0x22/0x30
[162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
[162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
[162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
[162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
[162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
[162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
region starting at 0xffffa0aa0002d000 allocated at
rcu_torture_cleanup.cold+0x287/0x3ef
[162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
region starting at 0xffffa0aa0002d000 allocated at
rcu_torture_cleanup.cold+0x287/0x3ef
[162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
[162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
0 0 0 0 0 0
[162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
0 0 0 0 0 0 0
[162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
[162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
fqs_stutter=3 test_boost=1/0 test_boost_interval=7
test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
[162003.304162] i2c i2c-0: shutdown
[162003.304461] alarmtimer alarmtimer.0.auto: shutdown
[162003.304878] psmouse serio1: shutdown
...
"

I remember if reverting the commits suggested by Frederic, the failure
would occur within an hour.

But unfortunately, I did not keep the data. I will run it again and
submit the data.


Thanks,

Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-09-30 13:04                         ` Joel Fernandes
@ 2022-10-02 14:06                           ` Pingfan Liu
  2022-10-02 16:11                             ` Joel Fernandes
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-10-02 14:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> > >
> > [...]
> > > "
> > >
> > > I have no idea whether this is related to the reverted commit.
> > >
> >
> > I have started another test against clean v6.0-rc7 to see whether this
> > is an issue with the mainline.
>
> I am not sure what exactly you are reverting (you could clarify that),

commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
But due to conflict, "git revert" can not work directly. So I applied
it by hand.

> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> remember (and mentioned on IRC to others recently) that without this
> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> tick is needed for this configuration if we are spinning in the kernel
> with the tick turned off. That seems to align with your TREE04
> (NOHZ_FULL) configuration.
>

Yes, that is the scenario.

> Also, the commit Frederic suggested to revert seems to be a cosmetic
> optimization in the interrupt-entry path. That should not change
> functionality I believe. So I did not fully follow why reverting that
> is relevant (maybe Frederic can clarify?).
>

Leave this question to Frederic.

And I schedule a test which reverts three commits
  96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
  53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
and:
  a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)

Let us see the result. (Last time, it failed within an hour)

> Or, are you trying to just remove its use from the hotplug path?
>

Not from the hotplug path. Just revert the commit and run the standard
TREE04 test.

Thanks,

     Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 13:29                         ` Pingfan Liu
@ 2022-10-02 15:08                           ` Frederic Weisbecker
  2022-10-02 16:20                             ` Paul E. McKenney
  2022-10-02 16:20                           ` Paul E. McKenney
  1 sibling, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-10-02 15:08 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: paulmck, rcu, David Woodhouse, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> [...]
> > > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > > The test has run about 7 hours till now without any problem by the following command:
> > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > > >
> > > > It seems promising.
> > > >
> > >
> > > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > > Make CPU-hotplug removal operations enable tick") reverted. It is
> > > close to the end, but unfortunately it fails.
> > > Quote from remote-log
> > > "
> > > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > > total-gps=9011687] n_max_cbs: 4111392
> > > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > > total-gps=8907816] n_max_cbs: 2411104
> > > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > > QEMU killed
> > > TREE04.59 no success message, 10547 successful version messages
> > > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > > ^[[033mWARNING: ^[[mAssertion failure in
> > > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > > TREE04.59
> > > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > > total-gps=8835575] n_max_cbs: 2329432
> >
> > First, thank you for running this!
> >
> > This is not the typical failure that we were seeing, which would show
> > up as a 2.199.0-second RCU CPU stall during which time there would be
> > no console messages.
> >
> > But please do let me know how continuing tests go!
> >
> 
> This time, the same test environment except against v6.0-rc7 mainline,
> also encountered the not typical failure.

Interesting, I'm trying to reproduce...

Thanks!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu()
  2022-10-02 12:34               ` Pingfan Liu
@ 2022-10-02 15:52                 ` Joel Fernandes
  0 siblings, 0 replies; 61+ messages in thread
From: Joel Fernandes @ 2022-10-02 15:52 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, Paul E. McKenney, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On 10/2/2022 8:34 AM, Pingfan Liu wrote:
> On Sat, Oct 1, 2022 at 10:26 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>> On 9/20/22 05:23, Frederic Weisbecker wrote:
>>> On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
[...]
>>>>>> <frederic@kernel.org> wrote:
>>>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>>>> index 438ecae6bd7e..ef6d3ae239b9 100644
>>>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>>>> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>>>>>>>>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>>>  {
>>>>>>>>       struct task_struct *t = rnp->boost_kthread_task;
>>>>>>>> -     unsigned long mask = rcu_rnp_online_cpus(rnp);
>>>>>>>> +     unsigned long mask;
>>>>>>>>       cpumask_var_t cm;
>>>>>>>>       int cpu;
>>>>>>>>
>>>>>>>> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>>>       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>>>>>>>>               return;
>>>>>>>>       mutex_lock(&rnp->boost_kthread_mutex);
>>>>>>>> +     /*
>>>>>>>> +      * Relying on the lock to serialize, so when onlining, the latest
>>>>>>>> +      * qsmaskinitnext is for cpus_ptr.
>>>>>>>> +      */
>>>>>>>> +     mask = rcu_rnp_online_cpus(rnp);
>>>>>>>>       for_each_leaf_node_possible_cpu(rnp, cpu)
>>>>>>>>               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>>>>>>>>                   cpu != outgoingcpu)
>>>>>>>
>>>>>>> Right but you still race against concurrent rcu_report_dead() doing:
>>>>>>>
>>>>>>>       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>>>>>>>
>>>>>>
>>>>>> Ah. Indeed, my commit log is not precisely described.
>>>>>>
>>>>>> In fact, either speedup smp_init [1] or fast kexec reboot [2] still
>>>>>> uses the model: one hotplug initiator and several reactors.  The
>>>>>> initiators are still excluded from each other by the pair
>>>>>> cpu_maps_update_begin()/cpu_maps_update_done(). So there can not be a
>>>>>> cpu-up and cpu-down event at the same time.
>>>>>
>>>>> Yes but two downing CPUs may race right?
>>>>>
>>>>> So isn't it possible to have rcu_report_dead() racing against
>>>>> rcutree_offline_cpu()?
>>>>>
>>>>
>>>> Yes, two downing cpus can be in a batch.
>>>>
>>>> There is a nuance between onlining and offlining:
>>>> -1. onlining relies on qsmaskinitnext, which is stable at the point
>>>> rcutree_online_cpu(). And it is what this patch aims for.
>>>> -2. offlining relies on cpu_dying_mask [2/3], instead of qsmaskinitnext
>>>> which is not cleared yet at the point rcutree_offline_cpu().
>>>>
>>>> It is asymmetric owning to the point of updating qsmaskinitnext.
>>>>
>>>> Now considering the racing between rcu_report_dead() and
>>>> rcutree_offline_cpu():
>>>>
>>>> No matter rcutree_offline_cpu() reads out either stale or fresh value of
>>>> qsmaskinitnext, it should intersect with cpu_dying_mask.
>>>
>>> Ah I see now, so the race can happen but it is then cancelled by the
>>> stable READ on cpu_dying_mask. Ok.
>>>
>>
>> Maybe I am missing something but, concurrent rcu_report_dead() on CPU's
>> of the same node can corrupt ->qsmaskinitnext as Frederick showed. That
> 
> There is a pair of lock ops around WRITE_ONCE(rnp->qsmaskinitnext,
> rnp->qsmaskinitnext & ~mask), i.e. raw_spin_lock_irqsave_rcu_node(rnp,
> flags) and raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> 
> So it should not be a problem, right?

Correct, I think should not be problem. I got thrown away a bit by the comment
"Right but you still race against concurrent rcu_report_dead()".

Even with concurrent offlining, a stable ->qsmaskinitnext should ensure that the
hotplugged CPUs are properly considered in rcu_gp_init(), so I don't see a
problem with your patch so far... but I'll continue to dig when you send the
next patch. My fear was if a bit in ->qsmaskinitnext is not cleared during
offlining race, then grace period will hang.

Thanks,

 - Joel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 14:06                           ` Pingfan Liu
@ 2022-10-02 16:11                             ` Joel Fernandes
  2022-10-02 16:24                               ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Joel Fernandes @ 2022-10-02 16:11 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Paul E. McKenney, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld



On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>
>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>
>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>
>>> [...]
>>>> "
>>>>
>>>> I have no idea whether this is related to the reverted commit.
>>>>
>>>
>>> I have started another test against clean v6.0-rc7 to see whether this
>>> is an issue with the mainline.
>>
>> I am not sure what exactly you are reverting (you could clarify that),
> 
> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> But due to conflict, "git revert" can not work directly. So I applied
> it by hand.
> 
>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
>> remember (and mentioned on IRC to others recently) that without this
>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
>> tick is needed for this configuration if we are spinning in the kernel
>> with the tick turned off. That seems to align with your TREE04
>> (NOHZ_FULL) configuration.
>>
> 
> Yes, that is the scenario.
> 
>> Also, the commit Frederic suggested to revert seems to be a cosmetic
>> optimization in the interrupt-entry path. That should not change
>> functionality I believe. So I did not fully follow why reverting that
>> is relevant (maybe Frederic can clarify?).
>>
> 
> Leave this question to Frederic.

I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
a functional change to CPU hotplug path.

Sorry for the noise.

Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
might jump into the investigation with you guys, but I have to make time for
Lazy-RCU v7 next :)

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 13:29                         ` Pingfan Liu
  2022-10-02 15:08                           ` Frederic Weisbecker
@ 2022-10-02 16:20                           ` Paul E. McKenney
       [not found]                             ` <CAFgQCTtgLfc0NeYqyWk4Ew-pA9rMREjRjWSnQhYLv-V5117s9Q@mail.gmail.com>
       [not found]                             ` <CAFgQCTtNetv7v_Law=abPtngC8Gv6OGcGz9M_wWMxz_GAEWDUQ@mail.gmail.com>
  1 sibling, 2 replies; 61+ messages in thread
From: Paul E. McKenney @ 2022-10-02 16:20 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> [...]
> > > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > > The test has run about 7 hours till now without any problem by the following command:
> > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > > >
> > > > It seems promising.
> > > >
> > >
> > > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > > Make CPU-hotplug removal operations enable tick") reverted. It is
> > > close to the end, but unfortunately it fails.
> > > Quote from remote-log
> > > "
> > > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > > total-gps=9011687] n_max_cbs: 4111392
> > > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > > total-gps=8907816] n_max_cbs: 2411104
> > > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > > QEMU killed
> > > TREE04.59 no success message, 10547 successful version messages
> > > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > > ^[[033mWARNING: ^[[mAssertion failure in
> > > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > > TREE04.59
> > > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > > total-gps=8835575] n_max_cbs: 2329432
> >
> > First, thank you for running this!
> >
> > This is not the typical failure that we were seeing, which would show
> > up as a 2.199.0-second RCU CPU stall during which time there would be
> > no console messages.
> >
> > But please do let me know how continuing tests go!
> >
> 
> This time, the same test environment except against v6.0-rc7 mainline,
> also encountered the not typical failure.
> 
> Quote from remote-log:
> "
> TREE04.11 ------- 555429 GPs (3.42857/s) n_max_cbs: 1907355
> QEMU killed
> TREE04.11 no success message, 10547 successful version messages
> ^[[033mWARNING: ^[[mTREE04.11 GP HANG at 555429 torture stat 1330
> ^[[033mWARNING: ^[[mAssertion failure in
> /home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.11/console.log
> TREE04.11
> ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 9213
> TREE04.12 ------- 4379523 GPs (27.0341/s) [rcu: g35803501 f0x0
> total-gps=8951168] n_max_cbs: 2585955
> ...
> TREE04.42 ------- 4383911 GPs (27.0612/s) [rcu: g35706861 f0x0
> total-gps=8927008] n_max_cbs: 2544071
> TREE04.43 ------- 4370806 GPs (26.9803/s) [rcu: g35793529 f0x0
> total-gps=8948675] n_max_cbs: 2490251
> ^[[033mWARNING: ^[[mAssertion failure in
> /home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.43/console.log
> TREE04.43
> ^[[033mWARNING: ^[[mSummary: Warnings: 1 Call Traces: 1
> TREE04.44 ------- 4379942 GPs (27.0367/s) [rcu: g35806165 f0x0
> total-gps=8951834] n_max_cbs: 1473528
> ...
> TREE04.95 ------- 4850446 GPs (29.941/s) [rcu: g38867685 f0x0
> total-gps=9717212] n_max_cbs: 2917447
> TREE04.96 ------- 4880570 GPs (30.127/s) [rcu: g39023821 f0x0
> total-gps=9756247] n_max_cbs: 1267969
> 2 runs with runtime errors.
>  --- Done at Fri Sep 30 09:35:31 PM EDT 2022 (1d 21:07:02) exitcode 2
> "
> 
> Quote from TREE04.11 console.log
> "
> ...
> [162001.301549] rcu_torture_fwd_prog n_max_cbs: 281053
> [162001.301994] rcu_torture_fwd_prog: Starting forward-progress test 0
> [162001.302531] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
> [162001.308449] rcu_torture_fwd_prog_cr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.352445] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
> [162002.360429] rcu_torture_fwd_prog_nr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.464428] rcu_torture_fwd_prog: tested 2104 tested_tries 2105
> [162002.464989] rcu-torture: rcu_torture_fwd_prog is stopping
> [162002.465518] rcu-torture: Stopping rcu_torture_writer task
> [162005.040419] rcu-torture: rtc: 0000000089bf6f56 ver: 555429 tfle: 0
> rta: 555430 rtaf: 0 rtf: 555420 rtmbe: 0 rtmbkf: 0/234319 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206274892 onoff:
> 184783/184783:184787/184787 1,2001:2,3303 25759026:47497333 (HZ=1000)
> barrier: 773866/773866:0 read-exits: 185056 nocb-toggles: 0:0
> [162005.042858] rcu-torture: Reader Pipe:  344206401977 936546 0 0 0 0 0 0 0 0 0
> [162005.043504] rcu-torture: Reader Batch:  344194661258 12677265 0 0
> 0 0 0 0 0 0 0
> [162005.044047] rcu-torture: Free-Block Circulation:  555429 555428
> 555427 555426 555425 555424 555423 555422 555421 555420 0
> [162005.044857] ??? Writer stall state RTWS_COND_SYNC_EXP(8) g30399832
> f0x0 ->state 0x2 cpu 1
> [162005.045474] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
> ->state: 0x402 ->rt_priority 0 delta ->gp_start 2575 ->gp_activity
> 2571 ->gp_req_activity 2575 ->gp_wake_time 2575 ->gp_wake_seq 30399828
> ->gp_seq 30399832 ->gp_seq_needed 30399832 ->gp_max 1118 ->gp_flags
> 0x0
> [162005.047265] rcu:    CB 1^0->2 KbclSW F3852 L3852 C5 ..... q0 S CPU 0
> [162005.047856] rcu:    CB 2^0->3 KbclSW F3747 L3747 C2 ..... q0 S CPU 0
> [162005.048415] rcu:    CB 3^0->-1 KbclSW F2583 L2583 C2 ..... q0 S CPU 0
> [162005.048992] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2042607 S CPU 0
> [162005.049569] rcu:    CB 4^4->5 KbclSW F3839 L3839 C6 ..... q0 S CPU 0
> [162005.050040] rcu:    CB 5^4->6 KbclSW F4449 L4449 C0 ..... q0 S CPU 0
> [162005.050515] rcu:    CB 6^4->7 KbclSW F3784 L3785 C1 ..... q0 S CPU 0
> [162005.050984] rcu:    CB 7^4->-1 KbclSW F3837 L3837 C7 ..... q0 S CPU 0
> [162005.051478] rcu: RCU callbacks invoked since boot: 1422160737
> [162005.051908] rcu-torture: rcu_torture_stats is stopping
> "
> 
> Quote from TREE04.43 console.log
> "
> ...
> [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> 0 0 0 0 0 0
> [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> 0 0 0 0 0 0 0
> [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> [162003.266980] rcu-torture: rcu_torture_stats is stopping
> [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> = ffffffffa676d9a0
> [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> [162003.281563] mem_dump_obj(NULL): NULL pointer
> [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> ffff965e01a58000 pointer offset 0 allocated at
> rcu_torture_cleanup+0x432/0x4e0
> [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> [162003.284918]     torture_shutdown+0xce/0x1c0
> [162003.285285]     kthread+0xb7/0xe0
> [162003.285587]     ret_from_fork+0x22/0x30
> [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> ffff965e01a58000 pointer offset 8 allocated at
> rcu_torture_cleanup+0x432/0x4e0
> [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> [162003.287835]     torture_shutdown+0xce/0x1c0
> [162003.288202]     kthread+0xb7/0xe0
> [162003.288510]     ret_from_fork+0x22/0x30
> [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> region starting at 0xffffa0aa0002d000 allocated at
> rcu_torture_cleanup.cold+0x287/0x3ef
> [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> region starting at 0xffffa0aa0002d000 allocated at
> rcu_torture_cleanup.cold+0x287/0x3ef
> [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> 0 0 0 0 0 0
> [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> 0 0 0 0 0 0 0
> [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> [162003.304162] i2c i2c-0: shutdown
> [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> [162003.304878] psmouse serio1: shutdown
> ...
> "

Huh.  Shutdown writer stall on the first one, but I am not seeing the
but in the second one.  Was the "Call Trace" located somewhere else in
the console.log file?

I have not seen the shutdown writer stall lately, but such stalls have
flagged many sorts of bugs for me to fix in the past.


So maybe there is yet another bug that my testing is for whatever reason
not seeing.

> I remember if reverting the commits suggested by Frederic, the failure
> would occur within an hour.
> 
> But unfortunately, I did not keep the data. I will run it again and
> submit the data.

Sounds good!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 15:08                           ` Frederic Weisbecker
@ 2022-10-02 16:20                             ` Paul E. McKenney
  0 siblings, 0 replies; 61+ messages in thread
From: Paul E. McKenney @ 2022-10-02 16:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Pingfan Liu, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Sun, Oct 02, 2022 at 05:08:58PM +0200, Frederic Weisbecker wrote:
> On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > [...]
> > > > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > > > The test has run about 7 hours till now without any problem by the following command:
> > > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > > > >
> > > > > It seems promising.
> > > > >
> > > >
> > > > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > > > Make CPU-hotplug removal operations enable tick") reverted. It is
> > > > close to the end, but unfortunately it fails.
> > > > Quote from remote-log
> > > > "
> > > > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > > > total-gps=9011687] n_max_cbs: 4111392
> > > > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > > > total-gps=8907816] n_max_cbs: 2411104
> > > > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > > > QEMU killed
> > > > TREE04.59 no success message, 10547 successful version messages
> > > > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > > > ^[[033mWARNING: ^[[mAssertion failure in
> > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > > > TREE04.59
> > > > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > > > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > > > total-gps=8835575] n_max_cbs: 2329432
> > >
> > > First, thank you for running this!
> > >
> > > This is not the typical failure that we were seeing, which would show
> > > up as a 2.199.0-second RCU CPU stall during which time there would be
> > > no console messages.
> > >
> > > But please do let me know how continuing tests go!
> > >
> > 
> > This time, the same test environment except against v6.0-rc7 mainline,
> > also encountered the not typical failure.
> 
> Interesting, I'm trying to reproduce...

Very good, thank you!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 16:11                             ` Joel Fernandes
@ 2022-10-02 16:24                               ` Paul E. McKenney
  2022-10-02 16:30                                 ` Joel Fernandes
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-10-02 16:24 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Pingfan Liu, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
> 
> 
> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> > On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>
> >> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>
> >>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>>
> >>> [...]
> >>>> "
> >>>>
> >>>> I have no idea whether this is related to the reverted commit.
> >>>>
> >>>
> >>> I have started another test against clean v6.0-rc7 to see whether this
> >>> is an issue with the mainline.
> >>
> >> I am not sure what exactly you are reverting (you could clarify that),
> > 
> > commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> > But due to conflict, "git revert" can not work directly. So I applied
> > it by hand.
> > 
> >> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> >> remember (and mentioned on IRC to others recently) that without this
> >> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> >> tick is needed for this configuration if we are spinning in the kernel
> >> with the tick turned off. That seems to align with your TREE04
> >> (NOHZ_FULL) configuration.
> >>
> > 
> > Yes, that is the scenario.
> > 
> >> Also, the commit Frederic suggested to revert seems to be a cosmetic
> >> optimization in the interrupt-entry path. That should not change
> >> functionality I believe. So I did not fully follow why reverting that
> >> is relevant (maybe Frederic can clarify?).
> >>
> > 
> > Leave this question to Frederic.
> 
> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
> a functional change to CPU hotplug path.
> 
> Sorry for the noise.
> 
> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
> might jump into the investigation with you guys, but I have to make time for
> Lazy-RCU v7 next :)

One historical reason was that a nohz_full CPU could enter the kernel with
the tick still disabled, and stay that way indefinitely.  Among other
things, this can interfere with the grace-period wait in the offlining
code path.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 16:24                               ` Paul E. McKenney
@ 2022-10-02 16:30                                 ` Joel Fernandes
  2022-10-02 16:57                                   ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Joel Fernandes @ 2022-10-02 16:30 UTC (permalink / raw)
  To: paulmck
  Cc: Pingfan Liu, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld



> On Oct 2, 2022, at 12:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
>> 
>> 
>>> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
>>> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>> 
>>>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>> 
>>>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>>> 
>>>>> [...]
>>>>>> "
>>>>>> 
>>>>>> I have no idea whether this is related to the reverted commit.
>>>>>> 
>>>>> 
>>>>> I have started another test against clean v6.0-rc7 to see whether this
>>>>> is an issue with the mainline.
>>>> 
>>>> I am not sure what exactly you are reverting (you could clarify that),
>>> 
>>> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
>>> But due to conflict, "git revert" can not work directly. So I applied
>>> it by hand.
>>> 
>>>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
>>>> remember (and mentioned on IRC to others recently) that without this
>>>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
>>>> tick is needed for this configuration if we are spinning in the kernel
>>>> with the tick turned off. That seems to align with your TREE04
>>>> (NOHZ_FULL) configuration.
>>>> 
>>> 
>>> Yes, that is the scenario.
>>> 
>>>> Also, the commit Frederic suggested to revert seems to be a cosmetic
>>>> optimization in the interrupt-entry path. That should not change
>>>> functionality I believe. So I did not fully follow why reverting that
>>>> is relevant (maybe Frederic can clarify?).
>>>> 
>>> 
>>> Leave this question to Frederic.
>> 
>> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
>> a functional change to CPU hotplug path.
>> 
>> Sorry for the noise.
>> 
>> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
>> might jump into the investigation with you guys, but I have to make time for
>> Lazy-RCU v7 next :)
> 
> One historical reason was that a nohz_full CPU could enter the kernel with
> the tick still disabled, and stay that way indefinitely.  Among other
> things, this can interfere with the grace-period wait in the offlining
> code path.

Thanks for clarification/confirmation, that was my hunch as well. Does the emergency forcing on of the tick not suffice for that? If I understand you are sending an IPI to set the same tick dependency if grace periods are being held too long. Another potential fix instead of setting tick dependency in hotplug, then, could be to reduce the time before which emergency IPIs are sent to free up nohz_full CPUs, I think.

Thanks!

- Joel 

> 
>                            Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 16:30                                 ` Joel Fernandes
@ 2022-10-02 16:57                                   ` Paul E. McKenney
  2022-10-02 16:59                                     ` Joel Fernandes
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-10-02 16:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Pingfan Liu, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Sun, Oct 02, 2022 at 12:30:52PM -0400, Joel Fernandes wrote:
> 
> 
> > On Oct 2, 2022, at 12:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
> >> 
> >> 
> >>> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> >>> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>>> 
> >>>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>>> 
> >>>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>>>> 
> >>>>> [...]
> >>>>>> "
> >>>>>> 
> >>>>>> I have no idea whether this is related to the reverted commit.
> >>>>>> 
> >>>>> 
> >>>>> I have started another test against clean v6.0-rc7 to see whether this
> >>>>> is an issue with the mainline.
> >>>> 
> >>>> I am not sure what exactly you are reverting (you could clarify that),
> >>> 
> >>> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> >>> But due to conflict, "git revert" can not work directly. So I applied
> >>> it by hand.
> >>> 
> >>>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> >>>> remember (and mentioned on IRC to others recently) that without this
> >>>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> >>>> tick is needed for this configuration if we are spinning in the kernel
> >>>> with the tick turned off. That seems to align with your TREE04
> >>>> (NOHZ_FULL) configuration.
> >>>> 
> >>> 
> >>> Yes, that is the scenario.
> >>> 
> >>>> Also, the commit Frederic suggested to revert seems to be a cosmetic
> >>>> optimization in the interrupt-entry path. That should not change
> >>>> functionality I believe. So I did not fully follow why reverting that
> >>>> is relevant (maybe Frederic can clarify?).
> >>>> 
> >>> 
> >>> Leave this question to Frederic.
> >> 
> >> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
> >> a functional change to CPU hotplug path.
> >> 
> >> Sorry for the noise.
> >> 
> >> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
> >> might jump into the investigation with you guys, but I have to make time for
> >> Lazy-RCU v7 next :)
> > 
> > One historical reason was that a nohz_full CPU could enter the kernel with
> > the tick still disabled, and stay that way indefinitely.  Among other
> > things, this can interfere with the grace-period wait in the offlining
> > code path.
> 
> Thanks for clarification/confirmation, that was my hunch as well. Does the emergency forcing on of the tick not suffice for that? If I understand you are sending an IPI to set the same tick dependency if grace periods are being held too long. Another potential fix instead of setting tick dependency in hotplug, then, could be to reduce the time before which emergency IPIs are sent to free up nohz_full CPUs, I think.

Why not try modifying it to work this way and seeing what (if anything)
breaks?  If nothing breaks, then there is still the option of digging
into the code.

In both cases, after Lazy-RCU v7, of course.  ;-)

								Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-02 16:57                                   ` Paul E. McKenney
@ 2022-10-02 16:59                                     ` Joel Fernandes
  0 siblings, 0 replies; 61+ messages in thread
From: Joel Fernandes @ 2022-10-02 16:59 UTC (permalink / raw)
  To: paulmck
  Cc: Pingfan Liu, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld



> On Oct 2, 2022, at 12:57 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Sun, Oct 02, 2022 at 12:30:52PM -0400, Joel Fernandes wrote:
>> 
>> 
>>>> On Oct 2, 2022, at 12:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>> 
>>> On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
>>>> 
>>>> 
>>>>> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
>>>>> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>> 
>>>>>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>>>> 
>>>>>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>>>>> 
>>>>>>> [...]
>>>>>>>> "
>>>>>>>> 
>>>>>>>> I have no idea whether this is related to the reverted commit.
>>>>>>>> 
>>>>>>> 
>>>>>>> I have started another test against clean v6.0-rc7 to see whether this
>>>>>>> is an issue with the mainline.
>>>>>> 
>>>>>> I am not sure what exactly you are reverting (you could clarify that),
>>>>> 
>>>>> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
>>>>> But due to conflict, "git revert" can not work directly. So I applied
>>>>> it by hand.
>>>>> 
>>>>>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
>>>>>> remember (and mentioned on IRC to others recently) that without this
>>>>>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
>>>>>> tick is needed for this configuration if we are spinning in the kernel
>>>>>> with the tick turned off. That seems to align with your TREE04
>>>>>> (NOHZ_FULL) configuration.
>>>>>> 
>>>>> 
>>>>> Yes, that is the scenario.
>>>>> 
>>>>>> Also, the commit Frederic suggested to revert seems to be a cosmetic
>>>>>> optimization in the interrupt-entry path. That should not change
>>>>>> functionality I believe. So I did not fully follow why reverting that
>>>>>> is relevant (maybe Frederic can clarify?).
>>>>>> 
>>>>> 
>>>>> Leave this question to Frederic.
>>>> 
>>>> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
>>>> a functional change to CPU hotplug path.
>>>> 
>>>> Sorry for the noise.
>>>> 
>>>> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
>>>> might jump into the investigation with you guys, but I have to make time for
>>>> Lazy-RCU v7 next :)
>>> 
>>> One historical reason was that a nohz_full CPU could enter the kernel with
>>> the tick still disabled, and stay that way indefinitely.  Among other
>>> things, this can interfere with the grace-period wait in the offlining
>>> code path.
>> 
>> Thanks for clarification/confirmation, that was my hunch as well. Does the emergency forcing on of the tick not suffice for that? If I understand you are sending an IPI to set the same tick dependency if grace periods are being held too long. Another potential fix instead of setting tick dependency in hotplug, then, could be to reduce the time before which emergency IPIs are sent to free up nohz_full CPUs, I think.
> 
> Why not try modifying it to work this way and seeing what (if anything)
> breaks?  If nothing breaks, then there is still the option of digging
> into the code.

Yes happy to try :)

> 
> In both cases, after Lazy-RCU v7, of course.  ;-)

Sounds good and thanks for understanding ;)

Thanks,

 - Joel

> 
>                                Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
       [not found]                             ` <CAFgQCTtgLfc0NeYqyWk4Ew-pA9rMREjRjWSnQhYLv-V5117s9Q@mail.gmail.com>
@ 2022-10-27 17:46                               ` Paul E. McKenney
  2022-10-31  3:24                                 ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-10-27 17:46 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> [...]
> 
> > >
> > > But unfortunately, I did not keep the data. I will run it again and
> > > submit the data.
> >
> 
> I have finished the test on a machine with two sockets and 256 cpus.
> The test runs against the kernel with three commits reverted.
>   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
>   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> IRQ entry")
>   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> 
> Summary from console.log
> "
>  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> Results directory:
> /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> 125h --bootargs rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 --configs 32*TREE04
> TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> TREE04 no success message, 2897 successful version messages
> Completed in 44512 vs. 450000
> TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> TREE04.10 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> TREE04.11 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> TREE04.12 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> TREE04.13 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> TREE04.14 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> TREE04.15 no success message, 2897 successful version messages
> Completed in 44510 vs. 450000
> TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> TREE04.16 no success message, 2897 successful version messages
> Completed in 44510 vs. 450000
> TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> TREE04.17 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> TREE04.18 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> TREE04.19 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> ...
> TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> TREE04.4 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> TREE04.5 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> TREE04.6 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> TREE04.7 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> TREE04.8 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> TREE04.9 no success message, 2897 successful version messages
> Completed in 44510 vs. 450000
> 32 runs with runtime errors.
>  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> "
> I have no idea about the test so just arbitrarily pick up the
> console.log of TREE04.10 as an example. Please get it from attachment.

Very good, thank you!

Could you please clearly indicate what you tested?  For example, if
you have an externally visible git tree, please point me at the tree
and the SHA-1.  Or send a patch series clearly indicating what it is
based on.

Then I can try a long run on a larger collection of systems.

If that works out, we can see about adjustments to mainline.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
       [not found]                             ` <CAFgQCTtNetv7v_Law=abPtngC8Gv6OGcGz9M_wWMxz_GAEWDUQ@mail.gmail.com>
@ 2022-10-27 18:13                               ` Paul E. McKenney
  2022-10-31  2:10                                 ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-10-27 18:13 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Oct 10, 2022 at 09:42:07AM +0800, Pingfan Liu wrote:
> On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> [...]
> 
> > >
> > > Quote from TREE04.43 console.log
> > > "
> > > ...
> > > [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > 0 0 0 0 0 0
> > > [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > 0 0 0 0 0 0 0
> > > [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > [162003.266980] rcu-torture: rcu_torture_stats is stopping
> > > [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> > > [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> > > [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> > > = ffffffffa676d9a0
> > > [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> > > [162003.281563] mem_dump_obj(NULL): NULL pointer
> > > [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> > > starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> > > [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> > > ffff965e01a58000 pointer offset 0 allocated at
> > > rcu_torture_cleanup+0x432/0x4e0
> > > [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> > > [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> > > [162003.284918]     torture_shutdown+0xce/0x1c0
> > > [162003.285285]     kthread+0xb7/0xe0
> > > [162003.285587]     ret_from_fork+0x22/0x30
> > > [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> > > ffff965e01a58000 pointer offset 8 allocated at
> > > rcu_torture_cleanup+0x432/0x4e0
> > > [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> > > [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> > > [162003.287835]     torture_shutdown+0xce/0x1c0
> > > [162003.288202]     kthread+0xb7/0xe0
> > > [162003.288510]     ret_from_fork+0x22/0x30
> > > [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> > > [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> > > [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> > > kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> > > [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> > > kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> > > [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> > > [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> > > region starting at 0xffffa0aa0002d000 allocated at
> > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> > > region starting at 0xffffa0aa0002d000 allocated at
> > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > 0 0 0 0 0 0
> > > [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > 0 0 0 0 0 0 0
> > > [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> > > nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> > > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > > test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> > > stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> > > n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> > > read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> > > [162003.304162] i2c i2c-0: shutdown
> > > [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> > > [162003.304878] psmouse serio1: shutdown
> > > ...
> > > "
> >
> > Huh.  Shutdown writer stall on the first one, but I am not seeing the
> > but in the second one.  Was the "Call Trace" located somewhere else in
> > the console.log file?
> >
> > I have not seen the shutdown writer stall lately, but such stalls have
> > flagged many sorts of bugs for me to fix in the past.
> >
> >
> > So maybe there is yet another bug that my testing is for whatever reason
> > not seeing.
> 
> Sorry to reply late, I just came back from a long holiday.

Given my delay, I guess that I cannot complain...

> The attachment is the complete log of TREE04.43.

This happens because rcutorture hit OOM while doing RCU callback flood
testing.  This -might- be related to the changes, for example if timers
slowed down, thus slowing down RCU grace periods, thus allowing the
kthread doing the flooding exhaust memory.

Grace periods are progressing in the histogram:

[12077.630343] rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 1321 jiffies): 1s/10: 75043:17 2s/10: 137085:35 3s/10: 143871:14 4s/10: 66038:26 5s/10: 51754:18 6s/10: 67000:16 7s/10: 54155:17 8s/10: 56749:23 9s/10: 120850:8 10s/10: 87789:17 11s/10: 111123:32 12s/10: 136322:50 13s/10: 108813:33 14s/10: 138489:1
[12077.632824] rcu: rcu_fwd_progress_check: GP age 17 jiffies

And there is this:

[12077.644478] rcu: RCU callbacks invoked since boot: 102942584
[12077.644974] rcu: rcu_fwd_progress_check: callbacks 0: 45 4: 7973717
[12077.695770] rcutorture_oom_notify: Freed 592481 RCU callbacks.

So another question is "how much memory was available to rcutorture?"

Less than 512MB could explain this.

(I do see this from time to time, but on single-CPU TREE09 runs.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-27 18:13                               ` Paul E. McKenney
@ 2022-10-31  2:10                                 ` Pingfan Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-10-31  2:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Oct 27, 2022 at 11:13:46AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 10, 2022 at 09:42:07AM +0800, Pingfan Liu wrote:
> > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> > [...]
> > 
> > > >
> > > > Quote from TREE04.43 console.log
> > > > "
> > > > ...
> > > > [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > > [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > > 0 0 0 0 0 0
> > > > [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > > 0 0 0 0 0 0 0
> > > > [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > > [162003.266980] rcu-torture: rcu_torture_stats is stopping
> > > > [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> > > > [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> > > > [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> > > > = ffffffffa676d9a0
> > > > [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> > > > [162003.281563] mem_dump_obj(NULL): NULL pointer
> > > > [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> > > > starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> > > > [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> > > > ffff965e01a58000 pointer offset 0 allocated at
> > > > rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> > > > [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.284918]     torture_shutdown+0xce/0x1c0
> > > > [162003.285285]     kthread+0xb7/0xe0
> > > > [162003.285587]     ret_from_fork+0x22/0x30
> > > > [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> > > > ffff965e01a58000 pointer offset 8 allocated at
> > > > rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> > > > [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.287835]     torture_shutdown+0xce/0x1c0
> > > > [162003.288202]     kthread+0xb7/0xe0
> > > > [162003.288510]     ret_from_fork+0x22/0x30
> > > > [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> > > > [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> > > > [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> > > > kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> > > > [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> > > > kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> > > > [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> > > > [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> > > > region starting at 0xffffa0aa0002d000 allocated at
> > > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > > [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> > > > region starting at 0xffffa0aa0002d000 allocated at
> > > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > > [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > > [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > > 0 0 0 0 0 0
> > > > [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > > 0 0 0 0 0 0 0
> > > > [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > > [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> > > > nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> > > > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > > > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > > > test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> > > > stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> > > > n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> > > > read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> > > > [162003.304162] i2c i2c-0: shutdown
> > > > [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> > > > [162003.304878] psmouse serio1: shutdown
> > > > ...
> > > > "
> > >
> > > Huh.  Shutdown writer stall on the first one, but I am not seeing the
> > > but in the second one.  Was the "Call Trace" located somewhere else in
> > > the console.log file?
> > >
> > > I have not seen the shutdown writer stall lately, but such stalls have
> > > flagged many sorts of bugs for me to fix in the past.
> > >
> > >
> > > So maybe there is yet another bug that my testing is for whatever reason
> > > not seeing.
> > 
> > Sorry to reply late, I just came back from a long holiday.
> 
> Given my delay, I guess that I cannot complain...
> 

Thanks for insight to the log. I am a little astray in the whole torture
test. I will pay more time to study it.

> > The attachment is the complete log of TREE04.43.
> 
> This happens because rcutorture hit OOM while doing RCU callback flood
> testing.  This -might- be related to the changes, for example if timers
> slowed down, thus slowing down RCU grace periods, thus allowing the
> kthread doing the flooding exhaust memory.
> 
> Grace periods are progressing in the histogram:
> 
> [12077.630343] rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 1321 jiffies): 1s/10: 75043:17 2s/10: 137085:35 3s/10: 143871:14 4s/10: 66038:26 5s/10: 51754:18 6s/10: 67000:16 7s/10: 54155:17 8s/10: 56749:23 9s/10: 120850:8 10s/10: 87789:17 11s/10: 111123:32 12s/10: 136322:50 13s/10: 108813:33 14s/10: 138489:1
> [12077.632824] rcu: rcu_fwd_progress_check: GP age 17 jiffies
> 
> And there is this:
> 
> [12077.644478] rcu: RCU callbacks invoked since boot: 102942584
> [12077.644974] rcu: rcu_fwd_progress_check: callbacks 0: 45 4: 7973717
> [12077.695770] rcutorture_oom_notify: Freed 592481 RCU callbacks.
> 
> So another question is "how much memory was available to rcutorture?"
> 
> Less than 512MB could explain this.
> 

Each machine has 262144 MB RAM, and runs 32 TREE04 instance. So each
instance has nearly 8192 MB RAM. (I do not run any other load)


Thanks,

	Pingfan

> (I do see this from time to time, but on single-CPU TREE09 runs.)
> 
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-27 17:46                               ` Paul E. McKenney
@ 2022-10-31  3:24                                 ` Pingfan Liu
  2022-11-03 16:51                                   ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-10-31  3:24 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > [...]
> >
> > > >
> > > > But unfortunately, I did not keep the data. I will run it again and
> > > > submit the data.
> > >
> >
> > I have finished the test on a machine with two sockets and 256 cpus.
> > The test runs against the kernel with three commits reverted.
> >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > IRQ entry")
> >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> >
> > Summary from console.log
> > "
> >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > Results directory:
> > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > 125h --bootargs rcutorture.onoff_interval=200
> > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > TREE04 no success message, 2897 successful version messages
> > Completed in 44512 vs. 450000
> > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > TREE04.10 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > TREE04.11 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > TREE04.12 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > TREE04.13 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > TREE04.14 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > TREE04.15 no success message, 2897 successful version messages
> > Completed in 44510 vs. 450000
> > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > TREE04.16 no success message, 2897 successful version messages
> > Completed in 44510 vs. 450000
> > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > TREE04.17 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > TREE04.18 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > TREE04.19 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > ...
> > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > TREE04.4 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > TREE04.5 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > TREE04.6 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > TREE04.7 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > TREE04.8 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > TREE04.9 no success message, 2897 successful version messages
> > Completed in 44510 vs. 450000
> > 32 runs with runtime errors.
> >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > "
> > I have no idea about the test so just arbitrarily pick up the
> > console.log of TREE04.10 as an example. Please get it from attachment.
>
> Very good, thank you!
>
> Could you please clearly indicate what you tested?  For example, if
> you have an externally visible git tree, please point me at the tree
> and the SHA-1.  Or send a patch series clearly indicating what it is
> based on.
>

Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.

Please clone it from https://github.com/pfliu/linux.git  branch:
rcu#revert_tick_dep


> Then I can try a long run on a larger collection of systems.
>

Thank you very much.

> If that works out, we can see about adjustments to mainline.  ;-)
>

Eager to see.

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-10-31  3:24                                 ` Pingfan Liu
@ 2022-11-03 16:51                                   ` Paul E. McKenney
  2022-11-07 16:07                                     ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-11-03 16:51 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > [...]
> > >
> > > > >
> > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > submit the data.
> > > >
> > >
> > > I have finished the test on a machine with two sockets and 256 cpus.
> > > The test runs against the kernel with three commits reverted.
> > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > IRQ entry")
> > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > >
> > > Summary from console.log
> > > "
> > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > Results directory:
> > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > 125h --bootargs rcutorture.onoff_interval=200
> > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > TREE04 no success message, 2897 successful version messages
> > > Completed in 44512 vs. 450000
> > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > TREE04.10 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > TREE04.11 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > TREE04.12 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > TREE04.13 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > TREE04.14 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > TREE04.15 no success message, 2897 successful version messages
> > > Completed in 44510 vs. 450000
> > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > TREE04.16 no success message, 2897 successful version messages
> > > Completed in 44510 vs. 450000
> > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > TREE04.17 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > TREE04.18 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > TREE04.19 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > ...
> > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > TREE04.4 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > TREE04.5 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > TREE04.6 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > TREE04.7 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > TREE04.8 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > TREE04.9 no success message, 2897 successful version messages
> > > Completed in 44510 vs. 450000
> > > 32 runs with runtime errors.
> > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > "
> > > I have no idea about the test so just arbitrarily pick up the
> > > console.log of TREE04.10 as an example. Please get it from attachment.
> >
> > Very good, thank you!
> >
> > Could you please clearly indicate what you tested?  For example, if
> > you have an externally visible git tree, please point me at the tree
> > and the SHA-1.  Or send a patch series clearly indicating what it is
> > based on.
> >
> 
> Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> 
> Please clone it from https://github.com/pfliu/linux.git  branch:
> rcu#revert_tick_dep

Thank you very much!

> > Then I can try a long run on a larger collection of systems.
> >
> 
> Thank you very much.
> 
> > If that works out, we can see about adjustments to mainline.  ;-)
> >
> 
> Eager to see.

I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
hours on v6.0, which precedes these commits, and everything passed.

I will run more, primarily on v6.0, but that is what I have thus far.
At the moment, I have some concerns about this change.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-03 16:51                                   ` Paul E. McKenney
@ 2022-11-07 16:07                                     ` Paul E. McKenney
  2022-11-09 18:55                                       ` Joel Fernandes
  2022-11-17 14:39                                       ` Frederic Weisbecker
  0 siblings, 2 replies; 61+ messages in thread
From: Paul E. McKenney @ 2022-11-07 16:07 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Frederic Weisbecker, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > [...]
> > > >
> > > > > >
> > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > submit the data.
> > > > >
> > > >
> > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > The test runs against the kernel with three commits reverted.
> > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > IRQ entry")
> > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > >
> > > > Summary from console.log
> > > > "
> > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > Results directory:
> > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > TREE04 no success message, 2897 successful version messages
> > > > Completed in 44512 vs. 450000
> > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > TREE04.10 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > TREE04.11 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > TREE04.12 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > TREE04.13 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > TREE04.14 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > TREE04.15 no success message, 2897 successful version messages
> > > > Completed in 44510 vs. 450000
> > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > TREE04.16 no success message, 2897 successful version messages
> > > > Completed in 44510 vs. 450000
> > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > TREE04.17 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > TREE04.18 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > TREE04.19 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > ...
> > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > TREE04.4 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > TREE04.5 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > TREE04.6 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > TREE04.7 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > TREE04.8 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > TREE04.9 no success message, 2897 successful version messages
> > > > Completed in 44510 vs. 450000
> > > > 32 runs with runtime errors.
> > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > "
> > > > I have no idea about the test so just arbitrarily pick up the
> > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > >
> > > Very good, thank you!
> > >
> > > Could you please clearly indicate what you tested?  For example, if
> > > you have an externally visible git tree, please point me at the tree
> > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > based on.
> > >
> > 
> > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > 
> > Please clone it from https://github.com/pfliu/linux.git  branch:
> > rcu#revert_tick_dep
> 
> Thank you very much!
> 
> > > Then I can try a long run on a larger collection of systems.
> > >
> > 
> > Thank you very much.
> > 
> > > If that works out, we can see about adjustments to mainline.  ;-)
> > >
> > 
> > Eager to see.
> 
> I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> hours on v6.0, which precedes these commits, and everything passed.
> 
> I will run more, primarily on v6.0, but that is what I have thus far.
> At the moment, I have some concerns about this change.

OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
looked at were RCU CPU stall warnings with timer failures.

This data suggests that the kernel is not yet ready for that commit
to be reverted.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-07 16:07                                     ` Paul E. McKenney
@ 2022-11-09 18:55                                       ` Joel Fernandes
  2022-11-18 12:08                                         ` Pingfan Liu
  2022-11-17 14:39                                       ` Frederic Weisbecker
  1 sibling, 1 reply; 61+ messages in thread
From: Joel Fernandes @ 2022-11-09 18:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Pingfan Liu, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > [...]
> > > > >
> > > > > > >
> > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > submit the data.
> > > > > >
> > > > >
> > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > The test runs against the kernel with three commits reverted.
> > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > IRQ entry")
> > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > >
> > > > > Summary from console.log
> > > > > "
> > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > Results directory:
> > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > TREE04 no success message, 2897 successful version messages
> > > > > Completed in 44512 vs. 450000
> > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > ...
> > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > 32 runs with runtime errors.
> > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > "
> > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > >
> > > > Very good, thank you!
> > > >
> > > > Could you please clearly indicate what you tested?  For example, if
> > > > you have an externally visible git tree, please point me at the tree
> > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > based on.
> > > >
> > > 
> > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > 
> > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > rcu#revert_tick_dep
> > 
> > Thank you very much!
> > 
> > > > Then I can try a long run on a larger collection of systems.
> > > >
> > > 
> > > Thank you very much.
> > > 
> > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > >
> > > 
> > > Eager to see.
> > 
> > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > hours on v6.0, which precedes these commits, and everything passed.
> > 
> > I will run more, primarily on v6.0, but that is what I have thus far.
> > At the moment, I have some concerns about this change.
> 
> OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> looked at were RCU CPU stall warnings with timer failures.
> 
> This data suggests that the kernel is not yet ready for that commit
> to be reverted.

Even if the tests pass, can we really survive with this patch
that he reverted?
https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7

If stop machine on a CPU spends a good amount of time in kernel mode, while a
grace period starts on another CPU, then we're kind of screwed if we don't
have the tick enabled right?

Or, did we make any changes to stop machine such that, that's no longer an
issue?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-07 16:07                                     ` Paul E. McKenney
  2022-11-09 18:55                                       ` Joel Fernandes
@ 2022-11-17 14:39                                       ` Frederic Weisbecker
  2022-11-18  1:45                                         ` Pingfan Liu
  1 sibling, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2022-11-17 14:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Pingfan Liu, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > hours on v6.0, which precedes these commits, and everything passed.
> > 
> > I will run more, primarily on v6.0, but that is what I have thus far.
> > At the moment, I have some concerns about this change.
> 
> OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> looked at were RCU CPU stall warnings with timer failures.
> 
> This data suggests that the kernel is not yet ready for that commit
> to be reverted.

But that branch has the three commits reverted:

1) tick: Detect and fix jiffies update stall
2) timers/nohz: Last resort update jiffies on nohz_full IRQ entry*
3) rcu: Make CPU-hotplug removal operations enable tick

Reverting all of them is expected to fail anyway.

What we would like to know is if reverting just 3) is fine. Because
1) and 2) are supposed to fix the underlying issue.

I personally didn't manage to trigger failures with just reverting 3)
after thousands hours. But it failed with reverting all of them.

Has someone managed to trigger a failure with only 3) reverted?

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-17 14:39                                       ` Frederic Weisbecker
@ 2022-11-18  1:45                                         ` Pingfan Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Pingfan Liu @ 2022-11-18  1:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Thu, Nov 17, 2022 at 10:39 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > hours on v6.0, which precedes these commits, and everything passed.
> > >
> > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > At the moment, I have some concerns about this change.
> >
> > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > looked at were RCU CPU stall warnings with timer failures.
> >
> > This data suggests that the kernel is not yet ready for that commit
> > to be reverted.
>
> But that branch has the three commits reverted:
>
> 1) tick: Detect and fix jiffies update stall
> 2) timers/nohz: Last resort update jiffies on nohz_full IRQ entry*
> 3) rcu: Make CPU-hotplug removal operations enable tick
>
> Reverting all of them is expected to fail anyway.
>
> What we would like to know is if reverting just 3) is fine. Because
> 1) and 2) are supposed to fix the underlying issue.
>
> I personally didn't manage to trigger failures with just reverting 3)
> after thousands hours. But it failed with reverting all of them.
>
> Has someone managed to trigger a failure with only 3) reverted?
>

Oh, a long long thread hides the history!   Once it was done [1].  But
I have not kept the data (IIRC, it occupied ~1GB).

If the information and attachment in [1] are not enough for you, I am
happy to redo the test.

Thanks,

    Pingfan

[1]: https://lore.kernel.org/all/CAFgQCTtLm-JXRyQfKo6-+P00SShVGujZGau+khmtCe1AiRodQA@mail.gmail.com/

> Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-09 18:55                                       ` Joel Fernandes
@ 2022-11-18 12:08                                         ` Pingfan Liu
  2022-11-18 23:30                                           ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-11-18 12:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > >
> > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > submit the data.
> > > > > > >
> > > > > >
> > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > The test runs against the kernel with three commits reverted.
> > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > IRQ entry")
> > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > >
> > > > > > Summary from console.log
> > > > > > "
> > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > Results directory:
> > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > Completed in 44512 vs. 450000
> > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > Completed in 44510 vs. 450000
> > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > Completed in 44510 vs. 450000
> > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > ...
> > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > Completed in 44510 vs. 450000
> > > > > > 32 runs with runtime errors.
> > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > "
> > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > >
> > > > > Very good, thank you!
> > > > >
> > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > you have an externally visible git tree, please point me at the tree
> > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > based on.
> > > > >
> > > >
> > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > >
> > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > rcu#revert_tick_dep
> > >
> > > Thank you very much!
> > >
> > > > > Then I can try a long run on a larger collection of systems.
> > > > >
> > > >
> > > > Thank you very much.
> > > >
> > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > >
> > > >
> > > > Eager to see.
> > >
> > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > hours on v6.0, which precedes these commits, and everything passed.
> > >
> > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > At the moment, I have some concerns about this change.
> >
> > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > looked at were RCU CPU stall warnings with timer failures.
> >
> > This data suggests that the kernel is not yet ready for that commit
> > to be reverted.
>
> Even if the tests pass, can we really survive with this patch
> that he reverted?
> https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
>
> If stop machine on a CPU spends a good amount of time in kernel mode, while a
> grace period starts on another CPU, then we're kind of screwed if we don't
> have the tick enabled right?
>

In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
can serve this purpose.

Thanks,

    Pingfan

> Or, did we make any changes to stop machine such that, that's no longer an
> issue?
>
> thanks,
>
>  - Joel
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-18 12:08                                         ` Pingfan Liu
@ 2022-11-18 23:30                                           ` Paul E. McKenney
  2022-11-21  3:48                                             ` Pingfan Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Paul E. McKenney @ 2022-11-18 23:30 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Joel Fernandes, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Fri, Nov 18, 2022 at 08:08:35PM +0800, Pingfan Liu wrote:
> On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > >
> > > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > > submit the data.
> > > > > > > >
> > > > > > >
> > > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > > The test runs against the kernel with three commits reverted.
> > > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > > IRQ entry")
> > > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > > >
> > > > > > > Summary from console.log
> > > > > > > "
> > > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > > Results directory:
> > > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > > Completed in 44512 vs. 450000
> > > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > > Completed in 44510 vs. 450000
> > > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > > Completed in 44510 vs. 450000
> > > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > ...
> > > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > > Completed in 44510 vs. 450000
> > > > > > > 32 runs with runtime errors.
> > > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > > "
> > > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > > >
> > > > > > Very good, thank you!
> > > > > >
> > > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > > you have an externally visible git tree, please point me at the tree
> > > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > > based on.
> > > > > >
> > > > >
> > > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > > >
> > > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > > rcu#revert_tick_dep
> > > >
> > > > Thank you very much!
> > > >
> > > > > > Then I can try a long run on a larger collection of systems.
> > > > > >
> > > > >
> > > > > Thank you very much.
> > > > >
> > > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > > >
> > > > >
> > > > > Eager to see.
> > > >
> > > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > > hours on v6.0, which precedes these commits, and everything passed.
> > > >
> > > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > > At the moment, I have some concerns about this change.
> > >
> > > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > > looked at were RCU CPU stall warnings with timer failures.
> > >
> > > This data suggests that the kernel is not yet ready for that commit
> > > to be reverted.
> >
> > Even if the tests pass, can we really survive with this patch
> > that he reverted?
> > https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
> >
> > If stop machine on a CPU spends a good amount of time in kernel mode, while a
> > grace period starts on another CPU, then we're kind of screwed if we don't
> > have the tick enabled right?
> 
> In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
> can serve this purpose.

There are a lot of things that can go wrong in this scenario.  Does that
added rcu_momentary_dyntick_idle() cover everything that still needs to
be covered?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-18 23:30                                           ` Paul E. McKenney
@ 2022-11-21  3:48                                             ` Pingfan Liu
  2022-11-21 17:14                                               ` Paul E. McKenney
  0 siblings, 1 reply; 61+ messages in thread
From: Pingfan Liu @ 2022-11-21  3:48 UTC (permalink / raw)
  To: paulmck
  Cc: Joel Fernandes, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Sat, Nov 19, 2022 at 7:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Nov 18, 2022 at 08:08:35PM +0800, Pingfan Liu wrote:
> > On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > > > submit the data.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > > > The test runs against the kernel with three commits reverted.
> > > > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > > > IRQ entry")
> > > > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > > > >
> > > > > > > > Summary from console.log
> > > > > > > > "
> > > > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > > > Results directory:
> > > > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > > > Completed in 44512 vs. 450000
> > > > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > ...
> > > > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > 32 runs with runtime errors.
> > > > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > > > "
> > > > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > > > >
> > > > > > > Very good, thank you!
> > > > > > >
> > > > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > > > you have an externally visible git tree, please point me at the tree
> > > > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > > > based on.
> > > > > > >
> > > > > >
> > > > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > > > >
> > > > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > > > rcu#revert_tick_dep
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > > > > Then I can try a long run on a larger collection of systems.
> > > > > > >
> > > > > >
> > > > > > Thank you very much.
> > > > > >
> > > > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > > > >
> > > > > >
> > > > > > Eager to see.
> > > > >
> > > > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > > > hours on v6.0, which precedes these commits, and everything passed.
> > > > >
> > > > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > > > At the moment, I have some concerns about this change.
> > > >
> > > > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > > > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > > > looked at were RCU CPU stall warnings with timer failures.
> > > >
> > > > This data suggests that the kernel is not yet ready for that commit
> > > > to be reverted.
> > >
> > > Even if the tests pass, can we really survive with this patch
> > > that he reverted?
> > > https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
> > >
> > > If stop machine on a CPU spends a good amount of time in kernel mode, while a
> > > grace period starts on another CPU, then we're kind of screwed if we don't
> > > have the tick enabled right?
> >
> > In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
> > can serve this purpose.
>
> There are a lot of things that can go wrong in this scenario.  Does that
> added rcu_momentary_dyntick_idle() cover everything that still needs to
> be covered?
>

Absorbed by the context "while a grace period starts on another CPU",
I assumed it is from the RCU perspective. Then I think every non-idle
cpu can report its quiescent state in that case.

But from a system perspective, I think that it is promising but not
sure. And that is what the TREE04 torture test tries to verify.

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
  2022-11-21  3:48                                             ` Pingfan Liu
@ 2022-11-21 17:14                                               ` Paul E. McKenney
  0 siblings, 0 replies; 61+ messages in thread
From: Paul E. McKenney @ 2022-11-21 17:14 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Joel Fernandes, Frederic Weisbecker, rcu, David Woodhouse,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jason A. Donenfeld

On Mon, Nov 21, 2022 at 11:48:29AM +0800, Pingfan Liu wrote:
> On Sat, Nov 19, 2022 at 7:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Nov 18, 2022 at 08:08:35PM +0800, Pingfan Liu wrote:
> > > On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > > > > submit the data.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > > > > The test runs against the kernel with three commits reverted.
> > > > > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > > > > IRQ entry")
> > > > > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > > > > >
> > > > > > > > > Summary from console.log
> > > > > > > > > "
> > > > > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > > > > Results directory:
> > > > > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44512 vs. 450000
> > > > > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > ...
> > > > > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > > 32 runs with runtime errors.
> > > > > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > > > > "
> > > > > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > > > > >
> > > > > > > > Very good, thank you!
> > > > > > > >
> > > > > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > > > > you have an externally visible git tree, please point me at the tree
> > > > > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > > > > based on.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > > > > >
> > > > > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > > > > rcu#revert_tick_dep
> > > > > >
> > > > > > Thank you very much!
> > > > > >
> > > > > > > > Then I can try a long run on a larger collection of systems.
> > > > > > > >
> > > > > > >
> > > > > > > Thank you very much.
> > > > > > >
> > > > > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > > > > >
> > > > > > >
> > > > > > > Eager to see.
> > > > > >
> > > > > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > > > > hours on v6.0, which precedes these commits, and everything passed.
> > > > > >
> > > > > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > > > > At the moment, I have some concerns about this change.
> > > > >
> > > > > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > > > > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > > > > looked at were RCU CPU stall warnings with timer failures.
> > > > >
> > > > > This data suggests that the kernel is not yet ready for that commit
> > > > > to be reverted.
> > > >
> > > > Even if the tests pass, can we really survive with this patch
> > > > that he reverted?
> > > > https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
> > > >
> > > > If stop machine on a CPU spends a good amount of time in kernel mode, while a
> > > > grace period starts on another CPU, then we're kind of screwed if we don't
> > > > have the tick enabled right?
> > >
> > > In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
> > > can serve this purpose.
> >
> > There are a lot of things that can go wrong in this scenario.  Does that
> > added rcu_momentary_dyntick_idle() cover everything that still needs to
> > be covered?
> 
> Absorbed by the context "while a grace period starts on another CPU",
> I assumed it is from the RCU perspective. Then I think every non-idle
> cpu can report its quiescent state in that case.
> 
> But from a system perspective, I think that it is promising but not
> sure. And that is what the TREE04 torture test tries to verify.

The specific case I was thinking of is when all but the incoming
CPU are spinning with interrupts disabled, and the incoming CPU is
getting hammered with scheduler-tick interrupts.  In that case, the RCU
grace-period kthread won't be running anyway.  So there will be nothing
to notice the calls to rcu_momentary_dyntick_idle().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2022-11-21 17:17 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15  5:58 [PATCHv2 0/3] rcu: Enhance the capability to cope with concurrent cpu offlining/onlining Pingfan Liu
2022-09-15  5:58 ` [PATCHv2 1/3] rcu: Keep qsmaskinitnext fresh when rcutree_online_cpu() Pingfan Liu
2022-09-15  6:11   ` Pingfan Liu
2022-09-16 14:52   ` Frederic Weisbecker
2022-09-19 10:24     ` Pingfan Liu
2022-09-19 10:51       ` Frederic Weisbecker
2022-09-20  3:45         ` Pingfan Liu
2022-09-20  9:23           ` Frederic Weisbecker
2022-10-01  2:26             ` Joel Fernandes
2022-10-02 12:34               ` Pingfan Liu
2022-10-02 15:52                 ` Joel Fernandes
2022-09-20 10:31   ` Frederic Weisbecker
2022-09-21 11:56     ` Pingfan Liu
2022-09-15  5:58 ` [PATCHv2 2/3] rcu: Resort to cpu_dying_mask for affinity when offlining Pingfan Liu
2022-09-16 14:23   ` Frederic Weisbecker
2022-09-19  4:33     ` Pingfan Liu
2022-09-19 10:34       ` Frederic Weisbecker
2022-09-20  3:16         ` Pingfan Liu
2022-09-20  9:00           ` Frederic Weisbecker
2022-09-20  9:38   ` Frederic Weisbecker
2022-09-21 11:48     ` Pingfan Liu
2022-09-15  5:58 ` [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining Pingfan Liu
2022-09-16 13:42   ` Frederic Weisbecker
2022-09-20  7:26     ` Pingfan Liu
2022-09-20  9:46       ` Frederic Weisbecker
2022-09-20 19:13         ` Paul E. McKenney
2022-09-22  9:29           ` Pingfan Liu
2022-09-22 13:54             ` Paul E. McKenney
2022-09-23 22:13               ` Frederic Weisbecker
2022-09-26  6:34               ` Pingfan Liu
2022-09-26 22:23                 ` Paul E. McKenney
2022-09-27  9:59                   ` Pingfan Liu
2022-09-29  8:19                     ` Pingfan Liu
2022-09-29  8:20                       ` Pingfan Liu
2022-09-30 13:04                         ` Joel Fernandes
2022-10-02 14:06                           ` Pingfan Liu
2022-10-02 16:11                             ` Joel Fernandes
2022-10-02 16:24                               ` Paul E. McKenney
2022-10-02 16:30                                 ` Joel Fernandes
2022-10-02 16:57                                   ` Paul E. McKenney
2022-10-02 16:59                                     ` Joel Fernandes
2022-09-30 15:44                       ` Paul E. McKenney
2022-10-02 13:29                         ` Pingfan Liu
2022-10-02 15:08                           ` Frederic Weisbecker
2022-10-02 16:20                             ` Paul E. McKenney
2022-10-02 16:20                           ` Paul E. McKenney
     [not found]                             ` <CAFgQCTtgLfc0NeYqyWk4Ew-pA9rMREjRjWSnQhYLv-V5117s9Q@mail.gmail.com>
2022-10-27 17:46                               ` Paul E. McKenney
2022-10-31  3:24                                 ` Pingfan Liu
2022-11-03 16:51                                   ` Paul E. McKenney
2022-11-07 16:07                                     ` Paul E. McKenney
2022-11-09 18:55                                       ` Joel Fernandes
2022-11-18 12:08                                         ` Pingfan Liu
2022-11-18 23:30                                           ` Paul E. McKenney
2022-11-21  3:48                                             ` Pingfan Liu
2022-11-21 17:14                                               ` Paul E. McKenney
2022-11-17 14:39                                       ` Frederic Weisbecker
2022-11-18  1:45                                         ` Pingfan Liu
     [not found]                             ` <CAFgQCTtNetv7v_Law=abPtngC8Gv6OGcGz9M_wWMxz_GAEWDUQ@mail.gmail.com>
2022-10-27 18:13                               ` Paul E. McKenney
2022-10-31  2:10                                 ` Pingfan Liu
2022-09-26 16:13   ` Joel Fernandes
2022-09-27  9:42     ` Pingfan Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.