linux-kernel.vger.kernel.org archive mirror
* [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
@ 2018-09-10 13:56 Sebastian Andrzej Siewior
  2018-09-11 16:05 ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-09-10 13:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Boqun Feng, Paul E. McKenney, Peter Zijlstra, Aneesh Kumar K.V,
	tglx, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

It was possible that sync_rcu_exp_select_cpus() enqueued something on
CPU0 while CPU0 was offline. Such a work item wouldn't be processed
until CPU0 gets back online. This problem was addressed in commit
fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
don't think the issue is fully addressed.

Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
remains online between looking at cpu_online_mask and invoking
queue_work_on() from CPU1.

Use cpus_read_lock() to ensure that `cpu' does not go down between
looking at cpu_online_mask, invoking queue_work_on(), and waiting for
its completion. The lock is added around the loop + flush_work(), which is
similar to work_on_cpu_safe() (and we can have multiple jobs running on
NUMA systems).

Fixes: fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/rcu/tree_exp.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 01b6ddeb4f050..a104cf91e6b90 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -479,6 +479,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 	sync_exp_reset_tree(rsp);
 	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("select"));
 
+	cpus_read_lock();
 	/* Schedule work for each leaf rcu_node structure. */
 	rcu_for_each_leaf_node(rsp, rnp) {
 		rnp->exp_need_flush = false;
@@ -493,13 +494,11 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 			continue;
 		}
 		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
-		preempt_disable();
 		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
 		/* If all offline, queue the work on an unbound CPU. */
 		if (unlikely(cpu > rnp->grphi))
 			cpu = WORK_CPU_UNBOUND;
 		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
-		preempt_enable();
 		rnp->exp_need_flush = true;
 	}
 
@@ -507,6 +506,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 	rcu_for_each_leaf_node(rsp, rnp)
 		if (rnp->exp_need_flush)
 			flush_work(&rnp->rew.rew_work);
+	cpus_read_unlock();
 }
 
 static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
-- 
2.19.0.rc2
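The CPU-selection logic in the hunk above can be modeled in plain userspace C. This is only an illustrative sketch: mask_next() and select_cpu() are invented stand-ins for the kernel's cpumask_next() and the hunk's selection code, and -1 stands in for WORK_CPU_UNBOUND.

```c
#include <assert.h>

#define NBITS 64                    /* model of nr_cpu_ids */
#define MODEL_WORK_CPU_UNBOUND (-1) /* stand-in for WORK_CPU_UNBOUND */

/* Mimics cpumask_next(): first set bit strictly after `start`,
 * or NBITS if the rest of the mask is empty. */
static int mask_next(unsigned long mask, int start)
{
	int bit;

	for (bit = start + 1; bit < NBITS; bit++)
		if (mask & (1UL << bit))
			return bit;
	return NBITS;
}

/* The hunk's selection: first online CPU in [grplo, grphi],
 * else fall back to an unbound worker. */
static int select_cpu(unsigned long online_mask, int grplo, int grphi)
{
	int cpu = mask_next(online_mask, grplo - 1);

	if (cpu > grphi)	/* every CPU of this leaf node is offline */
		return MODEL_WORK_CPU_UNBOUND;
	return cpu;
}
```

Without cpus_read_lock() (or some other hotplug barrier), the CPU this returns can go offline before queue_work_on() runs, which is exactly the window the patch is concerned with.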


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-09-10 13:56 [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask Sebastian Andrzej Siewior
@ 2018-09-11 16:05 ` Paul E. McKenney
  2018-09-11 16:21   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2018-09-11 16:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Boqun Feng, Peter Zijlstra, Aneesh Kumar K.V, tglx,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> It was possible that sync_rcu_exp_select_cpus() enqueued something on
> CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> until CPU0 gets back online. This problem was addressed in commit
> fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> don't think the issue is fully addressed.
> 
> Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> remains online between looking at cpu_online_mask and invoking
> queue_work_on() on CPU1.
> 
> Use cpus_read_lock() to ensure that `cpu' does not go down between
> looking at cpu_online_mask, invoking queue_work_on(), and waiting for
> its completion. It is added around the loop + flush_work() which is
> similar to work_on_cpu_safe() (and we can have multiple jobs running on
> NUMA systems).

Is this experimental or theoretical?  If theoretical, the counter-theory
is that the stop-machine processing prevents any of the cpu_online_mask
bits from changing, though, yes, we would like to get rid of the
stop-machine processing.  So either way, yes, the current state could
use some improvement.

But one problem with the patch below is that sync_rcu_exp_select_cpus()
can be called while the cpu_hotplug_lock is write-held.  Or is that
somehow OK these days?  Assuming not, how about the (untested) patch
below?

							Thanx, Paul

> Fixes: fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/rcu/tree_exp.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 01b6ddeb4f050..a104cf91e6b90 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -479,6 +479,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  	sync_exp_reset_tree(rsp);
>  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("select"));
> 
> +	cpus_read_lock();
>  	/* Schedule work for each leaf rcu_node structure. */
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		rnp->exp_need_flush = false;
> @@ -493,13 +494,11 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  			continue;
>  		}
>  		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> -		preempt_disable();
>  		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
>  		/* If all offline, queue the work on an unbound CPU. */
>  		if (unlikely(cpu > rnp->grphi))
>  			cpu = WORK_CPU_UNBOUND;
>  		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
> -		preempt_enable();
>  		rnp->exp_need_flush = true;
>  	}
> 
> @@ -507,6 +506,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  	rcu_for_each_leaf_node(rsp, rnp)
>  		if (rnp->exp_need_flush)
>  			flush_work(&rnp->rew.rew_work);
> +	cpus_read_unlock();
>  }
> 
>  static void synchronize_sched_expedited_wait(struct rcu_state *rsp)

commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Sep 11 08:57:48 2018 -0700

    rcu: Stop expedited grace periods from relying on stop-machine
    
    The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
    to prevent the cpu_online_mask from changing.  However, this relies on
    the stop-machine mechanism in the CPU-hotplug offline code, which is not
    desirable (it would be good to someday remove the stop-machine mechanism).
    
    This commit therefore instead uses the relevant leaf rcu_node structure's
    ->ffmask, which has a bit set for all CPUs that are fully functional.
    A given CPU's bit is cleared very early during offline processing by
    rcutree_offline_cpu() and set very late during online processing by
    rcutree_online_cpu().  Therefore, if a CPU's bit is set in this mask, and
    preemption is disabled, we have to be before the synchronize_sched() in
    the CPU-hotplug offline code, which means that the CPU is guaranteed to be
    workqueue-ready throughout the duration of the enclosing preempt_disable()
    region of code.
    
    This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for
    this leaf rcu_node structure are offline, which is an acceptable difference
    in behavior.
    
    Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8d18c1014e2b..e669ccf3751b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -450,10 +450,12 @@ static void sync_rcu_exp_select_cpus(smp_call_func_t func)
 		}
 		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
 		preempt_disable();
-		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
+		cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
 		/* If all offline, queue the work on an unbound CPU. */
-		if (unlikely(cpu > rnp->grphi))
+		if (unlikely(cpu > rnp->grphi - rnp->grplo))
 			cpu = WORK_CPU_UNBOUND;
+		else
+			cpu += rnp->grplo;
 		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
 		preempt_enable();
 		rnp->exp_need_flush = true;
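The index arithmetic in this replacement hunk is easy to get wrong, because ->ffmask is indexed relative to the leaf rcu_node structure. Here is a userspace sketch of the intended mapping; the helper names are invented, -1 again models WORK_CPU_UNBOUND, and the scan simply starts at relative bit 0 (the intended "first set bit" semantics).

```c
#include <assert.h>

#define FF_NBITS 64                 /* model of BITS_PER_LONG */
#define MODEL_WORK_CPU_UNBOUND (-1)

/* First set bit of the mask, or FF_NBITS if the mask is empty
 * (a simplified find_next_bit()). */
static int ff_first_bit(unsigned long mask)
{
	int bit;

	for (bit = 0; bit < FF_NBITS; bit++)
		if (mask & (1UL << bit))
			return bit;
	return FF_NBITS;
}

/* ->ffmask bits are node-relative: bit 0 corresponds to CPU grplo.
 * A hit must therefore be offset by grplo; a miss (bit index past
 * grphi - grplo) means no CPU of this leaf node is workqueue-ready. */
static int select_cpu_ffmask(unsigned long ffmask, int grplo, int grphi)
{
	int cpu = ff_first_bit(ffmask);

	if (cpu > grphi - grplo)
		return MODEL_WORK_CPU_UNBOUND;
	return cpu + grplo;
}
```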


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-09-11 16:05 ` Paul E. McKenney
@ 2018-09-11 16:21   ` Sebastian Andrzej Siewior
  2018-09-11 17:02     ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-09-11 16:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Boqun Feng, Peter Zijlstra, Aneesh Kumar K.V, tglx,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On 2018-09-11 09:05:32 [-0700], Paul E. McKenney wrote:
> On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> > It was possible that sync_rcu_exp_select_cpus() enqueued something on
> > CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> > until CPU0 gets back online. This problem was addressed in commit
> > fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> > don't think the issue is fully addressed.
> > 
> > Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> > on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> > remains online between looking at cpu_online_mask and invoking
> > queue_work_on() on CPU1.
> > 
> > Use cpus_read_lock() to ensure that `cpu' does not go down between
> > looking at cpu_online_mask, invoking queue_work_on(), and waiting for
> > its completion. It is added around the loop + flush_work() which is
> > similar to work_on_cpu_safe() (and we can have multiple jobs running on
> > NUMA systems).
> 
> Is this experimental or theoretical?

theoretical. I saw that hunk on RT and I can't have queue_work() within
a preempt_disable() section here.

> If theoretical, the counter-theory
> is that the stop-machine processing prevents any of the cpu_online_mask
> bits from changing, though, yes, we would like to get rid of the
> stop-machine processing.  So either way, yes, the current state could
> use some improvement.
> 
> But one problem with the patch below is that sync_rcu_exp_select_cpus()
> can be called while the cpu_hotplug_lock is write-held.  Or is that
> somehow OK these days?  

depends. Is it okay to wait until the write-lock is dropped? If it is,
then it is okay. If not…

> Assuming not, how about the (untested) patch
> below?

Doesn't work for me because it is still within the preempt-disable
section :/.
Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
CPU number does not matter, you just want to spread it across multiple
CPUs in the NUMA case.

> 							Thanx, Paul
> 
> commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Sep 11 08:57:48 2018 -0700
> 
>     rcu: Stop expedited grace periods from relying on stop-machine
>     
>     The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
>     to prevent the cpu_online_mask from changing.  However, this relies on
>     the stop-machine mechanism in the CPU-hotplug offline code, which is not
>     desirable (it would be good to someday remove the stop-machine mechanism).

not that I tested it, but I still don't understand how a
preempt_disable() section on CPU1 can ensure that CPU3 won't go down. Is
there some code that invokes stop_cpus() for each CPU or so?

Sebastian


* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-09-11 16:21   ` Sebastian Andrzej Siewior
@ 2018-09-11 17:02     ` Paul E. McKenney
  2018-09-19 20:55       ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2018-09-11 17:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Boqun Feng, Peter Zijlstra, Aneesh Kumar K.V, tglx,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, tj

On Tue, Sep 11, 2018 at 06:21:42PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-09-11 09:05:32 [-0700], Paul E. McKenney wrote:
> > On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> > > It was possible that sync_rcu_exp_select_cpus() enqueued something on
> > > CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> > > until CPU0 gets back online. This problem was addressed in commit
> > > fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> > > don't think the issue is fully addressed.
> > > 
> > > Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> > > on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> > > remains online between looking at cpu_online_mask and invoking
> > > queue_work_on() on CPU1.
> > > 
> > > Use cpus_read_lock() to ensure that `cpu' does not go down between
> > > looking at cpu_online_mask, invoking queue_work_on(), and waiting for
> > > its completion. It is added around the loop + flush_work() which is
> > > similar to work_on_cpu_safe() (and we can have multiple jobs running on
> > > NUMA systems).
> > 
> > Is this experimental or theoretical?
> 
> theoretical. I saw that hunk on RT and I can't have queue_work() within
> a preempt_disable() section here.

OK, I feel better now.  ;-)

> > If theoretical, the counter-theory
> > is that the stop-machine processing prevents any of the cpu_online_mask
> > bits from changing, though, yes, we would like to get rid of the
> > stop-machine processing.  So either way, yes, the current state could
> > use some improvement.
> > 
> > But one problem with the patch below is that sync_rcu_exp_select_cpus()
> > can be called while the cpu_hotplug_lock is write-held.  Or is that
> > somehow OK these days?  
> 
> depends. Is it okay to wait until the write-lock is dropped? If it is,
> then it is okay. If not…

The problem is that people wait for grace periods within CPU hotplug
notifiers, so the lock won't be dropped until long after that notifier
returns.

> > Assuming not, how about the (untested) patch
> > below?
> 
> Doesn't work for me because it is still within the preempt-disable
> section :/.
> Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
> CPU number does not matter, you just want to spread it across multiple
> CPUs in the NUMA case.

Locality is a good thing, but yes, something like this?

	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && /* or whatever it is called */
	    unlikely(cpu > rnp->grphi - rnp->grplo))

Another approach that might be better longer term would be to have a
workqueue interface that treats the specified CPU as a suggestion,
and silently switches to WORK_CPU_UNBOUND if there is any problem
whatsoever with the specified CPU.  Tejun, Lai, thoughts?
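For what it is worth, the "CPU as a suggestion" interface floated here can be modeled with a tiny helper that silently degrades to an unbound placement whenever the suggested CPU is unusable. This is purely hypothetical — no such workqueue API exists in the kernels under discussion, and every name below is invented (-1 models WORK_CPU_UNBOUND):

```c
#include <assert.h>

#define MODEL_NCPUS 64
#define MODEL_WORK_CPU_UNBOUND (-1)

/* Treat `suggested` as a hint: keep it only if it names a CPU that is
 * currently usable according to `online_mask`; otherwise silently
 * fall back to an unbound placement. */
static int resolve_suggested_cpu(int suggested, unsigned long online_mask)
{
	if (suggested < 0 || suggested >= MODEL_NCPUS)
		return MODEL_WORK_CPU_UNBOUND;
	if (!(online_mask & (1UL << suggested)))
		return MODEL_WORK_CPU_UNBOUND;
	return suggested;
}
```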

> > commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue Sep 11 08:57:48 2018 -0700
> > 
> >     rcu: Stop expedited grace periods from relying on stop-machine
> >     
> >     The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
> >     to prevent the cpu_online_mask from changing.  However, this relies on
> >     the stop-machine mechanism in the CPU-hotplug offline code, which is not
> >     desirable (it would be good to someday remove the stop-machine mechanism).
> 
> not that I tested it, but I still don't understand how a
> preempt_disable() section on CPU1 can ensure that CPU3 won't go down. Is
> there some code that invokes stop_cpus() for each CPU or so?

Yes, there is a stop_machine_cpuslocked() in takedown_cpu().

There is also a synchronize_rcu_mult(call_rcu, call_rcu_sched) in
sched_cpu_deactivate().

The old code relies on the stop_machine_cpuslocked() and my proposed
patch relies on the synchronize_rcu_mult().

							Thanx, Paul



* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-09-11 17:02     ` Paul E. McKenney
@ 2018-09-19 20:55       ` Tejun Heo
  2018-09-19 22:11         ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2018-09-19 20:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sebastian Andrzej Siewior, linux-kernel, Boqun Feng,
	Peter Zijlstra, Aneesh Kumar K.V, tglx, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan

Hello,

On Tue, Sep 11, 2018 at 10:02:22AM -0700, Paul E. McKenney wrote:
> > Doesn't work for me because it is still within the preempt-disable
> > section :/.
> > Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
> > CPU number does not matter, you just want to spread it across multiple
> > CPUs in the NUMA case.
> 
> Locality is a good thing, but yes, something like this?
> 
> 	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && /* or whatever it is called */
> 	    unlikely(cpu > rnp->grphi - rnp->grplo))
> 
> Another approach that might be better longer term would be to have a
> workqueue interface that treats the specified CPU as a suggestion,
> and silently switches to WORK_CPU_UNBOUND if there is any problem
> whatsoever with the specified CPU.  Tejun, Lai, thoughts?

Unbound workqueue is NUMA-affine by default, so using it by default
might not harm anything.  Also, per-cpu work items get unbound from
the cpu if the cpu goes down while the work item is running or queued,
so it might just work already.

Thanks.

-- 
tejun


* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-09-19 20:55       ` Tejun Heo
@ 2018-09-19 22:11         ` Paul E. McKenney
  2018-10-12 18:41           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2018-09-19 22:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sebastian Andrzej Siewior, linux-kernel, Boqun Feng,
	Peter Zijlstra, Aneesh Kumar K.V, tglx, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan

On Wed, Sep 19, 2018 at 01:55:21PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 11, 2018 at 10:02:22AM -0700, Paul E. McKenney wrote:
> > > Doesn't work for me because it is still within the preempt-disable
> > > section :/.
> > > Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
> > > CPU number does not matter, you just want to spread it across multiple
> > > CPUs in the NUMA case.
> > 
> > Locality is a good thing, but yes, something like this?
> > 
> > 	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && /* or whatever it is called */
> > 	    unlikely(cpu > rnp->grphi - rnp->grplo))
> > 
> > Another approach that might be better longer term would be to have a
> > workqueue interface that treats the specified CPU as a suggestion,
> > and silently switches to WORK_CPU_UNBOUND if there is any problem
> > whatsoever with the specified CPU.  Tejun, Lai, thoughts?
> 
> Unbound workqueue is NUMA-affine by default, so using it by default
> might not harm anything.

OK, so the above workaround would function correctly on -rt, thank you!

Sebastian, is there a counterpart to CONFIG_PREEMPT_RT already in
mainline?  If so, I would be happy to make mainline safe for -rt.

>                           Also, per-cpu work items get unbound from
> the cpu if the cpu goes down while the work item is running or queued,
> so it might just work already.

There are race conditions where the work item is queued at an inopportune
time during the offline process, resulting in a splat, hence the need
for a check with preemption disabled in order to synchronize with the
synchronize_sched() in the offline process.

							Thanx, Paul



* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-09-19 22:11         ` Paul E. McKenney
@ 2018-10-12 18:41           ` Sebastian Andrzej Siewior
  2018-10-13 13:48             ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-12 18:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tejun Heo, linux-kernel, Boqun Feng, Peter Zijlstra,
	Aneesh Kumar K.V, tglx, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On 2018-09-19 15:11:40 [-0700], Paul E. McKenney wrote:
> On Wed, Sep 19, 2018 at 01:55:21PM -0700, Tejun Heo wrote:
> > Unbound workqueue is NUMA-affine by default, so using it by default
> > might not harm anything.
> 
> OK, so the above workaround would function correctly on -rt, thank you!
> 
> Sebastian, is there a counterpart to CONFIG_PREEMPT_RT already in
> mainline?  If so, I would be happy to make mainline safe for -rt.

Now that I stumbled upon it again, I noticed that I never replied here.
Sorry for that.

Let me summarize:
sync_rcu_exp_select_cpus() used
	queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);

which was changed in commit fcc6354365015 ("rcu: Make expedited GPs
handle CPU 0 being offline"). The commit claims that this is needed in
case CPU0 is offline, so it tries to find another CPU, starting with the
possibly offline CPU. It might cross to another NUMA node, but that is
not really a problem; it just tries to remain on the "local NUMA node".

After that commit, the code invokes queue_work_on() within a
preempt_disable() section because it can't use cpus_read_lock() to
ensure that the CPU won't go offline in the middle of testing (and
preempt_disable() does the trick).
For RT reasons I would like to get rid of queue_work_on() within the
preempt_disable() section.
Tejun said that enqueueing an item on an unbound workqueue is NUMA
affine.

I figured out that enqueueing an item on an offline CPU is not a problem:
it will pop up on a "random" CPU, which means it will be carried out
asap and will not wait until the CPU gets back online. Therefore I don't
understand commit fcc6354365015.

May I suggest the following change? It will enqueue the work item on
the first CPU of the NUMA node, and the "unbound" nature of the workqueue
ensures that a CPU of that NUMA node will perform the work.
This is mostly a revert of fcc6354365015, except that the workqueue
gained the WQ_UNBOUND flag.

------------------>8----

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f76..94d6c50c4e796 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4162,7 +4162,7 @@ void __init rcu_init(void)
 	/* Create workqueue for expedited GPs and for Tree SRCU. */
 	rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_gp_wq);
-	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
+	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
 	WARN_ON(!rcu_par_gp_wq);
 }
 
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0b2c2ad69629c..a0486414edb40 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -472,7 +472,6 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 				     smp_call_func_t func)
 {
-	int cpu;
 	struct rcu_node *rnp;
 
 	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("reset"));
@@ -494,13 +493,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 			continue;
 		}
 		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
-		preempt_disable();
-		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
-		/* If all offline, queue the work on an unbound CPU. */
-		if (unlikely(cpu > rnp->grphi))
-			cpu = WORK_CPU_UNBOUND;
-		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
-		preempt_enable();
+		queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
 		rnp->exp_need_flush = true;
 	}
 

> 
> 							Thanx, Paul

Sebastian


* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-10-12 18:41           ` Sebastian Andrzej Siewior
@ 2018-10-13 13:48             ` Paul E. McKenney
  2018-10-15 14:42               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2018-10-13 13:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tejun Heo, linux-kernel, Boqun Feng, Peter Zijlstra,
	Aneesh Kumar K.V, tglx, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Fri, Oct 12, 2018 at 08:41:15PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-09-19 15:11:40 [-0700], Paul E. McKenney wrote:
> > On Wed, Sep 19, 2018 at 01:55:21PM -0700, Tejun Heo wrote:
> > > Unbound workqueue is NUMA-affine by default, so using it by default
> > > might not harm anything.
> > 
> > OK, so the above workaround would function correctly on -rt, thank you!
> > 
> > Sebastian, is there a counterpart to CONFIG_PREEMPT_RT already in
> > mainline?  If so, I would be happy to make mainline safe for -rt.
> 
> Now that I stumbled upon it again, I noticed that I never replied here.
> Sorry for that.
> 
> Let me summarize:
> sync_rcu_exp_select_cpus() used
> 	queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
> 
> which was changed in commit fcc6354365015 ("rcu: Make expedited GPs
> handle CPU 0 being offline"). The commit claims that this is needed in
> case CPU0 is offline, so it tries to find another CPU, starting with the
> possibly offline CPU. It might cross to another NUMA node, but that is
> not really a problem, it just tries to remain on the "local NUMA node".
> 
> After that commit, the code invokes queue_work_on() within a
> preempt_disable() section because it can't use cpus_read_lock() to
> ensure that the CPU won't go offline in the middle of testing (and
> preempt_disable() does the trick).
> For RT reasons I would like to get rid of queue_work_on() within the
> preempt_disable() section.
> Tejun said that enqueueing an item on an unbound workqueue is NUMA
> affine.
> 
> I figured out that enqueueing an item on an offline CPU is not a problem
> and it will pop up on a "random" CPU which means it will be carried out
> asap and will not wait until the CPU gets back online. Therefore I don't
> understand the commit fcc6354365015.
> 
> May I suggest the following change? It will enqueue the work item on
> the first CPU on the NUMA node and the "unbound" part of the work queue
> ensures that a CPU of that NUMA node will perform the work.
> This is mostly a revert of fcc6354365015 except that the workqueue
> gained the WQ_UNBOUND flag.

My concern would be that it would queue it by default for the current
CPU, which would serialize the processing, losing the concurrency of
grace-period initialization.  But that was a long time ago, and perhaps
workqueues have changed.  So, have you tried using rcuperf to test the
update performance on a large system?

If this change does not impact performance on an rcuperf test, why not
send me a formal patch with Signed-off-by and commit log (including
performance testing results)?  I will then apply it, it will be exposed
to 0day and eventually -next testing, and if no problems arise, it will go
to mainline, perhaps as soon as the merge window after the upcoming one.

Fair enough?

							Thanx, Paul

> ------------------>8----
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1369f76..94d6c50c4e796 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4162,7 +4162,7 @@ void __init rcu_init(void)
>  	/* Create workqueue for expedited GPs and for Tree SRCU. */
>  	rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
>  	WARN_ON(!rcu_gp_wq);
> -	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
> +	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>  	WARN_ON(!rcu_par_gp_wq);
>  }
>  
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 0b2c2ad69629c..a0486414edb40 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -472,7 +472,6 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
>  static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  				     smp_call_func_t func)
>  {
> -	int cpu;
>  	struct rcu_node *rnp;
>  
>  	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("reset"));
> @@ -494,13 +493,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>  			continue;
>  		}
>  		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> -		preempt_disable();
> -		cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
> -		/* If all offline, queue the work on an unbound CPU. */
> -		if (unlikely(cpu > rnp->grphi))
> -			cpu = WORK_CPU_UNBOUND;
> -		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
> -		preempt_enable();
> +		queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
>  		rnp->exp_need_flush = true;
>  	}
>  
> 
> > 
> > 							Thanx, Paul
> 
> Sebastian
> 



* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-10-13 13:48             ` Paul E. McKenney
@ 2018-10-15 14:42               ` Sebastian Andrzej Siewior
  2018-10-15 15:07                 ` Boqun Feng
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-15 14:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tejun Heo, linux-kernel, Boqun Feng, Peter Zijlstra,
	Aneesh Kumar K.V, tglx, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On 2018-10-13 06:48:13 [-0700], Paul E. McKenney wrote:
> 
> My concern would be that it would queue it by default for the current
> CPU, which would serialize the processing, losing the concurrency of
> grace-period initialization.  But that was a long time ago, and perhaps
> workqueues have changed. 

but the code here is always using the first CPU of a NUMA node, or did I
miss something?

> So, have you tried using rcuperf to test the
> update performance on a large system?

Something like this:
| tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuperf --configs TREE
| ----Start batch 1: Mon Oct 15 12:46:13 CEST 2018
| TREE 142: Starting build. …
| …
| Average grace-period duration: 19952.7 microseconds
| Minimum grace-period duration: 9004.51
| 50th percentile grace-period duration: 19998.3
| 90th percentile grace-period duration: 24994.4
| 99th percentile grace-period duration: 30002.1
| Maximum grace-period duration: 42998.2
| Grace periods: 6560 Batches: 209 Ratio: 31.3876
| Computed from rcuperf printk output.
| Completed in 27 vs. 1800
| 
| Average grace-period duration: 18700 microseconds
| Minimum grace-period duration: 7069.2
| 50th percentile grace-period duration: 18987.5
| 90th percentile grace-period duration: 22997
| 99th percentile grace-period duration: 28944.7
| Maximum grace-period duration: 36994.5
| Grace periods: 6551 Batches: 209 Ratio: 31.3445
| Computed from rcuperf printk output.
| Completed in 27 vs. 1800

two runs patched followed by two runs without the patched:
| Average grace-period duration: 19423.3 microseconds
| Minimum grace-period duration: 8006.93
| 50th percentile grace-period duration: 19002.8
| 90th percentile grace-period duration: 23997.5
| 99th percentile grace-period duration: 29995.7
| Maximum grace-period duration: 37997.9
| Grace periods: 6526 Batches: 208 Ratio: 31.375
| Computed from rcuperf printk output.
| Completed in 27 vs. 1800
| 
| Average grace-period duration: 18822.4 microseconds
| Minimum grace-period duration: 8348.15
| 50th percentile grace-period duration: 18996.9
| 90th percentile grace-period duration: 23000
| 99th percentile grace-period duration: 27999.5
| Maximum grace-period duration: 39001.9
| Grace periods: 6540 Batches: 209 Ratio: 31.2919
| Computed from rcuperf printk output.
| Completed in 27 vs. 1800

I think the difference might come from cpufreq on the host. But in
general, is this what you have been asking for, or did you want to see
something run on the host (or an additional argument to the script)?

> If this change does not impact performance on an rcuperf test, why not
> send me a formal patch with Signed-off-by and commit log (including
> performance testing results)?  I will then apply it, it will be exposed
> to 0day and eventually -next testing, and if no problems arise, it will go
> to mainline, perhaps as soon as the merge window after the upcoming one.
> 
> Fair enough?
sure.

> 							Thanx, Paul
> 

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-10-15 14:42               ` Sebastian Andrzej Siewior
@ 2018-10-15 15:07                 ` Boqun Feng
  2018-10-15 15:09                   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2018-10-15 15:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Paul E. McKenney, Tejun Heo, linux-kernel, Peter Zijlstra,
	Aneesh Kumar K.V, tglx, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 3584 bytes --]

Hi, Sebastian

On Mon, Oct 15, 2018 at 04:42:17PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-10-13 06:48:13 [-0700], Paul E. McKenney wrote:
> > 
> > My concern would be that it would queue it by default for the current
> > CPU, which would serialize the processing, losing the concurrency of
> > grace-period initialization.  But that was a long time ago, and perhaps
> > workqueues have changed. 
> 
> but the code here is always using the first CPU of a NUMA node or did I
> miss something?
> 

The thing is, the original approach picks one CPU per *RCU* node to
run the grace-period work, but with your proposal, if an RCU node is
smaller than a NUMA node (has fewer CPUs), we could end up with two
grace-period work items running on one CPU. I think that's Paul's concern.

Regards,
Boqun

> > So, have you tried using rcuperf to test the
> > update performance on a large system?
> 
> Something like this:
> | tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuperf --configs TREE
> | ----Start batch 1: Mon Oct 15 12:46:13 CEST 2018
> | TREE 142: Starting build. …
> | …
> | Average grace-period duration: 19952.7 microseconds
> | Minimum grace-period duration: 9004.51
> | 50th percentile grace-period duration: 19998.3
> | 90th percentile grace-period duration: 24994.4
> | 99th percentile grace-period duration: 30002.1
> | Maximum grace-period duration: 42998.2
> | Grace periods: 6560 Batches: 209 Ratio: 31.3876
> | Computed from rcuperf printk output.
> | Completed in 27 vs. 1800
> | 
> | Average grace-period duration: 18700 microseconds
> | Minimum grace-period duration: 7069.2
> | 50th percentile grace-period duration: 18987.5
> | 90th percentile grace-period duration: 22997
> | 99th percentile grace-period duration: 28944.7
> | Maximum grace-period duration: 36994.5
> | Grace periods: 6551 Batches: 209 Ratio: 31.3445
> | Computed from rcuperf printk output.
> | Completed in 27 vs. 1800
> 
> two runs patched followed by two runs without the patched:
> | Average grace-period duration: 19423.3 microseconds
> | Minimum grace-period duration: 8006.93
> | 50th percentile grace-period duration: 19002.8
> | 90th percentile grace-period duration: 23997.5
> | 99th percentile grace-period duration: 29995.7
> | Maximum grace-period duration: 37997.9
> | Grace periods: 6526 Batches: 208 Ratio: 31.375
> | Computed from rcuperf printk output.
> | Completed in 27 vs. 1800
> | 
> | Average grace-period duration: 18822.4 microseconds
> | Minimum grace-period duration: 8348.15
> | 50th percentile grace-period duration: 18996.9
> | 90th percentile grace-period duration: 23000
> | 99th percentile grace-period duration: 27999.5
> | Maximum grace-period duration: 39001.9
> | Grace periods: 6540 Batches: 209 Ratio: 31.2919
> | Computed from rcuperf printk output.
> | Completed in 27 vs. 1800
> 
> I think difference might come from cpufreq on the host. But in general,
> this is what you have been asking for or did you want to see something
> on the host (or an additional argument to the script)?
> 
> > If this change does not impact performance on an rcuperf test, why not
> > send me a formal patch with Signed-off-by and commit log (including
> > performance testing results)?  I will then apply it, it will be exposed
> > to 0day and eventually -next testing, and if no problems arise, it will go
> > to mainline, perhaps as soon as the merge window after the upcoming one.
> > 
> > Fair enough?
> sure.
> 
> > 							Thanx, Paul
> > 
> 
> Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-10-15 15:07                 ` Boqun Feng
@ 2018-10-15 15:09                   ` Sebastian Andrzej Siewior
  2018-10-15 15:33                     ` Boqun Feng
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-15 15:09 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Tejun Heo, linux-kernel, Peter Zijlstra,
	Aneesh Kumar K.V, tglx, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On 2018-10-15 23:07:15 [+0800], Boqun Feng wrote:
> Hi, Sebastian
Hi Boqun,

> On Mon, Oct 15, 2018 at 04:42:17PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2018-10-13 06:48:13 [-0700], Paul E. McKenney wrote:
> > > 
> > > My concern would be that it would queue it by default for the current
> > > CPU, which would serialize the processing, losing the concurrency of
> > > grace-period initialization.  But that was a long time ago, and perhaps
> > > workqueues have changed. 
> > 
> > but the code here is always using the first CPU of a NUMA node or did I
> > miss something?
> > 
> 
> The thing is the original way is to pick one CPU for a *RCU* node to
> run the grace-period work, but with your proposal, if a RCU node is
> smaller than a NUMA node (having fewer CPUs), we could end up having two
> grace-period works running on one CPU. I think that's Paul's concern.

Ah, okay. From what I observed, the RCU nodes and NUMA nodes were 1:1
here. Noted.
Given that I can enqueue a work item on an offlined CPU, I don't see why
commit fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being
offline") should make a difference. Any objections to just reverting it?

> Regards,
> Boqun

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-10-15 15:09                   ` Sebastian Andrzej Siewior
@ 2018-10-15 15:33                     ` Boqun Feng
  2018-10-15 16:36                       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2018-10-15 15:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Paul E. McKenney, Tejun Heo, linux-kernel, Peter Zijlstra,
	Aneesh Kumar K.V, tglx, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 2091 bytes --]

On Mon, Oct 15, 2018 at 05:09:03PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-10-15 23:07:15 [+0800], Boqun Feng wrote:
> > Hi, Sebastian
> Hi Boqun,
> 
> > On Mon, Oct 15, 2018 at 04:42:17PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2018-10-13 06:48:13 [-0700], Paul E. McKenney wrote:
> > > > 
> > > > My concern would be that it would queue it by default for the current
> > > > CPU, which would serialize the processing, losing the concurrency of
> > > > grace-period initialization.  But that was a long time ago, and perhaps
> > > > workqueues have changed. 
> > > 
> > > but the code here is always using the first CPU of a NUMA node or did I
> > > miss something?
> > > 
> > 
> > The thing is the original way is to pick one CPU for a *RCU* node to
> > run the grace-period work, but with your proposal, if a RCU node is
> > smaller than a NUMA node (having fewer CPUs), we could end up having two
> > grace-period works running on one CPU. I think that's Paul's concern.
> 
> Ah. Okay. From what I observed, the RCU nodes and NUMA nodes were 1:1
> here. Noted.

Ok, in that case, there should be no significant performance difference.

> Given that I can enqueue a work item on an offlined CPU I don't see why
> commit fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being
> offline") should make a difference. Any objections to just revert it?

Well, that commit tries to avoid queueing a work item on an offlined
CPU, because according to the workqueue API it is the user's
responsibility to make sure the CPU is online when a work item is
enqueued. So there is a difference ;-)

But I don't have any objection to reverting it in favor of your
proposal, since yours is simpler and more straightforward, and doesn't
perform worse if NUMA nodes and RCU nodes correspond one-to-one.

Besides, I think even if we observe some performance difference in the
future, the best way to solve it would be to give workqueues a more
fine-grained affinity group than a NUMA node.

Regards,
Boqun

> 
> > Regards,
> > Boqun
> 
> Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask
  2018-10-15 15:33                     ` Boqun Feng
@ 2018-10-15 16:36                       ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2018-10-15 16:36 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Sebastian Andrzej Siewior, Tejun Heo, linux-kernel,
	Peter Zijlstra, Aneesh Kumar K.V, tglx, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, riel

On Mon, Oct 15, 2018 at 11:33:48PM +0800, Boqun Feng wrote:
> On Mon, Oct 15, 2018 at 05:09:03PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2018-10-15 23:07:15 [+0800], Boqun Feng wrote:
> > > Hi, Sebastian
> > Hi Boqun,
> > 
> > > On Mon, Oct 15, 2018 at 04:42:17PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2018-10-13 06:48:13 [-0700], Paul E. McKenney wrote:
> > > > > 
> > > > > My concern would be that it would queue it by default for the current
> > > > > CPU, which would serialize the processing, losing the concurrency of
> > > > > grace-period initialization.  But that was a long time ago, and perhaps
> > > > > workqueues have changed. 
> > > > 
> > > > but the code here is always using the first CPU of a NUMA node or did I
> > > > miss something?
> > > > 
> > > 
> > > The thing is the original way is to pick one CPU for a *RCU* node to
> > > run the grace-period work, but with your proposal, if a RCU node is
> > > smaller than a NUMA node (having fewer CPUs), we could end up having two
> > > grace-period works running on one CPU. I think that's Paul's concern.
> > 
> > Ah. Okay. From what I observed, the RCU nodes and NUMA nodes were 1:1
> > here. Noted.
> 
> Ok, in that case, there should be no significant performance difference.
> 
> > Given that I can enqueue a work item on an offlined CPU I don't see why
> > commit fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being
> > offline") should make a difference. Any objections to just revert it?
> 
> Well, that commit is trying to avoid queue a work on an offlined CPU,
> because according to workqueue API, it's the users' responsibility to
> make sure the CPU is online when a work item enqueued. So there is a
> difference ;-)
> 
> But I don't have any objection to revert it with your proposal, since
> yours is more simple and straight-forward, and doesn't perform worse if
> NUMA nodes and RCU nodes have one-to-one corresponding.
> 
> Besides, I think even if we observe some performance difference in the
> future, the best way to solve that is to make workqueue have a more
> fine-grained affine group than a NUMA node.

Please keep in mind that there are computer systems out there with NUMA
topologies that are completely incompatible with RCU's rcu_node tree
structure.  According to Rik van Riel (CCed), there are even systems
out there where CPU 0 is on socket 0, CPU 1 on socket 1, and so on,
round-robining across the sockets.

The system that convinced me that the additional constraints on the
workqueue's CPU were needed had CPUs 0-7 on one socket and CPUs 8-15 on
the second, with CPUs 0-15 sharing the same leaf rcu_node structure.
Unfortunately, I no longer have useful access to this system (dead disk
drive, apparently).

I am not saying that Sebastian's approach is bad, rather that it does
need to be tested on a variety of systems.

							Thanx, Paul

> Regards,
> Boqun
> 
> > 
> > > Regards,
> > > Boqun
> > 
> > Sebastian



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-10-15 16:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-10 13:56 [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online_mask Sebastian Andrzej Siewior
2018-09-11 16:05 ` Paul E. McKenney
2018-09-11 16:21   ` Sebastian Andrzej Siewior
2018-09-11 17:02     ` Paul E. McKenney
2018-09-19 20:55       ` Tejun Heo
2018-09-19 22:11         ` Paul E. McKenney
2018-10-12 18:41           ` Sebastian Andrzej Siewior
2018-10-13 13:48             ` Paul E. McKenney
2018-10-15 14:42               ` Sebastian Andrzej Siewior
2018-10-15 15:07                 ` Boqun Feng
2018-10-15 15:09                   ` Sebastian Andrzej Siewior
2018-10-15 15:33                     ` Boqun Feng
2018-10-15 16:36                       ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).