linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* srcu: use cpu_online() instead custom check
@ 2018-11-01 23:12 Paul E. McKenney
  2018-11-08 16:38 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-01 23:12 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, linux-kernel

> The current check via srcu_online is slightly racy because after looking
> at srcu_online there could be an interrupt that interrupted us long
> enough until the CPU we checked against went offline.

I don't see how this can happen, even in -rt.  The call to
srcu_offline_cpu() happens very early in the CPU removal process,
which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
in sched_cpu_deactivate() would wait for the interrupt to complete.
And for the enclosing preempt_disable region to complete.

Or is getting rid of that preempt_disable region the real reason for
this change?

> An alternative would be to hold the hotplug rwsem (so the CPUs don't
> change their state) and then check based on cpu_online() if we queue it
> on a specific CPU or not. queue_work_on() itself can handle if something
> is enqueued on an offline CPU but a timer which is enqueued on an offline
> CPU won't fire until the CPU is back online.
> 
> I am not sure if the removal in rcu_init() is okay or not. I assume that
> SRCU won't enqueue a work item before SRCU is up and ready.

That was the case before the current merge window, but use of call_srcu()
by tracing means that SRCU needs to be able to deal with call_srcu()
long before any initialization has happened.  The actual callbacks
won't be invoked until much later, after the scheduler and workqueues
are completely up and running, but call_srcu() can be invoked very early.

But I am not seeing any removal in rcu_init() in this patch, so I might
be missing something.

							Thanx, Paul

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 6c9866a854b1..3428a40a813e 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -38,6 +38,7 @@
>  #include <linux/delay.h>
>  #include <linux/module.h>
>  #include <linux/srcu.h>
> +#include <linux/cpu.h>
>  
>  #include "rcu.h"
>  #include "rcu_segcblist.h"
> @@ -458,21 +459,6 @@ static void srcu_gp_start(struct srcu_struct *sp)
>  	WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
>  }
>  
> -/*
> - * Track online CPUs to guide callback workqueue placement.
> - */
> -DEFINE_PER_CPU(bool, srcu_online);
> -
> -void srcu_online_cpu(unsigned int cpu)
> -{
> -	WRITE_ONCE(per_cpu(srcu_online, cpu), true);
> -}
> -
> -void srcu_offline_cpu(unsigned int cpu)
> -{
> -	WRITE_ONCE(per_cpu(srcu_online, cpu), false);
> -}
> -
>  /*
>   * Place the workqueue handler on the specified CPU if online, otherwise
>   * just run it whereever.  This is useful for placing workqueue handlers
> @@ -484,12 +470,12 @@ static bool srcu_queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>  {
>  	bool ret;
>  
> -	preempt_disable();
> -	if (READ_ONCE(per_cpu(srcu_online, cpu)))
> +	cpus_read_lock();
> +	if (cpu_online(cpu))
>  		ret = queue_delayed_work_on(cpu, wq, dwork, delay);
>  	else
>  		ret = queue_delayed_work(wq, dwork, delay);
> -	preempt_enable();
> +	cpus_read_unlock();
>  	return ret;
>  }
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6868ef417e9f..e2e68250009b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3767,8 +3767,6 @@ int rcutree_online_cpu(unsigned int cpu)
>  		rnp->ffmask |= rdp->grpmask;
>  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	}
> -	if (IS_ENABLED(CONFIG_TREE_SRCU))
> -		srcu_online_cpu(cpu);
>  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>  		return 0; /* Too early in boot for scheduler work. */
>  	sync_sched_exp_online_cleanup(cpu);
> @@ -3796,8 +3794,6 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	}
>  
>  	rcutree_affinity_setting(cpu, cpu);
> -	if (IS_ENABLED(CONFIG_TREE_SRCU))
> -		srcu_offline_cpu(cpu);
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: srcu: use cpu_online() instead custom check
  2018-11-01 23:12 srcu: use cpu_online() instead custom check Paul E. McKenney
@ 2018-11-08 16:38 ` Sebastian Andrzej Siewior
  2018-11-08 17:10   ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08 16:38 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-rt-users, linux-kernel

On 2018-11-01 16:12:28 [-0700], Paul E. McKenney wrote:
> > The current check via srcu_online is slightly racy because after looking
> > at srcu_online there could be an interrupt that interrupted us long
> > enough until the CPU we checked against went offline.
> 
> I don't see how this can happen, even in -rt.  The call to
> srcu_offline_cpu() happens very early in the CPU removal process,
> which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
> in sched_cpu_deactivate() would wait for the interrupt to complete.
> And for the enclosing preempt_disable region to complete.

Is this again a hidden RCU detail that preempt_disable() on CPU4 is
enough to ensure that CPU2 does not get marked offline between?

> Or is getting rid of that preempt_disable region the real reason for
> this change?

Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
But looking further, that preempt_disable() while looking at online CPUs
didn't look good.

> > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > change their state) and then check based on cpu_online() if we queue it
> > on a specific CPU or not. queue_work_on() itself can handle if something
> > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > CPU won't fire until the CPU is back online.
> > 
> > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > SRCU won't enqueue a work item before SRCU is up and ready.
> 
> That was the case before the current merge window, but use of call_srcu()
> by tracing means that SRCU needs to be able to deal with call_srcu()
> long before any initialization has happened.  The actual callbacks
> won't be invoked until much later, after the scheduler and workqueues
> are completely up and running, but call_srcu() can be invoked very early.
> 
> But I am not seeing any removal in rcu_init() in this patch, so I might
> be missing something.

The description is not up-to-date. There was this hunk:
|@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
|       for_each_online_cpu(cpu) {
|               rcutree_prepare_cpu(cpu);
|               rcu_cpu_starting(cpu);
|-              if (IS_ENABLED(CONFIG_TREE_SRCU))
|-                      srcu_online_cpu(cpu);
|       }
| }

which got removed in v4.16.

> 							Thanx, Paul

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: srcu: use cpu_online() instead custom check
  2018-11-08 16:38 ` Sebastian Andrzej Siewior
@ 2018-11-08 17:10   ` Paul E. McKenney
  2018-11-08 17:46     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-08 17:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, linux-kernel

On Thu, Nov 08, 2018 at 05:38:51PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-01 16:12:28 [-0700], Paul E. McKenney wrote:
> > > The current check via srcu_online is slightly racy because after looking
> > > at srcu_online there could be an interrupt that interrupted us long
> > > enough until the CPU we checked against went offline.
> > 
> > I don't see how this can happen, even in -rt.  The call to
> > srcu_offline_cpu() happens very early in the CPU removal process,
> > which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
> > in sched_cpu_deactivate() would wait for the interrupt to complete.
> > And for the enclosing preempt_disable region to complete.
> 
> Is this again a hidden RCU detail that preempt_disable() on CPU4 is
> enough to ensure that CPU2 does not get marked offline between?

The call_rcu_sched parameter to synchronize_rcu_mult() makes this work.
This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it
is a hidden sched/RCU detail, I guess.

Or am I missing the point of your question?

> > Or is getting rid of that preempt_disable region the real reason for
> > this change?
> 
> Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
> But looking further, that preempt_disable() while looking at online CPUs
> didn't look good.

That is why it is invoked from the very early CPU-hotplug notifier.  That
early in the process, the preempt_disable() does prevent the current CPU
from being taken offline twice:  Once due to synchronize_rcu_mult(), and
once due to the stop-machine call.

> > > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > > change their state) and then check based on cpu_online() if we queue it
> > > on a specific CPU or not. queue_work_on() itself can handle if something
> > > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > > CPU won't fire until the CPU is back online.
> > > 
> > > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > > SRCU won't enqueue a work item before SRCU is up and ready.
> > 
> > That was the case before the current merge window, but use of call_srcu()
> > by tracing means that SRCU needs to be able to deal with call_srcu()
> > long before any initialization has happened.  The actual callbacks
> > won't be invoked until much later, after the scheduler and workqueues
> > are completely up and running, but call_srcu() can be invoked very early.
> > 
> > But I am not seeing any removal in rcu_init() in this patch, so I might
> > be missing something.
> 
> The description is not up-to-date. There was this hunk:
> |@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
> |       for_each_online_cpu(cpu) {
> |               rcutree_prepare_cpu(cpu);
> |               rcu_cpu_starting(cpu);
> |-              if (IS_ENABLED(CONFIG_TREE_SRCU))
> |-                      srcu_online_cpu(cpu);
> |       }
> | }
> 
> which got removed in v4.16.

Ah!  Here is the current rcu_init() code:

	for_each_online_cpu(cpu) {
		rcutree_prepare_cpu(cpu);
		rcu_cpu_starting(cpu);
		rcutree_online_cpu(cpu);
	}

And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU
is enabled, so no need for the direct call from rcu_init().

							Thanx, Paul


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: srcu: use cpu_online() instead custom check
  2018-11-08 17:10   ` Paul E. McKenney
@ 2018-11-08 17:46     ` Sebastian Andrzej Siewior
  2018-11-08 18:05       ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08 17:46 UTC (permalink / raw)
  To: Paul E. McKenney, tglx; +Cc: linux-rt-users, linux-kernel

On 2018-11-08 09:10:24 [-0800], Paul E. McKenney wrote:
> > Is this again a hidden RCU detail that preempt_disable() on CPU4 is
> > enough to ensure that CPU2 does not get marked offline between?
> 
> The call_rcu_sched parameter to synchronize_rcu_mult() makes this work.
> This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it
> is a hidden sched/RCU detail, I guess.
> 
> Or am I missing the point of your question?

No, this answers it.

> > > Or is getting rid of that preempt_disable region the real reason for
> > > this change?
> > 
> > Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
> > But looking further, that preempt_disable() while looking at online CPUs
> > didn't look good.
> 
> That is why it is invoked from the very early CPU-hotplug notifier.  That
> early in the process, the preempt_disable() does prevent the current CPU
> from being taken offline twice:  Once due to synchronize_rcu_mult(), and
> once due to the stop-machine call.

:)
 
> > The description is not up-to-date. There was this hunk:
> > |@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
> > |       for_each_online_cpu(cpu) {
> > |               rcutree_prepare_cpu(cpu);
> > |               rcu_cpu_starting(cpu);
> > |-              if (IS_ENABLED(CONFIG_TREE_SRCU))
> > |-                      srcu_online_cpu(cpu);
> > |       }
> > | }
> > 
> > which got removed in v4.16.
> 
> Ah!  Here is the current rcu_init() code:
> 
> 	for_each_online_cpu(cpu) {
> 		rcutree_prepare_cpu(cpu);
> 		rcu_cpu_starting(cpu);
> 		rcutree_online_cpu(cpu);
> 	}
> 
> And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU
> is enabled, so no need for the direct call from rcu_init().

So if a CPU goes down, the timer gets migrated to another CPU. If the
CPU is already offline the timer can be programmed and nothing happens.
If timer_add_on() would return an error we could have fallback code.
Looking at the users of queue_delayed_work_on() there are only two using
it really (the others are using smp_processor_id()) and one of them is
using get_online_cpus().
It does not look like there a lot of users affected. Would be reasonable
to avoid adding timers to offlined CPUs?

> 							Thanx, Paul

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: srcu: use cpu_online() instead custom check
  2018-11-08 17:46     ` Sebastian Andrzej Siewior
@ 2018-11-08 18:05       ` Paul E. McKenney
  2018-11-08 18:16         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-08 18:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: tglx, linux-rt-users, linux-kernel

On Thu, Nov 08, 2018 at 06:46:55PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-08 09:10:24 [-0800], Paul E. McKenney wrote:
> > > Is this again a hidden RCU detail that preempt_disable() on CPU4 is
> > > enough to ensure that CPU2 does not get marked offline between?
> > 
> > The call_rcu_sched parameter to synchronize_rcu_mult() makes this work.
> > This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it
> > is a hidden sched/RCU detail, I guess.
> > 
> > Or am I missing the point of your question?
> 
> No, this answers it.
> 
> > > > Or is getting rid of that preempt_disable region the real reason for
> > > > this change?
> > > 
> > > Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
> > > But looking further, that preempt_disable() while looking at online CPUs
> > > didn't look good.
> > 
> > That is why it is invoked from the very early CPU-hotplug notifier.  That
> > early in the process, the preempt_disable() does prevent the current CPU
> > from being taken offline twice:  Once due to synchronize_rcu_mult(), and
> > once due to the stop-machine call.
> 
> :)
>  
> > > The description is not up-to-date. There was this hunk:
> > > |@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
> > > |       for_each_online_cpu(cpu) {
> > > |               rcutree_prepare_cpu(cpu);
> > > |               rcu_cpu_starting(cpu);
> > > |-              if (IS_ENABLED(CONFIG_TREE_SRCU))
> > > |-                      srcu_online_cpu(cpu);
> > > |       }
> > > | }
> > > 
> > > which got removed in v4.16.
> > 
> > Ah!  Here is the current rcu_init() code:
> > 
> > 	for_each_online_cpu(cpu) {
> > 		rcutree_prepare_cpu(cpu);
> > 		rcu_cpu_starting(cpu);
> > 		rcutree_online_cpu(cpu);
> > 	}
> > 
> > And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU
> > is enabled, so no need for the direct call from rcu_init().
> 
> So if a CPU goes down, the timer gets migrated to another CPU. If the
> CPU is already offline the timer can be programmed and nothing happens.
> If timer_add_on() would return an error we could have fallback code.
> Looking at the users of queue_delayed_work_on() there are only two using
> it really (the others are using smp_processor_id()) and one of them is
> using get_online_cpus().
> It does not look like there a lot of users affected. Would be reasonable
> to avoid adding timers to offlined CPUs?

Just to make sure I understand, this is the call to queue_delayed_work_on()
from srcu_queue_delayed_work_on(), right?

And if I am guessing correctly, you would like to get rid of the
constraint requiring CPUHP_RCUTREE_PREP to precede CPUHP_TIMERS_PREPARE?
If so, the swait_event_idle_timeout_exclusive() in rcu_gp_fqs_loop()
in kernel/rcu/tree.c also requires this ordering.  There are probably
other pieces of code needing this.

Plus the reason for running this on a specific CPU is that the workqueue
item is processing that CPU's per-CPU variables, including invoking that
CPU's callbacks.  The item is srcu_invoke_callbacks().

							Thanx, Paul


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: srcu: use cpu_online() instead custom check
  2018-11-08 18:05       ` Paul E. McKenney
@ 2018-11-08 18:16         ` Sebastian Andrzej Siewior
  2018-11-08 18:48           ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08 18:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: tglx, linux-rt-users, linux-kernel

On 2018-11-08 10:05:17 [-0800], Paul E. McKenney wrote:
> Just to make sure I understand, this is the call to queue_delayed_work_on()
> from srcu_queue_delayed_work_on(), right?

correct.

> And if I am guessing correctly, you would like to get rid of the
> constraint requiring CPUHP_RCUTREE_PREP to precede CPUHP_TIMERS_PREPARE?

no, my problem is the preempt_disable() around queue_delayed_work_on().
If the CPUs goes offline _after_ queue_delayed_work_on() then the timer
gets migrated and work item should show up on another CPU. 
If the CPU is offline at queue_delayed_work_on() time then the timer
gets enqueued and won't fire until the CPU is back online and I *think*
that is the reason behind this "is CPU online" check.

> If so, the swait_event_idle_timeout_exclusive() in rcu_gp_fqs_loop()
> in kernel/rcu/tree.c also requires this ordering.  There are probably
> other pieces of code needing this.
> 
> Plus the reason for running this on a specific CPU is that the workqueue
> item is processing that CPU's per-CPU variables, including invoking that
> CPU's callbacks.  The item is srcu_invoke_callbacks().

The SRCU callback is invoking per-CPU variables? Like this_cpu_ptr()?
But if the CPU is offline then you fallback to queue_delayed_work()?

> 							Thanx, Paul

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: srcu: use cpu_online() instead custom check
  2018-11-08 18:16         ` Sebastian Andrzej Siewior
@ 2018-11-08 18:48           ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-08 18:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: tglx, linux-rt-users, linux-kernel

On Thu, Nov 08, 2018 at 07:16:30PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-08 10:05:17 [-0800], Paul E. McKenney wrote:
> > Just to make sure I understand, this is the call to queue_delayed_work_on()
> > from srcu_queue_delayed_work_on(), right?
> 
> correct.
> 
> > And if I am guessing correctly, you would like to get rid of the
> > constraint requiring CPUHP_RCUTREE_PREP to precede CPUHP_TIMERS_PREPARE?
> 
> no, my problem is the preempt_disable() around queue_delayed_work_on().
> If the CPUs goes offline _after_ queue_delayed_work_on() then the timer
> gets migrated and work item should show up on another CPU. 
> If the CPU is offline at queue_delayed_work_on() time then the timer
> gets enqueued and won't fire until the CPU is back online and I *think*
> that is the reason behind this "is CPU online" check.

The main reason for the "is CPU online" check was that workqueues would
very rarely splat when I tried running without it.  I did report this
to Tejun.  You could try just calling queue_delayed_work_on() without
the check, but this is a 10s of hours rcutorture splat if I remember
correctly.

> > If so, the swait_event_idle_timeout_exclusive() in rcu_gp_fqs_loop()
> > in kernel/rcu/tree.c also requires this ordering.  There are probably
> > other pieces of code needing this.
> > 
> > Plus the reason for running this on a specific CPU is that the workqueue
> > item is processing that CPU's per-CPU variables, including invoking that
> > CPU's callbacks.  The item is srcu_invoke_callbacks().
> 
> The SRCU callback is invoking per-CPU variables? Like this_cpu_ptr()?
> But if the CPU is offline then you fallback to queue_delayed_work()?

Yes, yes, and yes.  ;-)

The callbacks are queued on a per-CPU basis.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-11-08 18:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-01 23:12 srcu: use cpu_online() instead custom check Paul E. McKenney
2018-11-08 16:38 ` Sebastian Andrzej Siewior
2018-11-08 17:10   ` Paul E. McKenney
2018-11-08 17:46     ` Sebastian Andrzej Siewior
2018-11-08 18:05       ` Paul E. McKenney
2018-11-08 18:16         ` Sebastian Andrzej Siewior
2018-11-08 18:48           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).