* srcu: use cpu_online() instead custom check
@ 2018-11-01 23:12 Paul E. McKenney
2018-11-08 16:38 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-01 23:12 UTC (permalink / raw)
To: bigeasy; +Cc: linux-rt-users, linux-kernel
> The current check via srcu_online is slightly racy because after looking
> at srcu_online there could be an interrupt that interrupted us long
> enough until the CPU we checked against went offline.
I don't see how this can happen, even in -rt. The call to
srcu_offline_cpu() happens very early in the CPU removal process,
which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
in sched_cpu_deactivate() would wait for the interrupt to complete.
And for the enclosing preempt_disable region to complete.
Or is getting rid of that preempt_disable region the real reason for
this change?
> An alternative would be to hold the hotplug rwsem (so the CPUs don't
> change their state) and then check based on cpu_online() if we queue it
> on a specific CPU or not. queue_work_on() itself can handle if something
> is enqueued on an offline CPU but a timer which is enqueued on an offline
> CPU won't fire until the CPU is back online.
>
> I am not sure if the removal in rcu_init() is okay or not. I assume that
> SRCU won't enqueue a work item before SRCU is up and ready.
That was the case before the current merge window, but use of call_srcu()
by tracing means that SRCU needs to be able to deal with call_srcu()
long before any initialization has happened. The actual callbacks
won't be invoked until much later, after the scheduler and workqueues
are completely up and running, but call_srcu() can be invoked very early.
But I am not seeing any removal in rcu_init() in this patch, so I might
be missing something.
Thanx, Paul
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 6c9866a854b1..3428a40a813e 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -38,6 +38,7 @@
> #include <linux/delay.h>
> #include <linux/module.h>
> #include <linux/srcu.h>
> +#include <linux/cpu.h>
>
> #include "rcu.h"
> #include "rcu_segcblist.h"
> @@ -458,21 +459,6 @@ static void srcu_gp_start(struct srcu_struct *sp)
> WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
> }
>
> -/*
> - * Track online CPUs to guide callback workqueue placement.
> - */
> -DEFINE_PER_CPU(bool, srcu_online);
> -
> -void srcu_online_cpu(unsigned int cpu)
> -{
> - WRITE_ONCE(per_cpu(srcu_online, cpu), true);
> -}
> -
> -void srcu_offline_cpu(unsigned int cpu)
> -{
> - WRITE_ONCE(per_cpu(srcu_online, cpu), false);
> -}
> -
> /*
> * Place the workqueue handler on the specified CPU if online, otherwise
> * just run it whereever. This is useful for placing workqueue handlers
> @@ -484,12 +470,12 @@ static bool srcu_queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
> {
> bool ret;
>
> - preempt_disable();
> - if (READ_ONCE(per_cpu(srcu_online, cpu)))
> + cpus_read_lock();
> + if (cpu_online(cpu))
> ret = queue_delayed_work_on(cpu, wq, dwork, delay);
> else
> ret = queue_delayed_work(wq, dwork, delay);
> - preempt_enable();
> + cpus_read_unlock();
> return ret;
> }
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6868ef417e9f..e2e68250009b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3767,8 +3767,6 @@ int rcutree_online_cpu(unsigned int cpu)
> rnp->ffmask |= rdp->grpmask;
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }
> - if (IS_ENABLED(CONFIG_TREE_SRCU))
> - srcu_online_cpu(cpu);
> if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> return 0; /* Too early in boot for scheduler work. */
> sync_sched_exp_online_cleanup(cpu);
> @@ -3796,8 +3794,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> }
>
> rcutree_affinity_setting(cpu, cpu);
> - if (IS_ENABLED(CONFIG_TREE_SRCU))
> - srcu_offline_cpu(cpu);
> return 0;
> }
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: srcu: use cpu_online() instead custom check
2018-11-01 23:12 srcu: use cpu_online() instead custom check Paul E. McKenney
@ 2018-11-08 16:38 ` Sebastian Andrzej Siewior
2018-11-08 17:10 ` Paul E. McKenney
0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08 16:38 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linux-rt-users, linux-kernel
On 2018-11-01 16:12:28 [-0700], Paul E. McKenney wrote:
> > The current check via srcu_online is slightly racy because after looking
> > at srcu_online there could be an interrupt that interrupted us long
> > enough until the CPU we checked against went offline.
>
> I don't see how this can happen, even in -rt. The call to
> srcu_offline_cpu() happens very early in the CPU removal process,
> which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
> in sched_cpu_deactivate() would wait for the interrupt to complete.
> And for the enclosing preempt_disable region to complete.
Is this again a hidden RCU detail that preempt_disable() on CPU4 is
enough to ensure that CPU2 does not get marked offline between?
> Or is getting rid of that preempt_disable region the real reason for
> this change?
Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
But looking further, that preempt_disable() while looking at online CPUs
didn't look good.
> > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > change their state) and then check based on cpu_online() if we queue it
> > on a specific CPU or not. queue_work_on() itself can handle if something
> > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > CPU won't fire until the CPU is back online.
> >
> > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > SRCU won't enqueue a work item before SRCU is up and ready.
>
> That was the case before the current merge window, but use of call_srcu()
> by tracing means that SRCU needs to be able to deal with call_srcu()
> long before any initialization has happened. The actual callbacks
> won't be invoked until much later, after the scheduler and workqueues
> are completely up and running, but call_srcu() can be invoked very early.
>
> But I am not seeing any removal in rcu_init() in this patch, so I might
> be missing something.
The description is not up-to-date. There was this hunk:
|@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
| for_each_online_cpu(cpu) {
| rcutree_prepare_cpu(cpu);
| rcu_cpu_starting(cpu);
|- if (IS_ENABLED(CONFIG_TREE_SRCU))
|- srcu_online_cpu(cpu);
| }
| }
which got removed in v4.16.
> Thanx, Paul
Sebastian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: srcu: use cpu_online() instead custom check
2018-11-08 16:38 ` Sebastian Andrzej Siewior
@ 2018-11-08 17:10 ` Paul E. McKenney
2018-11-08 17:46 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-08 17:10 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, linux-kernel
On Thu, Nov 08, 2018 at 05:38:51PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-01 16:12:28 [-0700], Paul E. McKenney wrote:
> > > The current check via srcu_online is slightly racy because after looking
> > > at srcu_online there could be an interrupt that interrupted us long
> > > enough until the CPU we checked against went offline.
> >
> > I don't see how this can happen, even in -rt. The call to
> > srcu_offline_cpu() happens very early in the CPU removal process,
> > which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
> > in sched_cpu_deactivate() would wait for the interrupt to complete.
> > And for the enclosing preempt_disable region to complete.
>
> Is this again a hidden RCU detail that preempt_disable() on CPU4 is
> enough to ensure that CPU2 does not get marked offline between?
The call_rcu_sched parameter to synchronize_rcu_mult() makes this work.
This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it
is a hidden sched/RCU detail, I guess.
Or am I missing the point of your question?
> > Or is getting rid of that preempt_disable region the real reason for
> > this change?
>
> Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
> But looking further, that preempt_disable() while looking at online CPUs
> didn't look good.
That is why it is invoked from the very early CPU-hotplug notifier. That
early in the process, the preempt_disable() does prevent the current CPU
from being taken offline twice: Once due to synchronize_rcu_mult(), and
once due to the stop-machine call.
> > > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > > change their state) and then check based on cpu_online() if we queue it
> > > on a specific CPU or not. queue_work_on() itself can handle if something
> > > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > > CPU won't fire until the CPU is back online.
> > >
> > > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > > SRCU won't enqueue a work item before SRCU is up and ready.
> >
> > That was the case before the current merge window, but use of call_srcu()
> > by tracing means that SRCU needs to be able to deal with call_srcu()
> > long before any initialization has happened. The actual callbacks
> > won't be invoked until much later, after the scheduler and workqueues
> > are completely up and running, but call_srcu() can be invoked very early.
> >
> > But I am not seeing any removal in rcu_init() in this patch, so I might
> > be missing something.
>
> The description is not up-to-date. There was this hunk:
> |@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
> | for_each_online_cpu(cpu) {
> | rcutree_prepare_cpu(cpu);
> | rcu_cpu_starting(cpu);
> |- if (IS_ENABLED(CONFIG_TREE_SRCU))
> |- srcu_online_cpu(cpu);
> | }
> | }
>
> which got removed in v4.16.
Ah! Here is the current rcu_init() code:
for_each_online_cpu(cpu) {
rcutree_prepare_cpu(cpu);
rcu_cpu_starting(cpu);
rcutree_online_cpu(cpu);
}
And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU
is enabled, so no need for the direct call from rcu_init().
Thanx, Paul
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: srcu: use cpu_online() instead custom check
2018-11-08 17:10 ` Paul E. McKenney
@ 2018-11-08 17:46 ` Sebastian Andrzej Siewior
2018-11-08 18:05 ` Paul E. McKenney
0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08 17:46 UTC (permalink / raw)
To: Paul E. McKenney, tglx; +Cc: linux-rt-users, linux-kernel
On 2018-11-08 09:10:24 [-0800], Paul E. McKenney wrote:
> > Is this again a hidden RCU detail that preempt_disable() on CPU4 is
> > enough to ensure that CPU2 does not get marked offline between?
>
> The call_rcu_sched parameter to synchronize_rcu_mult() makes this work.
> This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it
> is a hidden sched/RCU detail, I guess.
>
> Or am I missing the point of your question?
No, this answers it.
> > > Or is getting rid of that preempt_disable region the real reason for
> > > this change?
> >
> > Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
> > But looking further, that preempt_disable() while looking at online CPUs
> > didn't look good.
>
> That is why it is invoked from the very early CPU-hotplug notifier. That
> early in the process, the preempt_disable() does prevent the current CPU
> from being taken offline twice: Once due to synchronize_rcu_mult(), and
> once due to the stop-machine call.
:)
> > The description is not up-to-date. There was this hunk:
> > |@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
> > | for_each_online_cpu(cpu) {
> > | rcutree_prepare_cpu(cpu);
> > | rcu_cpu_starting(cpu);
> > |- if (IS_ENABLED(CONFIG_TREE_SRCU))
> > |- srcu_online_cpu(cpu);
> > | }
> > | }
> >
> > which got removed in v4.16.
>
> Ah! Here is the current rcu_init() code:
>
> for_each_online_cpu(cpu) {
> rcutree_prepare_cpu(cpu);
> rcu_cpu_starting(cpu);
> rcutree_online_cpu(cpu);
> }
>
> And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU
> is enabled, so no need for the direct call from rcu_init().
So if a CPU goes down, the timer gets migrated to another CPU. If the
CPU is already offline the timer can be programmed and nothing happens.
If timer_add_on() would return an error we could have fallback code.
Looking at the users of queue_delayed_work_on() there are only two using
it really (the others are using smp_processor_id()) and one of them is
using get_online_cpus().
It does not look like there a lot of users affected. Would be reasonable
to avoid adding timers to offlined CPUs?
> Thanx, Paul
Sebastian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: srcu: use cpu_online() instead custom check
2018-11-08 17:46 ` Sebastian Andrzej Siewior
@ 2018-11-08 18:05 ` Paul E. McKenney
2018-11-08 18:16 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-08 18:05 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: tglx, linux-rt-users, linux-kernel
On Thu, Nov 08, 2018 at 06:46:55PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-08 09:10:24 [-0800], Paul E. McKenney wrote:
> > > Is this again a hidden RCU detail that preempt_disable() on CPU4 is
> > > enough to ensure that CPU2 does not get marked offline between?
> >
> > The call_rcu_sched parameter to synchronize_rcu_mult() makes this work.
> > This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it
> > is a hidden sched/RCU detail, I guess.
> >
> > Or am I missing the point of your question?
>
> No, this answers it.
>
> > > > Or is getting rid of that preempt_disable region the real reason for
> > > > this change?
> > >
> > > Well, that preempt_disable() + queue_(delayed_)work() does not work -RT.
> > > But looking further, that preempt_disable() while looking at online CPUs
> > > didn't look good.
> >
> > That is why it is invoked from the very early CPU-hotplug notifier. That
> > early in the process, the preempt_disable() does prevent the current CPU
> > from being taken offline twice: Once due to synchronize_rcu_mult(), and
> > once due to the stop-machine call.
>
> :)
>
> > > The description is not up-to-date. There was this hunk:
> > > |@@ -4236,8 +4232,6 @@ void __init rcu_init(void)
> > > | for_each_online_cpu(cpu) {
> > > | rcutree_prepare_cpu(cpu);
> > > | rcu_cpu_starting(cpu);
> > > |- if (IS_ENABLED(CONFIG_TREE_SRCU))
> > > |- srcu_online_cpu(cpu);
> > > | }
> > > | }
> > >
> > > which got removed in v4.16.
> >
> > Ah! Here is the current rcu_init() code:
> >
> > for_each_online_cpu(cpu) {
> > rcutree_prepare_cpu(cpu);
> > rcu_cpu_starting(cpu);
> > rcutree_online_cpu(cpu);
> > }
> >
> > And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU
> > is enabled, so no need for the direct call from rcu_init().
>
> So if a CPU goes down, the timer gets migrated to another CPU. If the
> CPU is already offline the timer can be programmed and nothing happens.
> If timer_add_on() would return an error we could have fallback code.
> Looking at the users of queue_delayed_work_on() there are only two using
> it really (the others are using smp_processor_id()) and one of them is
> using get_online_cpus().
> It does not look like there a lot of users affected. Would be reasonable
> to avoid adding timers to offlined CPUs?
Just to make sure I understand, this is the call to queue_delayed_work_on()
from srcu_queue_delayed_work_on(), right?
And if I am guessing correctly, you would like to get rid of the
constraint requiring CPUHP_RCUTREE_PREP to precede CPUHP_TIMERS_PREPARE?
If so, the swait_event_idle_timeout_exclusive() in rcu_gp_fqs_loop()
in kernel/rcu/tree.c also requires this ordering. There are probably
other pieces of code needing this.
Plus the reason for running this on a specific CPU is that the workqueue
item is processing that CPU's per-CPU variables, including invoking that
CPU's callbacks. The item is srcu_invoke_callbacks().
Thanx, Paul
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: srcu: use cpu_online() instead custom check
2018-11-08 18:05 ` Paul E. McKenney
@ 2018-11-08 18:16 ` Sebastian Andrzej Siewior
2018-11-08 18:48 ` Paul E. McKenney
0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-08 18:16 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: tglx, linux-rt-users, linux-kernel
On 2018-11-08 10:05:17 [-0800], Paul E. McKenney wrote:
> Just to make sure I understand, this is the call to queue_delayed_work_on()
> from srcu_queue_delayed_work_on(), right?
correct.
> And if I am guessing correctly, you would like to get rid of the
> constraint requiring CPUHP_RCUTREE_PREP to precede CPUHP_TIMERS_PREPARE?
no, my problem is the preempt_disable() around queue_delayed_work_on().
If the CPUs goes offline _after_ queue_delayed_work_on() then the timer
gets migrated and work item should show up on another CPU.
If the CPU is offline at queue_delayed_work_on() time then the timer
gets enqueued and won't fire until the CPU is back online and I *think*
that is the reason behind this "is CPU online" check.
> If so, the swait_event_idle_timeout_exclusive() in rcu_gp_fqs_loop()
> in kernel/rcu/tree.c also requires this ordering. There are probably
> other pieces of code needing this.
>
> Plus the reason for running this on a specific CPU is that the workqueue
> item is processing that CPU's per-CPU variables, including invoking that
> CPU's callbacks. The item is srcu_invoke_callbacks().
The SRCU callback is invoking per-CPU variables? Like this_cpu_ptr()?
But if the CPU is offline then you fallback to queue_delayed_work()?
> Thanx, Paul
Sebastian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: srcu: use cpu_online() instead custom check
2018-11-08 18:16 ` Sebastian Andrzej Siewior
@ 2018-11-08 18:48 ` Paul E. McKenney
0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2018-11-08 18:48 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: tglx, linux-rt-users, linux-kernel
On Thu, Nov 08, 2018 at 07:16:30PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-11-08 10:05:17 [-0800], Paul E. McKenney wrote:
> > Just to make sure I understand, this is the call to queue_delayed_work_on()
> > from srcu_queue_delayed_work_on(), right?
>
> correct.
>
> > And if I am guessing correctly, you would like to get rid of the
> > constraint requiring CPUHP_RCUTREE_PREP to precede CPUHP_TIMERS_PREPARE?
>
> no, my problem is the preempt_disable() around queue_delayed_work_on().
> If the CPUs goes offline _after_ queue_delayed_work_on() then the timer
> gets migrated and work item should show up on another CPU.
> If the CPU is offline at queue_delayed_work_on() time then the timer
> gets enqueued and won't fire until the CPU is back online and I *think*
> that is the reason behind this "is CPU online" check.
The main reason for the "is CPU online" check was that workqueues would
very rarely splat when I tried running without it. I did report this
to Tejun. You could try just calling queue_delayed_work_on() without
the check, but this is a 10s of hours rcutorture splat if I remember
correctly.
> > If so, the swait_event_idle_timeout_exclusive() in rcu_gp_fqs_loop()
> > in kernel/rcu/tree.c also requires this ordering. There are probably
> > other pieces of code needing this.
> >
> > Plus the reason for running this on a specific CPU is that the workqueue
> > item is processing that CPU's per-CPU variables, including invoking that
> > CPU's callbacks. The item is srcu_invoke_callbacks().
>
> The SRCU callback is invoking per-CPU variables? Like this_cpu_ptr()?
> But if the CPU is offline then you fallback to queue_delayed_work()?
Yes, yes, and yes. ;-)
The callbacks are queued on a per-CPU basis.
Thanx, Paul
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-11-08 18:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-01 23:12 srcu: use cpu_online() instead custom check Paul E. McKenney
2018-11-08 16:38 ` Sebastian Andrzej Siewior
2018-11-08 17:10 ` Paul E. McKenney
2018-11-08 17:46 ` Sebastian Andrzej Siewior
2018-11-08 18:05 ` Paul E. McKenney
2018-11-08 18:16 ` Sebastian Andrzej Siewior
2018-11-08 18:48 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).