* [PATCH 0/3] (Was: sched: start stopper early)
@ 2015-10-10 18:52 Oleg Nesterov
  2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-10 18:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

To avoid confusion, this has nothing to do with the "stop_machine"
changes we are discussing in another thread. But:

On 10/09, Oleg Nesterov wrote:
>
> >  	case CPU_ONLINE:
> > +		stop_machine_unpark(cpu);
> >  		/*
> >  		 * At this point a starting CPU has marked itself as online via
> >  		 * set_cpu_online(). But it might not yet have marked itself
> > @@ -5337,7 +5340,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
> >  		 * Thus, fall-through and help the starting CPU along.
> >  		 */
> >  	case CPU_DOWN_FAILED:
> > -		set_cpu_active((long)hcpu, true);
> > +		set_cpu_active(cpu, true);
>
> On a second thought, we can't do this (and your initial change has
> the same problem).
>
> We can not wakeup it before set_cpu_active(). This can lead to the
> same problem fixed by dd9d3843755da95f6 "sched: Fix cpu_active_mask/
> cpu_online_mask race".

OTOH, I don't understand why we actually need this fix... Or, in other
words, I don't really understand the cpu_active() checks in
select_fallback_rq().

Looks like we have some strange arch/ which has the "unsafe" online &&
!active window, but then it is not clear why it is safe to mark it active
in sched_cpu_active(CPU_ONLINE). Confused.

And I am even more confused by the fact that select_fallback_rq()
checks cpu_active(), but select_task_rq() doesn't. This can't be right
in any case.

Oleg.


 kernel/sched/core.c |   41 +++++++++++++++++++++--------------------
 1 files changed, 21 insertions(+), 20 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-10 18:52 [PATCH 0/3] (Was: sched: start stopper early) Oleg Nesterov
@ 2015-10-10 18:53 ` Oleg Nesterov
  2015-10-11 18:04   ` Oleg Nesterov
  2015-10-12 12:16   ` Peter Zijlstra
  2015-10-10 18:53 ` [PATCH 2/3] sched: change select_fallback_rq() to use for_each_cpu_and() Oleg Nesterov
  2015-10-10 18:53 ` [PATCH 3/3] sched: don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS Oleg Nesterov
  2 siblings, 2 replies; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-10 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

I do not understand the cpu_active() check in select_fallback_rq().
x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
cpu_active_mask/cpu_online_mask race" documents the fact that on any
architecture we can ignore !active starting from CPU_ONLINE stage.

But any possible reason why we need this check in "fallback" must
equally apply to select_task_rq().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sched/core.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5fe9086..a2ef0cf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1297,6 +1297,11 @@ void kick_process(struct task_struct *p)
 }
 EXPORT_SYMBOL_GPL(kick_process);
 
+static inline bool cpu_allowed(int cpu)
+{
+	return cpu_online(cpu) && cpu_active(cpu);
+}
+
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
  */
@@ -1317,9 +1322,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 
 		/* Look for allowed, online CPU in same node. */
 		for_each_cpu(dest_cpu, nodemask) {
-			if (!cpu_online(dest_cpu))
-				continue;
-			if (!cpu_active(dest_cpu))
+			if (!cpu_allowed(dest_cpu))
 				continue;
 			if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 				return dest_cpu;
@@ -1329,9 +1332,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!cpu_online(dest_cpu))
-				continue;
-			if (!cpu_active(dest_cpu))
+			if (!cpu_allowed(dest_cpu))
 				continue;
 			goto out;
 		}
@@ -1390,7 +1391,7 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	 *   not worry about this generic constraint ]
 	 */
 	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
-		     !cpu_online(cpu)))
+		     !cpu_allowed(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;
-- 
1.5.5.1



* [PATCH 2/3] sched: change select_fallback_rq() to use for_each_cpu_and()
  2015-10-10 18:52 [PATCH 0/3] (Was: sched: start stopper early) Oleg Nesterov
  2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
@ 2015-10-10 18:53 ` Oleg Nesterov
  2015-10-10 18:53 ` [PATCH 3/3] sched: don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS Oleg Nesterov
  2 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-10 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

We can make "cpumask *nodemask" more local and use for_each_cpu_and()
to simplify this code a little bit.

And NUMA_NO_NODE looks better than "-1".

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sched/core.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a2ef0cf..e4fa6be 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1308,24 +1308,22 @@ static inline bool cpu_allowed(int cpu)
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
 	int nid = cpu_to_node(cpu);
-	const struct cpumask *nodemask = NULL;
 	enum { cpuset, possible, fail } state = cpuset;
 	int dest_cpu;
 
 	/*
 	 * If the node that the cpu is on has been offlined, cpu_to_node()
-	 * will return -1. There is no cpu on the node, and we should
-	 * select the cpu on the other node.
+	 * will return NUMA_NO_NODE. There is no cpu on the node, and we
+	 * should select the cpu on the other node.
 	 */
-	if (nid != -1) {
-		nodemask = cpumask_of_node(nid);
+	if (nid != NUMA_NO_NODE) {
+		const struct cpumask *nodemask = cpumask_of_node(nid);
 
 		/* Look for allowed, online CPU in same node. */
-		for_each_cpu(dest_cpu, nodemask) {
+		for_each_cpu_and(dest_cpu, nodemask, tsk_cpus_allowed(p)) {
 			if (!cpu_allowed(dest_cpu))
 				continue;
-			if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
-				return dest_cpu;
+			return dest_cpu;
 		}
 	}
 
-- 
1.5.5.1



* [PATCH 3/3] sched: don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
  2015-10-10 18:52 [PATCH 0/3] (Was: sched: start stopper early) Oleg Nesterov
  2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
  2015-10-10 18:53 ` [PATCH 2/3] sched: change select_fallback_rq() to use for_each_cpu_and() Oleg Nesterov
@ 2015-10-10 18:53 ` Oleg Nesterov
  2015-10-20  9:35   ` [tip:sched/core] sched: Don't scan all-offline -> cpus_allowed " tip-bot for Oleg Nesterov
  2 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-10 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

If CONFIG_CPUSETS=n then "case cpuset" changes the state and runs
the already failed for_each_cpu() loop again for no reason.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sched/core.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e4fa6be..b421e24 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1335,13 +1335,15 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			goto out;
 		}
 
+		/* No more Mr. Nice Guy. */
 		switch (state) {
 		case cpuset:
-			/* No more Mr. Nice Guy. */
-			cpuset_cpus_allowed_fallback(p);
-			state = possible;
-			break;
-
+			if (IS_ENABLED(CONFIG_CPUSETS)) {
+				cpuset_cpus_allowed_fallback(p);
+				state = possible;
+				break;
+			}
+			/* fall-through */
 		case possible:
 			do_set_cpus_allowed(p, cpu_possible_mask);
 			state = fail;
-- 
1.5.5.1



* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
@ 2015-10-11 18:04   ` Oleg Nesterov
  2015-10-11 18:57     ` Thomas Gleixner
  2015-10-12 12:16   ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-11 18:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

On 10/10, Oleg Nesterov wrote:
>
> I do not understand the cpu_active() check in select_fallback_rq().
> x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
> cpu_active_mask/cpu_online_mask race" documents the fact that on any
> architecture we can ignore !active starting from CPU_ONLINE stage.
>
> But any possible reason why we need this check in "fallback" must
> equally apply to select_task_rq().

And I still think this is true, select_task_rq() and select_fallback_rq()
should use the same check in any case...

> +static inline bool cpu_allowed(int cpu)
> +{
> +	return cpu_online(cpu) && cpu_active(cpu);
> +}
...
> @@ -1390,7 +1391,7 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
>  	 *   not worry about this generic constraint ]
>  	 */
>  	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
> -		     !cpu_online(cpu)))
> +		     !cpu_allowed(cpu)))
>  		cpu = select_fallback_rq(task_cpu(p), p);

But as Fengguang reports (thanks a lot!) this change is wrong. It leads
to another BUG_ON(td->cpu != smp_processor_id()) before ht->park(td->cpu)
in smpboot_thread_fn().

I should have realized this. smpboot_park_threads() is called after
CPU_DOWN_PREPARE. And this can break other PF_NO_SETAFFINITY threads.

Perhaps I am totally confused, but to me this looks like another
indication that select_fallback_rq() should not check cpu_active(),
or at least this needs some changes...

Oleg.



* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-11 18:04   ` Oleg Nesterov
@ 2015-10-11 18:57     ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2015-10-11 18:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, heiko.carstens, Tejun Heo, Ingo Molnar,
	Rik van Riel, Vitaly Kuznetsov, linux-kernel

On Sun, 11 Oct 2015, Oleg Nesterov wrote:
> On 10/10, Oleg Nesterov wrote:
> >
> > I do not understand the cpu_active() check in select_fallback_rq().
> > x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
> > cpu_active_mask/cpu_online_mask race" documents the fact that on any
> > architecture we can ignore !active starting from CPU_ONLINE stage.
> >
> > But any possible reason why we need this check in "fallback" must
> > equally apply to select_task_rq().
> 
> And I still think this is true, select_task_rq() and select_fallback_rq()
> should use the same check in any case...
> 
> > +static inline bool cpu_allowed(int cpu)
> > +{
> > +	return cpu_online(cpu) && cpu_active(cpu);
> > +}
> ...
> > @@ -1390,7 +1391,7 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
> >  	 *   not worry about this generic constraint ]
> >  	 */
> >  	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
> > -		     !cpu_online(cpu)))
> > +		     !cpu_allowed(cpu)))
> >  		cpu = select_fallback_rq(task_cpu(p), p);
> 
> But as Fengguang reports (thanks a lot!) this change is wrong. It leads
> to another BUG_ON(td->cpu != smp_processor_id()) before ht->park(td->cpu)
> in smpboot_thread_fn().
> 
> I should have realized this. smpboot_park_threads() is called after
> CPU_DOWN_PREPARE. And this can break other PF_NO_SETAFFINITY threads.
> 
> Perhaps I am totally confused, but to me this looks like another
> indication that select_fallback_rq() should not check cpu_active(),
> or at least this needs some changes...

Well, the real issue is that the bringup and teardown of cpus is
completely asymmetric. And as long as we do not fix that, we'll run
into that kind of trouble over and over. We just add more duct tape to
the cpu hotplug code.

The solution to the problem at hand is to have two separate decisions
for threads to be scheduled on an upcoming or downgoing cpu.

We need to allow per cpu kernel threads to be scheduled after the
initial bringup is done, and on teardown we must allow them to be
scheduled up to the point where the cpu is actually no longer capable
of running them.

For everything else the decision must be made at the end of the
bringup. From that point on random threads can be scheduled on the
cpu. On teardown we need to prevent that as the first thing.

We are currently resurrecting the hotplug revamp patch series which I
sent out a couple of years ago. The goal of this is to make hotplug
completely symmetric, and more, to address exactly this kind of
trouble.

Thanks,

	tglx


* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
  2015-10-11 18:04   ` Oleg Nesterov
@ 2015-10-12 12:16   ` Peter Zijlstra
  2015-10-12 17:34     ` Oleg Nesterov
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-10-12 12:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

On Sat, Oct 10, 2015 at 08:53:09PM +0200, Oleg Nesterov wrote:
> I do not understand the cpu_active() check in select_fallback_rq().
> x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
> cpu_active_mask/cpu_online_mask race" documents the fact that on any
> architecture we can ignore !active starting from CPU_ONLINE stage.
> 
> But any possible reason why we need this check in "fallback" must
> equally apply to select_task_rq().

So the reason, from vague memory, is that we want to allow per-cpu
threads to start/stop before/after active.

active 'should' really only govern load-balancer bits or so.


* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-12 12:16   ` Peter Zijlstra
@ 2015-10-12 17:34     ` Oleg Nesterov
  2015-10-14 15:00       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-12 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

On 10/12, Peter Zijlstra wrote:
>
> On Sat, Oct 10, 2015 at 08:53:09PM +0200, Oleg Nesterov wrote:
> > I do not understand the cpu_active() check in select_fallback_rq().
> > x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
> > cpu_active_mask/cpu_online_mask race" documents the fact that on any
> > architecture we can ignore !active starting from CPU_ONLINE stage.
> >
> > But any possible reason why we need this check in "fallback" must
> > equally apply to select_task_rq().
>
> So the reason, from vague memory, is that we want to allow per-cpu
> threads to start/stop before/after active.

I simply can't understand... To me it looks as if we can simply remove
the cpu_active() check in select_fallback_rq().

If we race with cpu_down(), cpu_active() is cleared by sched_cpu_inactive(),
which runs at CPU_PRI_SCHED_INACTIVE = INT_MIN + 1 priority, so it seems
that only cpuset_cpu_inactive() can be called after that, and this check is
obviously racy anyway.

As for cpu_up(), I do not see any arch which does set_cpu_active(true);
they all do set_cpu_online(true), which also marks it active.

So why can't we simply remove the select_fallback_rq()->cpu_active() check?

> active 'should' really only govern load-balancer bits or so.

OK, I don't understand the usage of cpu_active_mask in kernel/sched/,
and of course I could easily miss something else. But I doubt very
much this check can save us from something bad simply because it is
racy.

Yes, we also have synchronize_sched() after CPU_DOWN_PREPARE, but
the only thing we do before stop_machine() is smpboot_park_threads().
So this can help (say) set_cpus_allowed_ptr() which uses active_mask,
but I don't see how this can connect to ttwu paths.

And again: if there is some reason we cannot do this, say, because an
ipi to a non-active CPU can trigger a warning, or something else, then
we can hit the same problem because select_task_rq() does not check
cpu_active(). Kernel threads like stoppers/workers are probably
fine in any case, but userspace can trigger the race with cpu_up/down.


In short, I am all confused ;)

Oleg.



* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-12 17:34     ` Oleg Nesterov
@ 2015-10-14 15:00       ` Peter Zijlstra
  2015-10-14 20:05         ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-10-14 15:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

On Mon, Oct 12, 2015 at 07:34:02PM +0200, Oleg Nesterov wrote:
> On 10/12, Peter Zijlstra wrote:
> >
> > On Sat, Oct 10, 2015 at 08:53:09PM +0200, Oleg Nesterov wrote:
> > > I do not understand the cpu_active() check in select_fallback_rq().
> > > x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
> > > cpu_active_mask/cpu_online_mask race" documents the fact that on any
> > > architecture we can ignore !active starting from CPU_ONLINE stage.
> > >
> > > But any possible reason why we need this check in "fallback" must
> > > equally apply to select_task_rq().
> >
> > So the reason, from vague memory, is that we want to allow per-cpu
> > threads to start/stop before/after active.
> 
> I simply can't understand... To me it looks as if we can simply remove
> the cpu_active() check in select_fallback_rq().
> 
> If we race with cpu_down(), cpu_active() is cleared by sched_cpu_inactive(),
> which runs at CPU_PRI_SCHED_INACTIVE = INT_MIN + 1 priority, so it seems
> that only cpuset_cpu_inactive() can be called after that, and this check is
> obviously racy anyway.

Racy yes; however select_task_rq() is called with locks held, therefore
preemption disabled. This serializes us against the sync_sched() in
cpu_down().

And note that smpboot_park_threads() does not run until after the
sync_sched(). So you cannot add cpu_active() to select_task_rq(), for it
must allow per-cpu tasks to remain on the cpu.

Also, I think smpboot_park_threads() is called too early: we can still
run random tasks at that point which might rely on having the per-cpu
tasks around. But we cannot run it later, because once the stopper thread
runs, the per-cpu threads cannot schedule to park themselves :/

Lets keep an eye on Thomas' rework to see if this gets sorted.

> So why can't we simply remove the select_fallback_rq()->cpu_active() check?

It would allow migration onto a CPU that's known to be going away. That
doesn't make sense.

> > active 'should' really only govern load-balancer bits or so.
> 
> OK, I don't understand the usage of cpu_active_mask in kernel/sched/,
> and of course I could easily miss something else. But I doubt very
> much this check can save us from something bad simply because it is
> racy.

But safely so; if we observe active, it must stay 'active' until we
enable preemption again.

The whole point of active is to limit the load-balancer, such that we
can rebuild the sched-domains after the fact. We used to have to rebuild
the sched-domains early, which got us into trouble (I forget the details,
again).

And we had to have a new mask because online was too late: there was
(again, I forget the details) a state where we were still online but new
tasks were not welcome.

Also, it makes sense to stop accepting new tasks before you take out the
old ones.

> Yes, we also have synchronize_sched() after CPU_DOWN_PREPARE, but
> the only thing we do before stop_machine() is smpboot_park_threads().
> So this can help (say) set_cpus_allowed_ptr() which uses active_mask,
> but I don't see how this can connect to ttwu paths.

Say you have a task A running on CPU4; it goes to sleep. We unplug CPU4.
We then commence unplugging CPU3; concurrently, we wake A. A finds that
its old CPU (4) is no longer online and ends up in fallback looking for
a new one.

Without the cpu_active() test in fallback, we could place A on CPU3,
which is on its way down, just not quite dead.

Even if this is not strictly buggy, it's a daft thing to do and tempting
fate.

> And again: if there is some reason we cannot do this, say, because an
> ipi to a non-active CPU can trigger a warning, or something else, then
> we can hit the same problem because select_task_rq() does not check
> cpu_active(). Kernel threads like stoppers/workers are probably
> fine in any case, but userspace can trigger the race with cpu_up/down.

So the only 'race' is observing active while we're stuck in
sync_sched(), which is meant to be safe. It also provides us with a
known spot after which we're sure no new tasks will come onto our dying
CPU.


* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-14 15:00       ` Peter Zijlstra
@ 2015-10-14 20:05         ` Oleg Nesterov
  2015-10-14 20:35           ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2015-10-14 20:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

On 10/14, Peter Zijlstra wrote:
>
> On Mon, Oct 12, 2015 at 07:34:02PM +0200, Oleg Nesterov wrote:
> >
> > I simply can't understand... To me it looks as if we can simply remove
> > the cpu_active() check in select_fallback_rq().
> >
> > If we race with cpu_down(), cpu_active() is cleared by sched_cpu_inactive(),
> > which runs at CPU_PRI_SCHED_INACTIVE = INT_MIN + 1 priority, so it seems
> > that only cpuset_cpu_inactive() can be called after that, and this check is
> > obviously racy anyway.
>
> Racy yes; however select_task_rq() is called with locks held, therefore
> preemption disabled. This serializes us against the sync_sched() in
> cpu_down().

Yes, I even mentioned this below. And this only means that if you see
cpu_active() == T, you know that smpboot_park_threads() can't be called
until we enable preemption again.

cpu_active() can become false right after the check; preempt_disable()
can't protect against __cpu_notify(CPU_DOWN_PREPARE).

> And note that smpboot_park_threads() does not run until after the
> sync_sched(). So you cannot add cpu_active() to select_task_rq(), for it
> must allow per-cpu tasks to remain on the cpu.

Yes, yes, this is why this patch triggers BUG_ON() before ->park() in
smpboot_thread_fn().

> > So why can't we simply remove the select_fallback_rq()->cpu_active() check?
>
> It would allow migration onto a CPU that's known to be going away. That
> doesn't make sense.

But this is not that bad? This is very unlikely and CPU_DYING will
migrate the woken task again.

> > OK, I don't understand the usage of cpu_active_mask in kernel/sched/,
> > and of course I could easily miss something else. But I doubt very
> > much this check can save us from something bad simply because it is
> > racy.
>
> But safely so; if we observe active, it must stay 'active' until we
> enable preemption again.

I don't understand what "stay active" actually means here. cpu_active()
is not stable. But probably this doesn't matter.

> > Yes, we also have synchronize_sched() after CPU_DOWN_PREPARE, but
> > the only thing we do before stop_machine() is smpboot_park_threads().
> > So this can help (say) set_cpus_allowed_ptr() which uses active_mask,
> > but I don't see how this can connect to ttwu paths.
>
> Say you have a task A running on CPU4; it goes to sleep. We unplug CPU4.
> We then commence unplugging CPU3; concurrently, we wake A. A finds that
> its old CPU (4) is no longer online and ends up in fallback looking for
> a new one.
>
> Without the cpu_active() test in fallback, we could place A on CPU3,
> which is on its way down, just not quite dead.

The same can happen with the cpu_active() check we currently have.
And note again that CPU_PRI_SCHED_INACTIVE == INT_MIN + 1: when
sched_cpu_inactive() clears cpu_active(), all other callbacks (except
cpuset_cpu_inactive()) have already fired.

> Even if this is not strictly buggy, it's a daft thing to do and tempting
> fate.

See above...

And if we remove this check from select_fallback_rq() we can revert
dd9d3843755da95f6 "sched: Fix cpu_active_mask/cpu_online_mask race".

And revert 875ebe94 "powerpc/smp: Wait until secondaries are active & online".
And a1307bba "s390/smp: wait until secondaries are active & online".

> > And again: if there is some reason we cannot do this, say, because an
> > ipi to a non-active CPU can trigger a warning, or something else, then
> > we can hit the same problem because select_task_rq() does not check
> > cpu_active(). Kernel threads like stoppers/workers are probably
> > fine in any case, but userspace can trigger the race with cpu_up/down.
>
> So the only 'race' is observing active while we're stuck in
> sync_sched(),

Why? select_task_rq() will see cpu_online() == T after sync_sched().

> which is meant to be safe.

Yes, because that damn cpu_active() check doesn't look strictly necessary ;)
Or I misunderstood.

> It also provides us with a
> known spot after which we're sure no new tasks will come onto our dying
> CPU.

You mean that ttwu(task) can't migrate this task to the dying CPU
after synchronize_sched() and before stop_machine(); this is true.

OK, lets keep this check if you dislike the idea to remove it. But to me
the very fact that select_task_rq() and select_fallback_rq() use different
checks looks just wrong. Even if everything happens to work.

Oleg.



* Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
  2015-10-14 20:05         ` Oleg Nesterov
@ 2015-10-14 20:35           ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-10-14 20:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Vitaly Kuznetsov, linux-kernel

On Wed, Oct 14, 2015 at 10:05:16PM +0200, Oleg Nesterov wrote:
> Yes, because that damn cpu_active() check doesn't look strictly necessary ;)
> Or I misunderstood.

How about we sit down and have a hard look after Thomas is done
revamping hotplug? I don't want to go pore over hotplug code that is
guaranteed to change.




* [tip:sched/core] sched: Don't scan all-offline -> cpus_allowed twice if !CONFIG_CPUSETS
  2015-10-10 18:53 ` [PATCH 3/3] sched: don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS Oleg Nesterov
@ 2015-10-20  9:35   ` tip-bot for Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-10-20  9:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: efault, tj, riel, linux-kernel, hpa, paulmck, peterz, tglx,
	vkuznets, torvalds, akpm, oleg, mingo

Commit-ID:  e73e85f0593832aa583b252f9a16cf90ed6d30fa
Gitweb:     http://git.kernel.org/tip/e73e85f0593832aa583b252f9a16cf90ed6d30fa
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Sat, 10 Oct 2015 20:53:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:25:57 +0200

sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS

If CONFIG_CPUSETS=n then "case cpuset" changes the state and runs
the already failed for_each_cpu() loop again for no reason.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151010185315.GA24100@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a7b368e..b4d263d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1580,13 +1580,15 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			goto out;
 		}
 
+		/* No more Mr. Nice Guy. */
 		switch (state) {
 		case cpuset:
-			/* No more Mr. Nice Guy. */
-			cpuset_cpus_allowed_fallback(p);
-			state = possible;
-			break;
-
+			if (IS_ENABLED(CONFIG_CPUSETS)) {
+				cpuset_cpus_allowed_fallback(p);
+				state = possible;
+				break;
+			}
+			/* fall-through */
 		case possible:
 			do_set_cpus_allowed(p, cpu_possible_mask);
 			state = fail;

