* [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
@ 2010-03-15 9:10 Oleg Nesterov
2010-03-24 15:41 ` Peter Zijlstra
2010-04-02 19:12 ` [tip:sched/core] sched: move_task_off_dead_cpu(): Take " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 7+ messages in thread
From: Oleg Nesterov @ 2010-03-15 9:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie,
Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/sched.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
--- 34-rc1/kernel/sched.c~2_MTODC_TAKE_RQ_LOCK 2010-03-15 09:40:16.000000000 +0100
+++ 34-rc1/kernel/sched.c 2010-03-15 09:40:31.000000000 +0100
@@ -5509,29 +5509,29 @@ static int migration_thread(void *data)
}
#ifdef CONFIG_HOTPLUG_CPU
-
-static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
-{
- int ret;
-
- local_irq_disable();
- ret = __migrate_task(p, src_cpu, dest_cpu);
- local_irq_enable();
- return ret;
-}
-
/*
* Figure out where task on dead CPU should go, use force if necessary.
*/
static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
{
- int dest_cpu;
-
+ struct rq *rq = cpu_rq(dead_cpu);
+ int needs_cpu, dest_cpu;
+ unsigned long flags;
again:
- dest_cpu = select_fallback_rq(dead_cpu, p);
+ local_irq_save(flags);
+
+ raw_spin_lock(&rq->lock);
+ needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
+ if (needs_cpu)
+ dest_cpu = select_fallback_rq(dead_cpu, p);
+ raw_spin_unlock(&rq->lock);
/* It can have affinity changed while we were choosing. */
- if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
+ if (needs_cpu)
+ needs_cpu = !__migrate_task(p, dead_cpu, dest_cpu);
+ local_irq_restore(flags);
+
+ if (unlikely(needs_cpu))
goto again;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-15 9:10 [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq() Oleg Nesterov
@ 2010-03-24 15:41 ` Peter Zijlstra
2010-03-24 16:07 ` Oleg Nesterov
2010-04-02 19:12 ` [tip:sched/core] sched: move_task_off_dead_cpu(): Take " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-03-24 15:41 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo,
linux-kernel
On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> {
> + struct rq *rq = cpu_rq(dead_cpu);
> + int needs_cpu, dest_cpu;
> + unsigned long flags;
> again:
> + local_irq_save(flags);
> +
> + raw_spin_lock(&rq->lock);
> + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
^
kernel/sched.c:5445: warning: ‘dest_cpu’ may be used uninitialized in this function
> + if (needs_cpu)
> + dest_cpu = select_fallback_rq(dead_cpu, p);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 15:41 ` Peter Zijlstra
@ 2010-03-24 16:07 ` Oleg Nesterov
2010-03-24 16:17 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2010-03-24 16:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo,
linux-kernel
On 03/24, Peter Zijlstra wrote:
>
> On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > {
> > + struct rq *rq = cpu_rq(dead_cpu);
> > + int needs_cpu, dest_cpu;
> > + unsigned long flags;
> > again:
> > + local_irq_save(flags);
> > +
> > + raw_spin_lock(&rq->lock);
> > + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
>
> ^
> kernel/sched.c:5445: warning: ‘dest_cpu’ may be used uninitialized in this function
Hmm. looks like my gcc is more friendly...
OK. certainly I'll send the updated patch, if this series passes
your review otherwise.
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 16:07 ` Oleg Nesterov
@ 2010-03-24 16:17 ` Peter Zijlstra
2010-03-24 16:33 ` Oleg Nesterov
0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-03-24 16:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo,
linux-kernel
On Wed, 2010-03-24 at 17:07 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> > > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > > {
> > > + struct rq *rq = cpu_rq(dead_cpu);
> > > + int needs_cpu, dest_cpu;
> > > + unsigned long flags;
> > > again:
> > > + local_irq_save(flags);
> > > +
> > > + raw_spin_lock(&rq->lock);
> > > + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
> >
> > ^
> > kernel/sched.c:5445: warning: ‘dest_cpu’ may be used uninitialized in this function
>
> Hmm. looks like my gcc is more friendly...
Hrm, that and I'm apparently unable to read, it said dest_cpu, not
dead_cpu.. a well, I'll slam an __maybe_unused in.
> OK. certainly I'll send the updated patch, if this series passes
> your review otherwise.
Yeah, you made a few good points in 0/6, am now staring at the code on
how to close those holes, hope to post something sensible soon.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 16:17 ` Peter Zijlstra
@ 2010-03-24 16:33 ` Oleg Nesterov
2010-03-26 9:06 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2010-03-24 16:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo,
linux-kernel
On 03/24, Peter Zijlstra wrote:
>
> Yeah, you made a few good points in 0/6, am now staring at the code on
> how to close those holes, hope to post something sensible soon.
Yes, great.
Speaking of 0/6, I forgot to ask a couple more question...
try_to_wake_up() does task_rq_lock() which checks TASK_WAKING. Perhaps
it shouldn't ? I mean, perhaps try_to_wake_up() can take rq->lock without
checking task->state. It can never race with the owner of TASK_WAKING,
before anything else we check "p->state & state".
And. Without the change above, any owner of TASK_WAKING must disable
preemption and clear irqs.
What do you think?
And a stupid question. While doing these changes I was really, really
puzzled by task_rq_lock() which does
local_irq_save(*flags);
rq = task_rq(p);
raw_spin_lock(&rq->lock);
to the point, I even tried to read the comment which says:
Note the ordering: we can safely lookup the task_rq without
explicitly disabling preemption.
Could you please explain what does this mean? IOW, why can't we do
rq = task_rq(p);
raw_spin_lock_irqsave(&rq->lock, flags);
instead?
Of course, this doesn't really matter, but I'd like to understand
what I have missed here.
Thanks,
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 16:33 ` Oleg Nesterov
@ 2010-03-26 9:06 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2010-03-26 9:06 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo,
linux-kernel
On Wed, 2010-03-24 at 17:33 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > Yeah, you made a few good points in 0/6, am now staring at the code on
> > how to close those holes, hope to post something sensible soon.
>
> Yes, great.
>
> Speaking of 0/6, I forgot to ask a couple more question...
>
> try_to_wake_up() does task_rq_lock() which checks TASK_WAKING. Perhaps
> it shouldn't ? I mean, perhaps try_to_wake_up() can take rq->lock without
> checking task->state. It can never race with the owner of TASK_WAKING,
> before anything else we check "p->state & state".
You're right, but creating a special task_rq_lock() for ttwu() went a
little far, but now that we can remove all that again, this too should
be good again.
> And a stupid question. While doing these changes I was really, really
> puzzled by task_rq_lock() which does
>
> local_irq_save(*flags);
> rq = task_rq(p);
> raw_spin_lock(&rq->lock);
>
> to the point, I even tried to read the comment which says:
>
> Note the ordering: we can safely lookup the task_rq without
> explicitly disabling preemption.
>
> Could you please explain what does this mean? IOW, why can't we do
>
> rq = task_rq(p);
> raw_spin_lock_irqsave(&rq->lock, flags);
>
> instead?
I'm not sure why that is the case, v2.6.14:kernel/sched.c already has
that. Ingo can you remember any reason for this or should we change the
code like Oleg suggests?
^ permalink raw reply [flat|nested] 7+ messages in thread
* [tip:sched/core] sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()
2010-03-15 9:10 [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq() Oleg Nesterov
2010-03-24 15:41 ` Peter Zijlstra
@ 2010-04-02 19:12 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Oleg Nesterov @ 2010-04-02 19:12 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, oleg, tglx, mingo
Commit-ID: 1445c08d06c5594895b4fae952ef8a457e89c390
Gitweb: http://git.kernel.org/tip/1445c08d06c5594895b4fae952ef8a457e89c390
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 15 Mar 2010 10:10:10 +0100
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 2 Apr 2010 20:12:01 +0200
sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 30 +++++++++++++++---------------
1 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index c0b3ebc..27774b5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5448,29 +5448,29 @@ static int migration_thread(void *data)
}
#ifdef CONFIG_HOTPLUG_CPU
-
-static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
-{
- int ret;
-
- local_irq_disable();
- ret = __migrate_task(p, src_cpu, dest_cpu);
- local_irq_enable();
- return ret;
-}
-
/*
* Figure out where task on dead CPU should go, use force if necessary.
*/
static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
{
- int dest_cpu;
-
+ struct rq *rq = cpu_rq(dead_cpu);
+ int needs_cpu, uninitialized_var(dest_cpu);
+ unsigned long flags;
again:
- dest_cpu = select_fallback_rq(dead_cpu, p);
+ local_irq_save(flags);
+
+ raw_spin_lock(&rq->lock);
+ needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
+ if (needs_cpu)
+ dest_cpu = select_fallback_rq(dead_cpu, p);
+ raw_spin_unlock(&rq->lock);
/* It can have affinity changed while we were choosing. */
- if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
+ if (needs_cpu)
+ needs_cpu = !__migrate_task(p, dead_cpu, dest_cpu);
+ local_irq_restore(flags);
+
+ if (unlikely(needs_cpu))
goto again;
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-04-02 19:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-15 9:10 [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq() Oleg Nesterov
2010-03-24 15:41 ` Peter Zijlstra
2010-03-24 16:07 ` Oleg Nesterov
2010-03-24 16:17 ` Peter Zijlstra
2010-03-24 16:33 ` Oleg Nesterov
2010-03-26 9:06 ` Peter Zijlstra
2010-04-02 19:12 ` [tip:sched/core] sched: move_task_off_dead_cpu(): Take " tip-bot for Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).