From: Nick Piggin <piggin@cyberone.com.au>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Anton Blanchard <anton@samba.org>, Ingo Molnar <mingo@redhat.com>,
"Martin J. Bligh" <mbligh@aracnet.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
Mark Wong <markw@osdl.org>, John Hawkes <hawkes@sgi.com>
Subject: Re: [CFT][RFC] HT scheduler
Date: Fri, 19 Dec 2003 15:57:13 +1100
Message-ID: <3FE28529.1010003@cyberone.com.au>
In-Reply-To: <3FDE3EF7.7000001@cyberone.com.au>
[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]
Nick Piggin wrote:
>
>
> Rusty Russell wrote:
>
>>
>> A few things need work:
>>
>> 1) There's a race between sys_sched_setaffinity() and
>> sched_migrate_task() (this is nothing to do with your patch).
>>
>
> Yep. They should both take the task's runqueue lock.
Easier said than done... anyway, how does this patch look?
It should also cure a possible, and not entirely unlikely, use
after free of the task_struct in sched_migrate_task on NUMA,
AFAICS.
The patch is on top of a few other changes, so it might not apply;
it's just for review. I'll release a new rollup with this included soon.
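To make the race concrete, here is a rough userspace sketch of the save / recheck / restore idea. It is not the kernel code: struct task, set_allowed(), migrate_to(), the race hook and racing_setter() are all made-up stand-ins, a pthread mutex plays the role of the runqueue lock, and a plain unsigned long plays the role of cpumask_t.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
typedef unsigned long mask_t;

struct task {
	pthread_mutex_t lock;	/* plays the role of the runqueue lock */
	mask_t allowed;		/* plays the role of p->cpus_allowed */
};

/* Hook run while the lock is dropped, to simulate a racing setter. */
typedef void (*race_hook_t)(struct task *t);

static void set_allowed(struct task *t, mask_t mask)
{
	pthread_mutex_lock(&t->lock);
	t->allowed = mask;
	pthread_mutex_unlock(&t->lock);
}

/* Returns 1 if the old mask was restored, 0 if we bailed out. */
static int migrate_to(struct task *t, int cpu, race_hook_t hook)
{
	mask_t old_mask, new_mask;

	assert(cpu >= 0 && cpu < (int)(8 * sizeof(mask_t)));
	new_mask = 1UL << cpu;

	pthread_mutex_lock(&t->lock);
	old_mask = t->allowed;		/* read the mask under the lock */
	if (!(old_mask & new_mask)) {
		pthread_mutex_unlock(&t->lock);
		return 0;
	}
	t->allowed = new_mask;		/* force the task onto one CPU */
	pthread_mutex_unlock(&t->lock);

	if (hook)
		hook(t);		/* window where another setter can run */

	pthread_mutex_lock(&t->lock);
	if (t->allowed != new_mask) {
		/* Someone else changed the mask; old_mask is stale. */
		pthread_mutex_unlock(&t->lock);
		return 0;
	}
	t->allowed = old_mask;		/* nobody raced, safe to restore */
	pthread_mutex_unlock(&t->lock);
	return 1;
}

/* Example racing setter: pins the task to CPU 2 only. */
static void racing_setter(struct task *t)
{
	set_allowed(t, 1UL << 2);
}
```

Calling migrate_to(&t, 1, NULL) on a task allowed on CPUs 0-1 restores the original mask, while passing racing_setter as the hook leaves the racer's mask in place. Note that the sketch cannot tell apart a racer that happens to set exactly the same single-CPU mask; in that case the old mask would still be restored.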
>
>
>>
>> 2) Please change those #defines into an enum for idle (patch follows,
>> untested but trivial)
>>
>
> Thanks, I'll take the patch.
done
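For anyone following along, the #define to enum change looks roughly like the sketch below. The names, values and the cpu_is_idle() helper are my own illustration and are assumptions; the identifiers in the actual patch may well differ.

```c
#include <assert.h>

/* Hypothetical names and values; the real patch may differ. */
enum idle_type {
	NOT_IDLE,	/* CPU is running a task */
	IDLE,		/* CPU is idle */
	NEWLY_IDLE,	/* CPU is just about to go idle */
};

/*
 * Taking the enum rather than a bare int documents intent, and
 * gcc -Wswitch can warn if a new value is added but not handled.
 */
static int cpu_is_idle(enum idle_type idle)
{
	switch (idle) {
	case NOT_IDLE:
		return 0;
	case IDLE:
	case NEWLY_IDLE:
		return 1;
	}
	assert(!"unknown idle_type");
	return 0;
}
```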
>
>>
>> 3) conditional locking in load_balance is v. bad idea.
>>
>
> Yeah... I'm thinking about this. I don't think it should be too hard
> to break out the shared portion.
done
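The general shape of breaking out the shared portion, as opposed to locking conditionally inside one function, is something like the userspace sketch below. This is only an illustration of the pattern, not what load_balance does: struct queue, take_tasks() and take_tasks_locked() are hypothetical, and a pthread mutex stands in for the runqueue lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a runqueue. */
struct queue {
	pthread_mutex_t lock;
	int nr_running;
};

/* Shared portion: the caller must already hold q->lock. */
static int take_tasks_locked(struct queue *q, int max)
{
	int n;

	assert(max >= 0);
	n = q->nr_running < max ? q->nr_running : max;
	q->nr_running -= n;
	return n;
}

/* Entry point for callers that do not hold the lock. */
static int take_tasks(struct queue *q, int max)
{
	int n;

	pthread_mutex_lock(&q->lock);
	n = take_tasks_locked(q, max);
	pthread_mutex_unlock(&q->lock);
	return n;
}
```

Each entry point then has one fixed locking rule, so there is no "lock only if some flag is set" branch to get wrong.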
>
>
>>
>> 4) load_balance returns "(!failed && !balanced)", but callers stop
>> calling it when it returns true. Why not simply return "balanced",
>> or at least "balanced && !failed"?
>>
>>
>
> No, the idle balancer stops calling it when it returns true, the periodic
> balancer sets idle to 0 when it returns true.
>
> !balanced && !failed means it has moved a task.
>
> I'll either comment that, or return it in a more direct way.
>
done
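Spelled out, the return value and its two callers behave roughly as in this sketch. Everything here is a simplified model: struct attempt, moved_task(), idle_balance() and periodic_balance() are hypothetical names, and the meanings given to the balanced and failed fields are my assumptions about what those flags stand for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one load_balance() attempt. */
struct attempt {
	int balanced;	/* assumed: nothing needed moving */
	int failed;	/* assumed: imbalance found, but no task moved */
};

/* True only when a task was actually moved. */
static int moved_task(const struct attempt *a)
{
	return !a->balanced && !a->failed;
}

/*
 * Idle balancer: keep trying until one attempt moves a task.
 * Returns the number of attempts made.
 */
static size_t idle_balance(const struct attempt *a, size_t n)
{
	size_t i;

	assert(a != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (moved_task(&a[i]))
			return i + 1;
	return n;
}

/* Periodic balancer: a moved task means this CPU is no longer idle. */
static int periodic_balance(const struct attempt *a, int idle)
{
	return moved_task(a) ? 0 : idle;
}
```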
[-- Attachment #2: sched-migrate-affinity-race.patch --]
[-- Type: text/plain, Size: 4423 bytes --]
Prevents a race in which sys_sched_setaffinity changes a task's cpu mask while
sched_migrate_task is running, causing sched_migrate_task to restore a stale cpu mask.
linux-2.6-npiggin/kernel/sched.c | 89 +++++++++++++++++++++++++++++----------
1 files changed, 68 insertions(+), 21 deletions(-)
diff -puN kernel/sched.c~sched-migrate-affinity-race kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-migrate-affinity-race 2003-12-19 14:45:58.000000000 +1100
+++ linux-2.6-npiggin/kernel/sched.c 2003-12-19 15:19:30.000000000 +1100
@@ -947,6 +947,9 @@ static inline void double_rq_unlock(runq
}
#ifdef CONFIG_NUMA
+
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags);
+
/*
* If dest_cpu is allowed for this process, migrate the task to it.
* This is accomplished by forcing the cpu_allowed mask to only
@@ -955,16 +958,43 @@ static inline void double_rq_unlock(runq
*/
static void sched_migrate_task(task_t *p, int dest_cpu)
{
- cpumask_t old_mask;
+ runqueue_t *rq;
+ unsigned long flags;
+ cpumask_t old_mask, new_mask = cpumask_of_cpu(dest_cpu);
+ rq = task_rq_lock(p, &flags);
old_mask = p->cpus_allowed;
- if (!cpu_isset(dest_cpu, old_mask))
+ if (!cpu_isset(dest_cpu, old_mask)) {
+ task_rq_unlock(rq, &flags);
return;
+ }
+
+ get_task_struct(p);
+
/* force the process onto the specified CPU */
- set_cpus_allowed(p, cpumask_of_cpu(dest_cpu));
+ if (__set_cpus_allowed(p, new_mask, &flags) < 0)
+ goto out;
+
+ /* __set_cpus_allowed unlocks the runqueue */
+ rq = task_rq_lock(p, &flags);
+ if (unlikely(!cpus_equal(p->cpus_allowed, new_mask))) {
+ /*
+ * We have raced with another set_cpus_allowed.
+ * old_mask is invalid and we needn't move the
+ * task back.
+ */
+ task_rq_unlock(rq, &flags);
+ goto out;
+ }
+
+ /*
+ * restore the cpus allowed mask. old_mask must be valid because
+ * p->cpus_allowed is a subset of old_mask.
+ */
+ __set_cpus_allowed(p, old_mask, &flags);
- /* restore the cpus allowed mask */
- set_cpus_allowed(p, old_mask);
+out:
+ put_task_struct(p);
}
/*
@@ -2603,31 +2633,27 @@ typedef struct {
} migration_req_t;
/*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
+ * See comment for set_cpus_allowed. The calling rules are different:
+ * the task's runqueue lock must be held, and __set_cpus_allowed
+ * will return with the runqueue unlocked.
*/
-int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags)
{
- unsigned long flags;
migration_req_t req;
- runqueue_t *rq;
+ runqueue_t *rq = task_rq(p);
- if (any_online_cpu(new_mask) == NR_CPUS)
+ if (any_online_cpu(new_mask) == NR_CPUS) {
+ task_rq_unlock(rq, flags);
return -EINVAL;
+ }
- rq = task_rq_lock(p, &flags);
p->cpus_allowed = new_mask;
/*
* Can the task run on the task's current CPU? If not then
* migrate the thread off to a proper CPU.
*/
if (cpu_isset(task_cpu(p), new_mask)) {
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, flags);
return 0;
}
/*
@@ -2636,18 +2662,39 @@ int set_cpus_allowed(task_t *p, cpumask_
*/
if (!p->array && !task_running(rq, p)) {
set_task_cpu(p, any_online_cpu(p->cpus_allowed));
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, flags);
return 0;
}
+
init_completion(&req.done);
req.task = p;
list_add(&req.list, &rq->migration_queue);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, flags);
wake_up_process(rq->migration_thread);
-
wait_for_completion(&req.done);
+
return 0;
+
+}
+
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+{
+ unsigned long flags;
+ runqueue_t *rq;
+
+ rq = task_rq_lock(p, &flags);
+
+ return __set_cpus_allowed(p, new_mask, &flags);
}
EXPORT_SYMBOL_GPL(set_cpus_allowed);
_