linux-kernel.vger.kernel.org archive mirror
From: Nick Piggin <piggin@cyberone.com.au>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Anton Blanchard <anton@samba.org>, Ingo Molnar <mingo@redhat.com>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	Mark Wong <markw@osdl.org>, John Hawkes <hawkes@sgi.com>
Subject: Re: [CFT][RFC] HT scheduler
Date: Fri, 19 Dec 2003 15:57:13 +1100
Message-ID: <3FE28529.1010003@cyberone.com.au>
In-Reply-To: <3FDE3EF7.7000001@cyberone.com.au>

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]



Nick Piggin wrote:

>
>
> Rusty Russell wrote:
>
>>
>> A few things need work:
>>
>> 1) There's a race between sys_sched_setaffinity() and
>>   sched_migrate_task() (this is nothing to do with your patch).
>>
>
> Yep. They should both take the task's runqueue lock. 


Easier said than done... anyway, how does this patch look?
AFAIKS it should also cure a possible, and not entirely unlikely,
use-after-free of the task_struct in sched_migrate_task on NUMA.

The patch is on top of a few other changes, so it might not apply
cleanly; it is just for review. I'll release a new rollup with this
included soon.
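
To make the race concrete, here is a minimal user-space sketch of the recheck-before-restore idea, not the kernel code itself. Everything in it is an assumption made for illustration: mask_t stands in for cpumask_t, struct task for task_struct, and migrate_begin()/migrate_finish() are hypothetical names for the two halves of sched_migrate_task() that would each run under the task's runqueue lock, with the lock dropped in between.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long mask_t;	/* stand-in for cpumask_t */

struct task {
	mask_t cpus_allowed;
};

/* Stand-in for a concurrent sys_sched_setaffinity() call. */
static void set_affinity(struct task *p, mask_t mask)
{
	p->cpus_allowed = mask;
}

/*
 * First half of the migration: remember the current mask and pin the
 * task to dest_cpu. In the kernel this body would run under the task's
 * runqueue lock. Returns false if dest_cpu is not allowed.
 */
static bool migrate_begin(struct task *p, int dest_cpu, mask_t *old_mask)
{
	mask_t new_mask = 1UL << dest_cpu;

	*old_mask = p->cpus_allowed;
	if (!(*old_mask & new_mask))
		return false;
	p->cpus_allowed = new_mask;
	return true;
}

/*
 * Second half, run after the lock was dropped and retaken: restore
 * old_mask only if the mask is still the one migrate_begin() installed.
 * Returns true if the old mask was restored.
 */
static bool migrate_finish(struct task *p, int dest_cpu, mask_t old_mask)
{
	mask_t new_mask = 1UL << dest_cpu;

	assert(old_mask & new_mask);	/* guaranteed by migrate_begin() */
	if (p->cpus_allowed != new_mask)
		return false;		/* raced: keep the newer mask */
	p->cpus_allowed = old_mask;
	return true;
}
```

If set_affinity() runs between the two halves, migrate_finish() sees a mask other than the single-CPU one and leaves it alone rather than overwriting it with the stale old_mask. Note that this check cannot tell apart a concurrent setter that happens to install exactly the same single-CPU mask.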

>
>
>>
>> 2) Please change those #defines into an enum for idle (patch follows,
>>   untested but trivial)
>>
>
> Thanks, I'll take the patch.


done

>
>>
>> 3) conditional locking in load_balance is v. bad idea.
>>
>
> Yeah... I'm thinking about this. I don't think it should be too hard
> to break out the shared portion. 


done

>
>
>>
>> 4) load_balance returns "(!failed && !balanced)", but callers stop
>>   calling it when it returns true.  Why not simply return "balanced",
>>   or at least "balanced && !failed"?
>>
>>
>
> No, the idle balancer stops calling it when it returns true, the periodic
> balancer sets idle to 0 when it returns true.
>
> !balanced && !failed means it has moved a task.
>
> I'll either comment that, or return it in a more direct way.
>

done
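
The return-value convention discussed in point 4 can be sketched as below. This is only an illustration under stated assumptions: struct balance_result, load_balance_moved(), idle_balance() and periodic_balance() are hypothetical names, and the meanings given to "failed" and "balanced" in the comments are a guess at the intent rather than the actual load_balance code.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one balancing attempt (hypothetical struct). */
struct balance_result {
	bool failed;	/* assumed: an imbalance was found but nothing moved */
	bool balanced;	/* assumed: no imbalance was found */
};

/* "!balanced && !failed means it has moved a task." */
static bool load_balance_moved(struct balance_result r)
{
	return !r.failed && !r.balanced;
}

/*
 * Idle balancer: keep trying until an attempt moves a task.
 * Returns the number of attempts made.
 */
static int idle_balance(const struct balance_result *attempts, int n)
{
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		if (load_balance_moved(attempts[i]))
			return i + 1;
	return n;
}

/* Periodic balancer: clear the idle flag once a task has been moved. */
static int periodic_balance(struct balance_result r, int idle)
{
	if (load_balance_moved(r))
		idle = 0;
	return idle;
}
```

Returning "a task was moved" directly, as in load_balance_moved(), is one way to make the callers' behaviour self-explanatory without an extra comment.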



[-- Attachment #2: sched-migrate-affinity-race.patch --]
[-- Type: text/plain, Size: 4423 bytes --]


Fixes a race between sys_sched_setaffinity and sched_migrate_task that
could cause sched_migrate_task to restore an invalid cpu mask.


 linux-2.6-npiggin/kernel/sched.c |   89 +++++++++++++++++++++++++++++----------
 1 files changed, 68 insertions(+), 21 deletions(-)

diff -puN kernel/sched.c~sched-migrate-affinity-race kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-migrate-affinity-race	2003-12-19 14:45:58.000000000 +1100
+++ linux-2.6-npiggin/kernel/sched.c	2003-12-19 15:19:30.000000000 +1100
@@ -947,6 +947,9 @@ static inline void double_rq_unlock(runq
 }
 
 #ifdef CONFIG_NUMA
+
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags);
+
 /*
  * If dest_cpu is allowed for this process, migrate the task to it.
  * This is accomplished by forcing the cpu_allowed mask to only
@@ -955,16 +958,43 @@ static inline void double_rq_unlock(runq
  */
 static void sched_migrate_task(task_t *p, int dest_cpu)
 {
-	cpumask_t old_mask;
+	runqueue_t *rq;
+	unsigned long flags;
+	cpumask_t old_mask, new_mask = cpumask_of_cpu(dest_cpu);
 
+	rq = task_rq_lock(p, &flags);
 	old_mask = p->cpus_allowed;
-	if (!cpu_isset(dest_cpu, old_mask))
+	if (!cpu_isset(dest_cpu, old_mask)) {
+		task_rq_unlock(rq, &flags);
 		return;
+	}
+
+	get_task_struct(p);
+
 	/* force the process onto the specified CPU */
-	set_cpus_allowed(p, cpumask_of_cpu(dest_cpu));
+	if (__set_cpus_allowed(p, new_mask, &flags) < 0)
+		goto out;
+
+	/* __set_cpus_allowed unlocks the runqueue */
+	rq = task_rq_lock(p, &flags);
+	if (unlikely(!cpus_equal(p->cpus_allowed, new_mask))) {
+		/*
+		 * We have raced with another set_cpus_allowed.
+		 * old_mask is invalid and we needn't move the
+		 * task back.
+		 */
+		task_rq_unlock(rq, &flags);
+		goto out;
+	}
+
+	/*
+	 * restore the cpus allowed mask. old_mask must be valid because
+	 * p->cpus_allowed is a subset of old_mask.
+	 */
+	__set_cpus_allowed(p, old_mask, &flags);
 
-	/* restore the cpus allowed mask */
-	set_cpus_allowed(p, old_mask);
+out:
+	put_task_struct(p);
 }
 
 /*
@@ -2603,31 +2633,27 @@ typedef struct {
 } migration_req_t;
 
 /*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely.  The
- * call is not atomic; no spinlocks may be held.
+ * See comment for set_cpus_allowed. calling rules are different:
+ * the task's runqueue lock must be held, and __set_cpus_allowed
+ * will return with the runqueue unlocked.
  */
-int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags)
 {
-	unsigned long flags;
 	migration_req_t req;
-	runqueue_t *rq;
+	runqueue_t *rq = task_rq(p);
 
-	if (any_online_cpu(new_mask) == NR_CPUS)
+	if (any_online_cpu(new_mask) == NR_CPUS) {
+		task_rq_unlock(rq, flags);
 		return -EINVAL;
+	}
 
-	rq = task_rq_lock(p, &flags);
 	p->cpus_allowed = new_mask;
 	/*
 	 * Can the task run on the task's current CPU? If not then
 	 * migrate the thread off to a proper CPU.
 	 */
 	if (cpu_isset(task_cpu(p), new_mask)) {
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, flags);
 		return 0;
 	}
 	/*
@@ -2636,18 +2662,39 @@ int set_cpus_allowed(task_t *p, cpumask_
 	 */
 	if (!p->array && !task_running(rq, p)) {
 		set_task_cpu(p, any_online_cpu(p->cpus_allowed));
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, flags);
 		return 0;
 	}
+
 	init_completion(&req.done);
 	req.task = p;
 	list_add(&req.list, &rq->migration_queue);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, flags);
 
 	wake_up_process(rq->migration_thread);
-
 	wait_for_completion(&req.done);
+
 	return 0;
+
+}
+
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely.  The
+ * call is not atomic; no spinlocks may be held.
+ */
+int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+{
+	unsigned long flags;
+	runqueue_t *rq;
+
+	rq = task_rq_lock(p, &flags);
+
+	return __set_cpus_allowed(p, new_mask, &flags);
 }
 
 EXPORT_SYMBOL_GPL(set_cpus_allowed);

_
