linux-kernel.vger.kernel.org archive mirror
* [patch] scheduler fixes, 2.5.32-BK
@ 2002-08-30  7:43 Ingo Molnar
  2002-08-30  7:55 ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30  7:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


the attached patch adds two scheduler related fixes:

 - changes the migration code to use struct completion. Andrew pointed out
   that there might be a small window in which the up() touches the
   semaphore while the waiting task goes on and frees its stack. And
   completion is more suited for this kind of stuff anyway.

 - removes two unneeded exports, pointed out by Andrew.

	Ingo

--- linux/kernel/sched.c.orig	Fri Aug 30 09:34:34 2002
+++ linux/kernel/sched.c	Fri Aug 30 09:35:49 2002
@@ -1901,7 +1901,7 @@
 typedef struct {
 	list_t list;
 	task_t *task;
-	struct semaphore sem;
+	struct completion done;
 } migration_req_t;
 
 /*
@@ -1945,13 +1945,13 @@
 		task_rq_unlock(rq, &flags);
 		goto out;
 	}
-	init_MUTEX_LOCKED(&req.sem);
+	init_completion(&req.done);
 	req.task = p;
 	list_add(&req.list, &rq->migration_queue);
 	task_rq_unlock(rq, &flags);
 	wake_up_process(rq->migration_thread);
 
-	down(&req.sem);
+	wait_for_completion(&req.done);
 out:
 	preempt_enable();
 }
@@ -2032,7 +2032,7 @@
 		double_rq_unlock(rq_src, rq_dest);
 		local_irq_restore(flags);
 
-		up(&req->sem);
+		complete(&req->done);
 	}
 }
 
--- linux/kernel/ksyms.c.orig	Fri Aug 30 09:37:36 2002
+++ linux/kernel/ksyms.c	Fri Aug 30 09:37:38 2002
@@ -496,8 +496,6 @@
 #endif
 
 EXPORT_SYMBOL(kstat);
-EXPORT_SYMBOL(nr_running);
-EXPORT_SYMBOL(nr_context_switches);
 
 /* misc */
 EXPORT_SYMBOL(panic);




* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30  7:43 [patch] scheduler fixes, 2.5.32-BK Ingo Molnar
@ 2002-08-30  7:55 ` Ingo Molnar
  2002-08-30  8:12   ` Andrew Morton
  2002-08-30 17:03   ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30  7:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


>  - changes the migration code to use struct completion. Andrew pointed out
>    that there might be a small window in which the up() touches the
>    semaphore while the waiting task goes on and frees its stack. And
>    completion is more suited for this kind of stuff anyway.

actually, i think the race does not exist. up() is perfectly safely done
on the on-stack semaphore, because both the wake_up() done by __up() and
the __down() path take the waitqueue spinlock, so i cannot see where the
up() touches the semaphore after the down()-ed task has been woken up.

the second argument still holds though - a completion is probably slightly
cheaper in this case.

	Ingo



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30  7:55 ` Ingo Molnar
@ 2002-08-30  8:12   ` Andrew Morton
  2002-08-30  8:22     ` Ingo Molnar
  2002-08-30 17:03   ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2002-08-30  8:12 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel

Ingo Molnar wrote:
> 
> >  - changes the migration code to use struct completion. Andrew pointed out
> >    that there might be a small window in which the up() touches the
> >    semaphore while the waiting task goes on and frees its stack. And
> >    completion is more suited for this kind of stuff anyway.
> 
> actually, i think the race does not exist. up() is perfectly safely done
> on the on-stack semaphore, because both the wake_up() done by __up() and
> the __down() path take the waitqueue spinlock, so i cannot see where the
> up() touches the semaphore after the down()-ed task has been woken up.
> 

yep, looks like the killing of the semaphore_lock made the race
go away.

But ia64, sparc and x86_64 use semaphore_lock, so they still are
exposed.


* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30  8:12   ` Andrew Morton
@ 2002-08-30  8:22     ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30  8:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel


On Fri, 30 Aug 2002, Andrew Morton wrote:

> yep, looks like the killing of the semaphore_lock made the race go away.
> 
> But ia64, sparc and x86_64 use semaphore_lock, so they still are
> exposed.

btw., this way completions become special-case semaphores optimized for
the context-switching path - an optimization, not a correctness fix. At
which point it's also an interesting question whether in fact we need to
make completions that much of a different interface.

	Ingo



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30  7:55 ` Ingo Molnar
  2002-08-30  8:12   ` Andrew Morton
@ 2002-08-30 17:03   ` Linus Torvalds
  2002-08-30 17:05     ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2002-08-30 17:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Fri, 30 Aug 2002, Ingo Molnar wrote:
> 
> actually, i think the race does not exist. up() is perfectly safely done
> on the on-stack semaphore, because both the wake_up() done by __up() and
> the __down() path take the waitqueue spinlock, so i cannot see where
> the up() touches the semaphore after the down()-ed task has been woken
> up.

It touches the _spinlock_.

Which may no longer be a spinlock, since the waiter may never have blocked 
on it at all, and may have just succeeded directly with the atomic 
decrement of the counter.

Trust me, semaphores on the stack do not work unless there is some other 
synchronization above the semaphore level (or unless we make semaphores 
much slower, ie we take the spinlock even in the fast path).
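
In rough pseudo-C (illustrative only, not the exact i386 code, but the shape
is the same):

        void down(struct semaphore *sem)
        {
                /* fast path: one atomic op on ->count, no spinlock */
                if (atomic_dec_return(&sem->count) >= 0)
                        return;
                /* slow path: sleeps, uses the waitqueue spinlock */
                __down(sem);
        }

        void up(struct semaphore *sem)
        {
                /* fast path: one atomic op on ->count, no spinlock */
                if (atomic_inc_return(&sem->count) > 0)
                        return;
                /* slow path: wake_up() under the waitqueue spinlock */
                __up(sem);
        }

The two fast paths synchronize only through the atomic count. Once down() has
returned, nothing stops its caller from freeing the semaphore while up() is
still inside __up(), dereferencing it.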

		Linus



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30 17:03   ` Linus Torvalds
@ 2002-08-30 17:05     ` Ingo Molnar
  2002-08-30 17:16       ` Ingo Molnar
  2002-08-30 17:19       ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30 17:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


On Fri, 30 Aug 2002, Linus Torvalds wrote:

> > actually, i think the race does not exist. up() is perfectly safely done
> > on the on-stack semaphore, because both the wake_up() done by __up() and
> > the __down() path take the waitqueue spinlock, so i cannot see where
> > the up() touches the semaphore after the down()-ed task has been woken
> > up.
> 
> It touches the _spinlock_.

it touches the waitqueue spinlock - and the __down() path [ie. the process
that gets woken up, which has the semaphore on the stack] takes the
spinlock after waking up. Ie. there's guaranteed synchronization, the
semaphore will not be 'unused' before the __down() path takes the spinlock
- ie. after the __up() path releases the spinlock. What am i missing?

	Ingo



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30 17:05     ` Ingo Molnar
@ 2002-08-30 17:16       ` Ingo Molnar
  2002-08-30 17:19       ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30 17:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


we used to have the global semaphore_lock - which, if used separately from
the waitqueue lock, can indeed cause the semaphore structure to go out of use
(ie. be freed) before the spin_unlock in wake_up() completes.

but since 2.5.25 or so we use the semaphore waitqueue's spinlock for
semaphore locking - this also neatly solves the semaphore-unuse problem.  
Four architectures, sparc, ia64, arm and x86-64 still use the global
semaphore_lock, but the other 13 architectures use the waitqueue spinlock
already.

(unless there's something else i missed.)

	Ingo




* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30 17:05     ` Ingo Molnar
  2002-08-30 17:16       ` Ingo Molnar
@ 2002-08-30 17:19       ` Linus Torvalds
  2002-08-30 17:28         ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2002-08-30 17:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Fri, 30 Aug 2002, Ingo Molnar wrote:
> 
> it touches the waitqueue spinlock - and the __down() path [ie. the process
> that gets woken up, which has the semaphore on the stack] takes the
> spinlock after waking up. Ie. there's guaranteed synchronization, the
> semaphore will not be 'unused' before the __down() path takes the spinlock
> - ie. after the __up() path releases the spinlock. What am i missing?

So why couldn't this happen? This is what used to happen before, I don't 
see that consolidating the spinlock had any impact at all.

	CPU #0						CPU #1

	down()						up()

		lock decl (negative)
		__down()				lock incl
			spinlock()			__up()
			atomic_add_negative()
				success - break
			spinunlock();
		}					wake_up()
	return - semaphore is now invalid		spin_lock()

							BOOM!


The fact is that as long as down() and up() avoid taking the spinlock 
_before_ they touch "count", they aren't synchronized. 

And we definitely do _not_ want to take the spinlock before we touch 
count, since that would make the fast path a lot slower.

What?

		Linus



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30 17:19       ` Linus Torvalds
@ 2002-08-30 17:28         ` Ingo Molnar
  2002-08-30 17:47           ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30 17:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


On Fri, 30 Aug 2002, Linus Torvalds wrote:

> So why couldn't this happen? This is what used to happen before, I don't
> see that consolidating the spinlock had any impact at all.
> 
> 	CPU #0						CPU #1
> 
> 	down()						up()
> 
> 		lock decl (negative)
> 		__down()				lock incl
> 			spinlock()			__up()
> 			atomic_add_negative()
> 				success - break
> 			spinunlock();
> 		}					wake_up()
> 	return - semaphore is now invalid		spin_lock()
> 
> 							BOOM!

hm, indeed, you are right - completions are the only safe method.

i'm starting to wonder whether it's possible at all (theoretically) to
have a mutex design which has the current semaphore implementation's good
fastpath properties, but could also be used on stack.

	Ingo



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30 17:28         ` Ingo Molnar
@ 2002-08-30 17:47           ` Linus Torvalds
  2002-08-30 18:05             ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2002-08-30 17:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Fri, 30 Aug 2002, Ingo Molnar wrote:
> 
> hm, indeed, you are right - completions are the only safe method.
> 
> i'm starting to wonder whether it's possible at all (theoretically) to
> have a mutex design which has the current semaphore implementation's good
> fastpath properties, but could also be used on stack.

That's my point. I don't think there is - although I suspect that many 
architectures could easily do it. For all I know, there might well be some 
tricks we could play on x86 with cmpxchg8b, for example.

But I simply think that our current "completion vs semaphore" split is a
pretty good one conceptually. They may _look_ like they are largely the
same operation, but they have pretty distinct behaviour both in what the
fast path is (ie "expected behaviour": semaphores expect to succeed,
completions expect to wait), and what the release criteria are (semaphores
do not guarantee that nobody looks at them after a down() has succeeded,
while completions do).
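
To make the release-criteria point concrete, here is a much simplified sketch
of the completion side (just the locking shape, not the real implementation;
the *_locked() helpers below are made-up stand-ins for the internal code):

        void complete(struct completion *x)
        {
                unsigned long flags;

                spin_lock_irqsave(&x->wait.lock, flags);
                x->done++;
                /* wake one waiter, with the waitqueue lock already held */
                wake_up_locked(&x->wait);
                spin_unlock_irqrestore(&x->wait.lock, flags);
        }

        void wait_for_completion(struct completion *x)
        {
                spin_lock_irq(&x->wait.lock);
                while (!x->done)
                        /* drops the lock, sleeps, retakes the lock */
                        sleep_on_locked(&x->wait);
                x->done--;
                spin_unlock_irq(&x->wait.lock);
        }

Both sides only ever touch the structure under the same spinlock, so by the
time wait_for_completion() returns, complete() has already dropped its last
reference to the object - which is exactly the guarantee a bare semaphore
does not give.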

And one thing that tends to confirm my belief that "struct completion"  
actually makes sense as a separate thing from a semaphore has nothing to 
do with these implementation details. It's the much more conceptual one: a 
lot of the cases where we converted to completions are just a lot more 
_readable_ as completions.

Using a semaphore for much of it ("wait for the IO to complete" or "wait
for the thread to be set up") counted as a clever trick, but was a fairly
obscure clever trick. While the completion thing looks "obvious".
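
Take the classic "wait until the new thread has set itself up" case (a
made-up example, but this is the shape of most of the conversions). The
semaphore version reads like a trick:

        struct semaphore sem;

        init_MUTEX_LOCKED(&sem);        /* a "mutex" that starts out locked?? */
        kernel_thread(worker_fn, &sem, 0);
        down(&sem);                     /* worker does up(&sem) when ready */

while the completion version says what it means:

        struct completion setup_done;

        init_completion(&setup_done);
        kernel_thread(worker_fn, &setup_done, 0);
        wait_for_completion(&setup_done);  /* worker does complete() when ready */

(And with the semaphore on the stack, the first version is also exactly the
unsafe pattern discussed above.)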

So even if the actual implementation of semaphores and completions were
100% the same, I would actually want to _keep_ this naming and conceptual
difference, simply because it just _looks_ cleaner, in my opinion.

		Linus



* Re: [patch] scheduler fixes, 2.5.32-BK
  2002-08-30 17:47           ` Linus Torvalds
@ 2002-08-30 18:05             ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2002-08-30 18:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


On Fri, 30 Aug 2002, Linus Torvalds wrote:

> > i'm starting to wonder whether it's possible at all (theoretically) to
> > have a mutex design which has the current semaphore implementation's good
> > fastpath properties, but could also be used on stack.
> 
> That's my point. I don't think there is - although I suspect that
> many architectures could easily do it. For all I know, there might well
> be some tricks we could play on x86 with cmpxchg8b, for example.

it might also make sense to let semaphores really be a function call.  
Right now our semaphore fastpath goes like:

 770:   b9 24 00 00 00          mov    $0x24,%ecx
 775:   f0 ff 0d 24 00 00 00    lock decl 0x24
 77c:   0f 88 57 01 00 00       js     8d8

if down() was a function call, it would be like:

 790:   b8 24 00 00 00          mov    $0x24,%eax
 795:   e8 fc ff ff ff          call   796 <dummy2+0x6>

ie. 10 bytes icache footprint, vs. 18 bytes icache footprint in the
inlined variant (17 bytes on UP).

In a typical vmlinux there are 300 down()s, so this would save more than
2K of instructions off the hotpath. [even if only half of those down()s
are truly performance critical, it's 1K off.]

[btw., gcc loads %ecx in the fastpath, which looks wrong, perhaps an
optimization bug in the inline assembly?]

and in that case we could implement semaphores by letting them take the
waitqueue spinlock even in the fastpath - it's not a scalability problem
because that cacheline must be exclusive-locked anyway for the atomic op.

and by doing that we could implement more complex things like fairness, or
writers-preferred-over-readers type of semantics much more easily - and
*many* of the subtle races would simply go away, since we'd be able to
assume a frozen semaphore state.

i suspect such a semaphore implementation would take only about 2-3 cycles
more than the current one, in the fastpath.
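
very roughly, something like this (just a sketch of the idea - the *_locked()
helpers are made up, and a real version would need more care with irqs etc.):

        void down(struct semaphore *sem)
        {
                spin_lock_irq(&sem->wait.lock);
                while (atomic_read(&sem->count) <= 0)
                        /* drops the lock, sleeps, retakes the lock */
                        sleep_on_locked(&sem->wait);
                atomic_dec(&sem->count);
                spin_unlock_irq(&sem->wait.lock);
        }

        void up(struct semaphore *sem)
        {
                spin_lock_irq(&sem->wait.lock);
                atomic_inc(&sem->count);
                wake_up_locked(&sem->wait);
                spin_unlock_irq(&sem->wait.lock);
        }

with the count only ever touched under the lock (it would not even have to be
atomic anymore), the semaphore state is frozen for whoever holds the lock -
which is what makes fairness tweaks and on-stack use straightforward.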

> But I simply think that our current "completion vs semaphore" split is a
> pretty good one conceptually. [...]

agreed. I used semaphores for completion purposes for quite some time and
in quite many pieces of code, and completions are just so much more
logical in naming.

	Ingo


