linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.0-test6 oops futex"
@ 2003-09-29  9:23 Klaus Dittrich
  2003-09-30  8:48 ` Jamie Lokier
  0 siblings, 1 reply; 8+ messages in thread
From: Klaus Dittrich @ 2003-09-29  9:23 UTC (permalink / raw)
  To: linux mailing-list

System SMP, Dual-Xeon, glibc-cvs with nptl-0.59, gcc-3.3.1
MozillaFirebird-0.61

With linux-2.6.0-test6 I got several times this oops.
I had no problems with linux-2.6.0-test5 before. (absolute stable for me)

Sep 29 11:06:36 xeon2 kernel: Unable to handle kernel paging request at virtual address 00100104
Sep 29 11:06:36 xeon2 kernel:  printing eip:
Sep 29 11:06:36 xeon2 kernel: c013522d
Sep 29 11:06:36 xeon2 kernel: *pde = 00000000
Sep 29 11:06:36 xeon2 kernel: Oops: 0002 [#1]
Sep 29 11:06:36 xeon2 kernel: CPU:    3
Sep 29 11:06:36 xeon2 kernel: EIP:    0060:[<c013522d>]    Not tainted
Sep 29 11:06:36 xeon2 kernel: EFLAGS: 00010246
Sep 29 11:06:36 xeon2 kernel: EIP is at futex_wake+0xe4/0x140
Sep 29 11:06:36 xeon2 kernel: eax: 00200200   ebx: f6e4deec   ecx: 00000000   edx: 00100100
Sep 29 11:06:36 xeon2 kernel: esi: c054cbc0   edi: 00000000   ebp: c054cbc0   esp: f626df3c
Sep 29 11:06:36 xeon2 kernel: ds: 007b   es: 007b   ss: 0068
Sep 29 11:06:36 xeon2 kernel: Process MozillaFirebird (pid: 967, threadinfo=f626c000 task=f6cdc080)
Sep 29 11:06:36 xeon2 kernel: Stack: f626df4c f626df4c 00000fe8 c054cbbc 0819c000 f7472c80 00000fe8 e75d6008
Sep 29 11:06:36 xeon2 kernel:        0819cfe8 00000001 0819cfe8 00000001 c0135c77 0819cfe8 00000001 00000000
Sep 29 11:06:36 xeon2 kernel:        00000001 c0135d85 0819cfe8 00000001 00000001 7fffffff 0819cfe8 00000000
Sep 29 11:06:36 xeon2 kernel: Call Trace:
Sep 29 11:06:36 xeon2 kernel:  [<c0135c77>] do_futex+0x7e/0x80
Sep 29 11:06:36 xeon2 kernel:  [<c0135d85>] sys_futex+0x10c/0x124
Sep 29 11:06:36 xeon2 kernel:  [<c0124d8b>] sys_gettimeofday+0x53/0xa8
Sep 29 11:06:36 xeon2 kernel:  [<c010aba9>] sysenter_past_esp+0x52/0x71
Sep 29 11:06:36 xeon2 kernel:
Sep 29 11:06:36 xeon2 kernel: Code: 89 42 04 89 10 89 5b 04 8d 43 08 89 1b ba 03 00 00 00 e8 3b
Sep 29 11:06:36 xeon2 kernel:  <6>note: MozillaFirebird[967] exited with preempt_count 1

-- 
Klaus 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-09-29  9:23 2.6.0-test6 oops futex" Klaus Dittrich
@ 2003-09-30  8:48 ` Jamie Lokier
  2003-09-30 20:53   ` Hugh Dickins
  0 siblings, 1 reply; 8+ messages in thread
From: Jamie Lokier @ 2003-09-30  8:48 UTC (permalink / raw)
  To: Klaus Dittrich
  Cc: linux mailing-list, akpm, Hu, Boris, Hugh Dickins,
	Ulrich Drepper, Jamie Lokier, Rusty Russell

Klaus Dittrich wrote:
> With linux-2.6.0-test6 I got several times this oops.
> I had no problems with linux-2.6.0-test5 before. (absolute stable for me)

This oops is in the first store in the __list_del() of
list_del_init(), inside the loop of futex_wake.

The "next" link of the current futex in the loop is corrupt.  More
precisely, the iterator list node contains { LIST_POISON1, LIST_POISON2 }
which means it must have been earlier deleted using list_del().

The only call to list_del() in the futex code is in unqueue_me(),
where it is protected against the same spinlock as all the other
queuing and dequeuing operations.

The one possibility I can think of is that the hash function isn't
consistent, either because of a hashing bug or because the hash
function argument is being changed.  Then different locks would be
used between one place and another, permitting the lists to become
corrupt.  That would explain the oops appearing only in test6, which
is when the new hashing scheme and split locks appeared.

I don't see any obvious flaw.  Rusty, Hugh, anyone else see it?

Ulrich, could you try a favourite futex stress test on vanilla
2.6.0-test6, to see if you can reliably trigger this oops?

Thanks,
-- Jamie

> Sep 29 11:06:36 xeon2 kernel: Unable to handle kernel paging request at virtual address 00100104
> Sep 29 11:06:36 xeon2 kernel:  printing eip:
> Sep 29 11:06:36 xeon2 kernel: c013522d
> Sep 29 11:06:36 xeon2 kernel: *pde = 00000000
> Sep 29 11:06:36 xeon2 kernel: Oops: 0002 [#1]
> Sep 29 11:06:36 xeon2 kernel: CPU:    3
> Sep 29 11:06:36 xeon2 kernel: EIP:    0060:[<c013522d>]    Not tainted
> Sep 29 11:06:36 xeon2 kernel: EFLAGS: 00010246
> Sep 29 11:06:36 xeon2 kernel: EIP is at futex_wake+0xe4/0x140
> Sep 29 11:06:36 xeon2 kernel: eax: 00200200   ebx: f6e4deec   ecx: 00000000   edx: 00100100
> Sep 29 11:06:36 xeon2 kernel: esi: c054cbc0   edi: 00000000   ebp: c054cbc0   esp: f626df3c
> Sep 29 11:06:36 xeon2 kernel: ds: 007b   es: 007b   ss: 0068
> Sep 29 11:06:36 xeon2 kernel: Process MozillaFirebird (pid: 967, threadinfo=f626c000 task=f6cdc080)
> Sep 29 11:06:36 xeon2 kernel: Stack: f626df4c f626df4c 00000fe8 c054cbbc 0819c000 f7472c80 00000fe8 e75d6008
> Sep 29 11:06:36 xeon2 kernel:        0819cfe8 00000001 0819cfe8 00000001 c0135c77 0819cfe8 00000001 00000000
> Sep 29 11:06:36 xeon2 kernel:        00000001 c0135d85 0819cfe8 00000001 00000001 7fffffff 0819cfe8 00000000
> Sep 29 11:06:36 xeon2 kernel: Call Trace:
> Sep 29 11:06:36 xeon2 kernel:  [<c0135c77>] do_futex+0x7e/0x80
> Sep 29 11:06:36 xeon2 kernel:  [<c0135d85>] sys_futex+0x10c/0x124
> Sep 29 11:06:36 xeon2 kernel:  [<c0124d8b>] sys_gettimeofday+0x53/0xa8
> Sep 29 11:06:36 xeon2 kernel:  [<c010aba9>] sysenter_past_esp+0x52/0x71
> Sep 29 11:06:36 xeon2 kernel:
> Sep 29 11:06:36 xeon2 kernel: Code: 89 42 04 89 10 89 5b 04 8d 43 08 89 1b ba 03 00 00 00 e8 3b
> Sep 29 11:06:36 xeon2 kernel:  <6>note: MozillaFirebird[967] exited with preempt_count 1
> 
> -- 
> Klaus 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-09-30  8:48 ` Jamie Lokier
@ 2003-09-30 20:53   ` Hugh Dickins
  2003-09-30 21:48     ` Ulrich Drepper
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Hugh Dickins @ 2003-09-30 20:53 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris,
	Ulrich Drepper, Rusty Russell

On Tue, 30 Sep 2003, Jamie Lokier wrote:
> Klaus Dittrich wrote:
> > With linux-2.6.0-test6 I got several times this oops.
> > I had no problems with linux-2.6.0-test5 before. (absolute stable for me)
> 
> This oops is in the first store in the __list_del() of
> list_del_init(), inside the loop of futex_wake.
> 
> The "next" link of the current futex in the loop is corrupt.  More
> precisely, the iterator list node contains { LIST_POISON1, LIST_POISON2 }
> which means it must have been earlier deleted using list_del().
> 
> The only call to list_del() in the futex code is in unqueue_me(),
> where it is protected against the same spinlock as all the other
> queuing and dequeuing operations.
> 
> The one possibility I can think of is that the hash function isn't
> consistent, either because of a hashing bug or because the hash
> function argument is being changed.  Then different locks would be
> used between one place and another, permitting the lists to become
> corrupt.  That would explain the oops appearing only in test6, which
> is when the new hashing scheme and split locks appeared.
> 
> I don't see any obvious flaw.  Rusty, Hugh, anyone else see it?

Excellent (hah, not again! what a cosy club we share) analysis.

I can't quite make the flaw I do see, fit the facts Klaus sees.

Consider unqueue_me on one cpu racing against this->key = key2 in
futex_requeue on another.  unqueue_me finds the right bh to lock
from q->key, but futex_requeue is shifting q->key beneath it.
Although futex_requeue holds both the relevant spinlocks, during
reassignment of key an irrelevant hashqueue may get calculated
and locked instead.

Whether or not this is relevant to Klaus' oops, it does need to be
fixed, doesn't it?  And your new key_refs version even more exposed?
I'm giving up on it, but expect you or Rusty will be ingenious enough
to rescue the split locks somehow.

(Oh, while you're there, be nice to fix nr_requeue 0.)

Hugh


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-09-30 20:53   ` Hugh Dickins
@ 2003-09-30 21:48     ` Ulrich Drepper
  2003-09-30 23:44     ` Rusty Russell
  2003-10-01  1:01     ` Jamie Lokier
  2 siblings, 0 replies; 8+ messages in thread
From: Ulrich Drepper @ 2003-09-30 21:48 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Jamie Lokier, linux mailing-list, Hu, Boris, Rusty Russell

Hugh Dickins wrote:

> (Oh, while you're there, be nice to fix nr_requeue 0.)

Not necessary.  FUTEX_REQUEUE with nr_requeue == 0 is the same as
FUTEX_WAKE.  When we wrote the code we decided that it is better to
optimize the requeue as much as possible and leave worrying about
invalid parameters to the user.  Not that the code will not cause any
crashes or so.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-09-30 20:53   ` Hugh Dickins
  2003-09-30 21:48     ` Ulrich Drepper
@ 2003-09-30 23:44     ` Rusty Russell
  2003-10-01  6:35       ` Jamie Lokier
  2003-10-01  1:01     ` Jamie Lokier
  2 siblings, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2003-09-30 23:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jamie Lokier, Klaus Dittrich, linux mailing-list, Andrew Morton,
	Hu, Boris, Ulrich Drepper

In message <Pine.LNX.4.44.0309302141220.4388-100000@localhost.localdomain> you write:
> Consider unqueue_me on one cpu racing against this->key = key2 in
> futex_requeue on another.  unqueue_me finds the right bh to lock
> from q->key, but futex_requeue is shifting q->key beneath it.
> Although futex_requeue holds both the relevant spinlocks, during
> reassignment of key an irrelevant hashqueue may get calculated
> and locked instead.

Yes, the lack of global lock does make this problematic.  You end up
holding the lock on one queue and removing from another.  I think the
recycling scenario a more likely cause in this case, but this too must
be fixed.

The clearest fix (other than going back to one big lock) is to have a
lock per futex to protect its contents (vs. the hash bucket lock which
protects the hash chain itself).  But the following race-check is
sufficient (there are only two places where we call futex_hash on a
"live" futex):

Name: Futex Requeue Race
Author: Hugh Dickins, Typing by Rusty Russell
Status: Experimental

D: Consider unqueue_me on one cpu racing against this->key = key2 in
D: futex_requeue on another.  unqueue_me finds the right bh to lock
D: from q->key, but futex_requeue is shifting q->key beneath it.
D: Although futex_requeue holds both the relevant spinlocks, during
D: reassignment of key an irrelevant hashqueue may get calculated
D: and locked instead.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .26056-linux-2.6.0-test6-bk2/kernel/futex.c .26056-linux-2.6.0-test6-bk2.updated/kernel/futex.c
--- .26056-linux-2.6.0-test6-bk2/kernel/futex.c	2003-09-29 10:26:06.000000000 +1000
+++ .26056-linux-2.6.0-test6-bk2.updated/kernel/futex.c	2003-10-01 09:40:58.000000000 +1000
@@ -342,10 +342,19 @@ static inline void queue_me(struct futex
 /* Return 1 if we were still queued (ie. 0 means we were woken) */
 static inline int unqueue_me(struct futex_q *q)
 {
-	struct futex_hash_bucket *bh = hash_futex(&q->key);
+	struct futex_hash_bucket *bh;
+	union futex_key key;
 	int ret = 0;
 
+again:
+	key = q->key;
+	bh = hash_futex(&key);
 	spin_lock(&bh->lock);
+	if (unlikely(!match_futex(&key, q->key)) {
+		/* Race against futex_requeue */
+		spin_unlock(&bh_lock);
+		goto again;
+	}
 	if (!list_empty(&q->list)) {
 		list_del(&q->list);
 		ret = 1;
@@ -455,11 +464,20 @@ static unsigned int futex_poll(struct fi
 			       struct poll_table_struct *wait)
 {
 	struct futex_q *q = filp->private_data;
-	struct futex_hash_bucket *bh = hash_futex(&q->key);
+	union futex_key key;
+	struct futex_hash_bucket *bh;
 	int ret = 0;
 
 	poll_wait(filp, &q->waiters, wait);
+again:
+	key = q->key;
+	bh = hash_futex(&key);
 	spin_lock(&bh->lock);
+	if (unlikely(!match_futex(&key, q->key)) {
+		/* Race against futex_requeue */
+		spin_unlock(&bh_lock);
+		goto again;
+	}
 	if (list_empty(&q->list))
 		ret = POLLIN | POLLRDNORM;
 	spin_unlock(&bh->lock);

--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-09-30 20:53   ` Hugh Dickins
  2003-09-30 21:48     ` Ulrich Drepper
  2003-09-30 23:44     ` Rusty Russell
@ 2003-10-01  1:01     ` Jamie Lokier
  2003-10-01  2:41       ` Jamie Lokier
  2 siblings, 1 reply; 8+ messages in thread
From: Jamie Lokier @ 2003-10-01  1:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris,
	Ulrich Drepper, Rusty Russell

Hugh Dickins wrote:
> I can't quite make the flaw I do see, fit the facts Klaus sees.
> 
> Consider unqueue_me on one cpu racing against this->key = key2 in
> futex_requeue on another.  unqueue_me finds the right bh to lock
> from q->key, but futex_requeue is shifting q->key beneath it.
> Although futex_requeue holds both the relevant spinlocks, during
> reassignment of key an irrelevant hashqueue may get calculated
> and locked instead.
> 
> Whether or not this is relevant to Klaus' oops, it does need to be
> fixed, doesn't it?  And your new key_refs version even more exposed?
> I'm giving up on it, but expect you or Rusty will be ingenious enough
> to rescue the split locks somehow.

Superb analysis!

It does indeed explain Klaus' oops.  unqueue_me() and futex_poll()
hash using &q->key outside any spinlocks, so if there's a concurrent
futex_requeue() the hash can be corrupt.  Result: wrong lock taken;
list manipulation races in test6, not in test5.  (Provided Klaus has
SMP or pre-empt).

Solutions are to call hash_futex() inside the lock, or store the
hash result inside futex_q, and read it inside the lock.

You're right, the key_refs version is more exposed, but because of the
change to futux_requeue logic rather than anything to do with key_refs.

> (Oh, while you're there, be nice to fix nr_requeue 0.)

Logically, zero num in FUTEX_QUEUE shouldn't wake anything either, but
it's understood that at least one is always woken.  It's a bit of a
wart I agree, but I'll go along with Ulrich's view.

Thanks,
-- Jamie

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-10-01  1:01     ` Jamie Lokier
@ 2003-10-01  2:41       ` Jamie Lokier
  0 siblings, 0 replies; 8+ messages in thread
From: Jamie Lokier @ 2003-10-01  2:41 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Klaus Dittrich, linux mailing-list, Andrew Morton, Hu, Boris,
	Ulrich Drepper, Rusty Russell

Jamie Lokier wrote:
> Solutions are to call hash_futex() inside the lock, or store the
> hash result inside futex_q, and read it inside the lock.

What _am_ I talking about.  Can't call hash_futex() inside the lock,
it's needed to choose the lock ;)

Keeping the split locks is going to be tricky - all due to futex_requeue.

-- Jamie

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.0-test6 oops futex"
  2003-09-30 23:44     ` Rusty Russell
@ 2003-10-01  6:35       ` Jamie Lokier
  0 siblings, 0 replies; 8+ messages in thread
From: Jamie Lokier @ 2003-10-01  6:35 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Hugh Dickins, Klaus Dittrich, linux mailing-list, Andrew Morton,
	Hu, Boris, Ulrich Drepper

Rusty Russell wrote:
> +again:
> +	key = q->key;
> +	bh = hash_futex(&key);
>  	spin_lock(&bh->lock);
> +	if (unlikely(!match_futex(&key, q->key)) {
> +		/* Race against futex_requeue */
> +		spin_unlock(&bh_lock);
> +		goto again;
> +	}

Bug:

	1. key = q->key copies bad key, while it is being changed.

	2. That makes the spin_lock() irrelevant.

	3. match_futex() compares word by word against another bad
	   key, while it is being changed again (by a second futex_requeue).

	4. It can match even though the key is wrong.

For example, say the first requeue changes q->key from (1,2) to (3,4).
key = q->key could read (1,4).

Say the second requeue changes q->key from (3,4) to (1,5).
match_futex() could read (1,4) and pass the test, even though (1,4)
is never a valid key.

-- Jamie

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2003-10-01  6:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-09-29  9:23 2.6.0-test6 oops futex" Klaus Dittrich
2003-09-30  8:48 ` Jamie Lokier
2003-09-30 20:53   ` Hugh Dickins
2003-09-30 21:48     ` Ulrich Drepper
2003-09-30 23:44     ` Rusty Russell
2003-10-01  6:35       ` Jamie Lokier
2003-10-01  1:01     ` Jamie Lokier
2003-10-01  2:41       ` Jamie Lokier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).