* today's futex changes @ 2003-09-05 20:24 Ulrich Drepper 2003-09-05 22:24 ` Stephen Hemminger 2003-09-06 16:28 ` [PATCH] " Hugh Dickins 0 siblings, 2 replies; 23+ messages in thread From: Ulrich Drepper @ 2003-09-05 20:24 UTC (permalink / raw) To: Jamie Lokier, Linus Torvalds, Andrew Morton, Linux Kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 ... broke NPTL. Tests which worked with previous kernels fail now. One test eventually succeeded, but the process somehow got stuck for about 30-40 seconds. Then it finished. Running strace showed a call to clone() as the last operation but there were other threads running at that time. I tried to login (via ssh) into the system when the process hang and this, too, was delayed and succeeded immediately when the strace'd process continued. I haven't looked at the changes made and wouldn't expect to understand all the details either. And I can understand if you don't want to run the glibc test suite. What I can offer are statically linked versions of the tests. One is here: http://people.redhat.com/drepper/tst-cond2.bz2 358572ec100de9b27833d6c4ee5ecdb5 tst-cond2 Let me know if you need more help. - -- - --------------. ,-. 444 Castro Street Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA Red Hat `--' drepper at redhat.com `--------------------------- -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/WPD32ijCOnn/RHQRApLUAJ9SiIelIOnUA/pOmol04AaM+hMvaQCgoA86 VNLbS7nFXoVggjDtWAJxZ5g= =tNkI -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: today's futex changes 2003-09-05 20:24 today's futex changes Ulrich Drepper @ 2003-09-05 22:24 ` Stephen Hemminger 2003-09-06 16:28 ` [PATCH] " Hugh Dickins 1 sibling, 0 replies; 23+ messages in thread From: Stephen Hemminger @ 2003-09-05 22:24 UTC (permalink / raw) To: Ulrich Drepper; +Cc: linux-kernel On Fri, 05 Sep 2003 13:24:23 -0700 Ulrich Drepper <drepper@redhat.com> wrote: > ... broke NPTL. This might explain something odd I am seeing with today's kernel. On login, Gnome starts and gets most of the way running, but the icons and screen background are missing. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Re: today's futex changes 2003-09-05 20:24 today's futex changes Ulrich Drepper 2003-09-05 22:24 ` Stephen Hemminger @ 2003-09-06 16:28 ` Hugh Dickins 2003-09-06 17:32 ` Ulrich Drepper ` (3 more replies) 1 sibling, 4 replies; 23+ messages in thread From: Hugh Dickins @ 2003-09-06 16:28 UTC (permalink / raw) To: Linus Torvalds Cc: Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, Rusty Russell, Linux Kernel On Fri, 5 Sep 2003, Ulrich Drepper wrote: > ... broke NPTL. Tests which worked with previous kernels fail now. One > test eventually succeeded, but the process somehow got stuck for about > 30-40 seconds. Then it finished. Running strace showed a call to > clone() as the last operation but there were other threads running at > that time. >.... > What I can offer are statically linked versions of the tests. > One is here: http://people.redhat.com/drepper/tst-cond2.bz2 Very helpful, thank you: it showed two bugs, one new and one old. Does the patch below work for you, Ulrich? The new bug is that "offset" has been declared as an alternative in the union, instead of as an element in the structures comprising it, effectively eliminating it from the key: keys match which should not. The old bug is that if futex_requeue were called with identical key1 and key2 (sensible? tended to happen given the first bug), it was liable to loop for a long time holding futex_lock: guard against that, still respecting the semantics of futex_requeue. While here, please let's also fix the get_futex_key VM_NONLINEAR case, which was returning the 1 from get_user_pages, taken as an error by its callers. And save a few bytes and improve debuggability by uninlining the top-level futex_wake, futex_requeue, futex_wait. Hugh --- 2.6.0-test4-bk8/kernel/futex.c Fri Sep 5 21:04:52 2003 +++ linux/kernel/futex.c Sat Sep 6 16:29:32 2003 @@ -49,16 +49,18 @@ struct { unsigned long pgoff; struct inode *inode; + int offset; } shared; struct { unsigned long uaddr; struct mm_struct *mm; + int offset; } private; struct { unsigned long word; void *ptr; + int offset; } both; - int offset; }; /* @@ -91,7 +93,7 @@ { return &futex_queues[hash_long(key->both.word + (unsigned long) key->both.ptr - + key->offset, FUTEX_HASHBITS)]; + + key->both.offset, FUTEX_HASHBITS)]; } /* @@ -101,7 +103,7 @@ { return (key1->both.word == key2->both.word && key1->both.ptr == key2->both.ptr - && key1->offset == key2->offset); + && key1->both.offset == key2->both.offset); } /* @@ -127,10 +129,10 @@ /* * The futex address must be "naturally" aligned. */ - key->offset = uaddr % PAGE_SIZE; - if (unlikely((key->offset % sizeof(u32)) != 0)) + key->both.offset = uaddr % PAGE_SIZE; + if (unlikely((key->both.offset % sizeof(u32)) != 0)) return -EINVAL; - uaddr -= key->offset; + uaddr -= key->both.offset; /* * The futex is hashed differently depending on whether @@ -199,6 +201,7 @@ key->shared.pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); put_page(page); + return 0; } return err; } @@ -208,7 +211,7 @@ * Wake up all waiters hashed on the physical page that is mapped * to this virtual address: */ -static inline int futex_wake(unsigned long uaddr, int num) +static int futex_wake(unsigned long uaddr, int num) { struct list_head *i, *next, *head; union futex_key key; @@ -247,7 +250,7 @@ * Requeue all waiters hashed on one physical page to another * physical page. */ -static inline int futex_requeue(unsigned long uaddr1, unsigned long uaddr2, +static int futex_requeue(unsigned long uaddr1, unsigned long uaddr2, int nr_wake, int nr_requeue) { struct list_head *i, *next, *head1, *head2; @@ -282,6 +285,9 @@ this->key = key2; if (ret - nr_wake >= nr_requeue) break; + /* Make sure to stop if key1 == key2 */ + if (head1 == head2 && head1 != next) + head1 = i; } } } @@ -320,7 +326,7 @@ return ret; } -static inline int futex_wait(unsigned long uaddr, int val, unsigned long time) +static int futex_wait(unsigned long uaddr, int val, unsigned long time) { DECLARE_WAITQUEUE(wait, current); int ret, curval; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-06 16:28 ` [PATCH] " Hugh Dickins @ 2003-09-06 17:32 ` Ulrich Drepper 2003-09-06 17:44 ` Jamie Lokier 2003-09-06 17:46 ` Jamie Lokier ` (2 subsequent siblings) 3 siblings, 1 reply; 23+ messages in thread From: Ulrich Drepper @ 2003-09-06 17:32 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Jamie Lokier, Andrew Morton, Stephen Hemminger, Rusty Russell, Linux Kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hugh Dickins wrote: > Does the patch below work for you, Ulrich? Indeed it does. This was so far only on a UP HT machine. I'll hopefully later on can run it on a bigger SMP machine. Good work. - -- - --------------. ,-. 444 Castro Street Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA Red Hat `--' drepper at redhat.com `--------------------------- -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/WhoR2ijCOnn/RHQRAubxAJ9a4UeDkIQ61NBDIkwWWT8myvbsxACdEsnl xyLcJXc57avwpVltynzRcQE= =X+Xh -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-06 17:32 ` Ulrich Drepper @ 2003-09-06 17:44 ` Jamie Lokier 0 siblings, 0 replies; 23+ messages in thread From: Jamie Lokier @ 2003-09-06 17:44 UTC (permalink / raw) To: Ulrich Drepper Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Stephen Hemminger, Rusty Russell, Linux Kernel Ulrich Drepper wrote: > > Does the patch below work for you, Ulrich? > > Indeed it does. This was so far only on a UP HT machine. I'll > hopefully later on can run it on a bigger SMP machine. Good work. The bugs fixed in Hugh's patch explain all the symptoms I saw with your test program. -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-06 16:28 ` [PATCH] " Hugh Dickins 2003-09-06 17:32 ` Ulrich Drepper @ 2003-09-06 17:46 ` Jamie Lokier 2003-09-06 18:21 ` Hugh Dickins 2003-09-08 6:45 ` Rusty Russell 2003-09-08 17:04 ` Stephen Hemminger 3 siblings, 1 reply; 23+ messages in thread From: Jamie Lokier @ 2003-09-06 17:46 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Ulrich Drepper, Andrew Morton, Stephen Hemminger, Rusty Russell, Linux Kernel Hugh Dickins wrote: > The new bug is that "offset" has been declared as an alternative in > the union, instead of as an element in the structures comprising it, > effectively eliminating it from the key: keys match which should not. Auch! (Off to the post office for a pack of brown paper bags) (And a pillow) > The old bug is that if futex_requeue were called with identical > key1 and key2 (sensible? tended to happen given the first bug), > it was liable to loop for a long time holding futex_lock: guard > against that, still respecting the semantics of futex_requeue. That explains the hang I just saw in one run of Ulrich's test... And it explain's why it's not repeatable. With key1 == key2, you get to move nr_requeue waiters to the end of the waiting list. I can't think of a good use for it, but it does do something visible. > + /* Make sure to stop if key1 == key2 */ > + if (head1 == head2 && head1 != next) > + head1 = i; Subtle, when nr_requeue > 1. That's a disturbingly nice trick :) > While here, please let's also fix the get_futex_key VM_NONLINEAR > case, which was returning the 1 from get_user_pages, taken as an > error by its callers. Yes. I just spotted it too. > And save a few bytes and improve debuggability > by uninlining the top-level futex_wake, futex_requeue, futex_wait. Fair point about about debuggability, but does it really save bytes to uninline these called-once functions? -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-06 17:46 ` Jamie Lokier @ 2003-09-06 18:21 ` Hugh Dickins 0 siblings, 0 replies; 23+ messages in thread From: Hugh Dickins @ 2003-09-06 18:21 UTC (permalink / raw) To: Jamie Lokier Cc: Linus Torvalds, Ulrich Drepper, Andrew Morton, Stephen Hemminger, Rusty Russell, Linux Kernel On Sat, 6 Sep 2003, Jamie Lokier wrote: > > And save a few bytes and improve debuggability > > by uninlining the top-level futex_wake, futex_requeue, futex_wait. > > Fair point about about debuggability, but does it really save bytes to > uninline these called-once functions? It saved about 50 with whatever 2.96 I'm using on this machine, but I didn't try other gccs; it's probably added about that back in kallsyms... Hugh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-06 16:28 ` [PATCH] " Hugh Dickins 2003-09-06 17:32 ` Ulrich Drepper 2003-09-06 17:46 ` Jamie Lokier @ 2003-09-08 6:45 ` Rusty Russell 2003-09-08 17:33 ` Hugh Dickins ` (2 more replies) 2003-09-08 17:04 ` Stephen Hemminger 3 siblings, 3 replies; 23+ messages in thread From: Rusty Russell @ 2003-09-08 6:45 UTC (permalink / raw) To: Hugh Dickins Cc: Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel In message <Pine.LNX.4.44.0309061723160.1470-100000@localhost.localdomain> you write: > While here, please let's also fix the get_futex_key VM_NONLINEAR > case, which was returning the 1 from get_user_pages, taken as an > error by its callers. And save a few bytes and improve debuggability > by uninlining the top-level futex_wake, futex_requeue, futex_wait. OK, I've updated my patch on top of this. Mainly cosmetic, please review. Name: Minor Tweaks To Jamie Lokier's Futex Patch Author: Rusty Russell Status: Booted on 2.6.0-test4-bk9 Depends: Misc/futex-hugh.patch.gz D: Minor changes to Jamie's excellent futex patch. D: 1) Remove obsolete comment above hash array decl. D: 2) Semantics of futex on read-only pages unclear: require write perm. D: 3) Clarify comment about TASK_INTERRUPTIBLE. D: 4) Andrew Morton says spurious wakeup is a bug. Catch it. D: 5) Use Jenkins hash. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .24731-linux-2.6.0-test4-bk9/kernel/futex.c .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c --- .24731-linux-2.6.0-test4-bk9/kernel/futex.c 2003-09-08 10:44:26.000000000 +1000 +++ .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c 2003-09-08 12:01:23.000000000 +1000 @@ -33,7 +33,7 @@ #include <linux/poll.h> #include <linux/fs.h> #include <linux/file.h> -#include <linux/hash.h> +#include <linux/jhash.h> #include <linux/init.h> #include <linux/futex.h> #include <linux/mount.h> @@ -44,6 +44,7 @@ /* * Futexes are matched on equal values of this key. * The key type depends on whether it's a shared or private mapping. + * Don't rearrange without looking at hash_futex(). */ union futex_key { struct { @@ -79,7 +80,6 @@ struct futex_q { struct file *filp; }; -/* The key for the hash is the address + index + offset within page */ static struct list_head futex_queues[1<<FUTEX_HASHBITS]; static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED; @@ -89,11 +89,12 @@ static struct vfsmount *futex_mnt; /* * We hash on the keys returned from get_futex_key (see below). */ -static inline struct list_head *hash_futex(union futex_key *key) +static inline struct list_head *hash_futex(const union futex_key *key) { - return &futex_queues[hash_long(key->both.word - + (unsigned long) key->both.ptr - + key->both.offset, FUTEX_HASHBITS)]; + u32 hash = jhash2((u32*)&key->both.word, + (sizeof(key->both.word)+sizeof(key->both.ptr))/4, + key->both.offset); + return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)]; } /* @@ -145,17 +146,13 @@ static int get_futex_key(unsigned long u /* * Permissions. */ - if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ)) - return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES; + if (unlikely(vma->vm_flags & VM_IO)) + return -EPERM; + if (unlikely(vma->vm_flags & (VM_READ|VM_WRITE)) != (VM_READ|VM_WRITE)) + return -EACCES; /* * Private mappings are handled in a simple way. - * - * NOTE: When userspace waits on a MAP_SHARED mapping, even if - * it's a read-only handle, it's expected that futexes attach to - * the object not the particular process. Therefore we use - * VM_MAYSHARE here, not VM_SHARED which is restricted to shared - * mappings of _writable_ handles. */ if (likely(!(vma->vm_flags & VM_MAYSHARE))) { key->private.mm = mm; @@ -333,7 +330,6 @@ static int futex_wait(unsigned long uadd union futex_key key; struct futex_q q; - try_again: init_waitqueue_head(&q.waiters); down_read(¤t->mm->mmap_sem); @@ -367,10 +363,10 @@ static int futex_wait(unsigned long uadd /* * There might have been scheduling since the queue_me(), as we * cannot hold a spinlock across the get_user() in case it - * faults. So we cannot just set TASK_INTERRUPTIBLE state when + * faults, and we cannot just set TASK_INTERRUPTIBLE state when * queueing ourselves into the futex hash. This code thus has to - * rely on the futex_wake() code doing a wakeup after removing - * the waiter from the list. + * rely on the futex_wake() code removing us from hash when it + * wakes us up. */ add_wait_queue(&q.waiters, &wait); spin_lock(&futex_lock); @@ -394,26 +390,19 @@ static int futex_wait(unsigned long uadd * we are the only user of it. */ - /* - * Were we woken or interrupted for a valid reason? - */ - ret = unqueue_me(&q); - if (ret == 0) + /* If we were woken (and unqueued), we succeeded, whatever. */ + if (!unqueue_me(&q)) return 0; if (time == 0) return -ETIMEDOUT; if (signal_pending(current)) return -EINTR; - /* - * No, it was a spurious wakeup. Try again. Should never happen. :) - */ - goto try_again; + /* A spurious wakeup. Should never happen. */ + BUG(); out_unqueue: - /* - * Were we unqueued anyway? - */ + /* If we were woken (and unqueued), we succeeded, whatever. */ if (!unqueue_me(&q)) ret = 0; out_release_sem: -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 6:45 ` Rusty Russell @ 2003-09-08 17:33 ` Hugh Dickins 2003-09-08 17:51 ` Hugh Dickins 2003-09-09 1:37 ` Rusty Russell 2003-09-08 17:51 ` Jamie Lokier 2003-09-08 19:02 ` Andrew Morton 2 siblings, 2 replies; 23+ messages in thread From: Hugh Dickins @ 2003-09-08 17:33 UTC (permalink / raw) To: Rusty Russell Cc: Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, Linus Torvalds, Linux Kernel On Mon, 8 Sep 2003, Rusty Russell wrote: > OK, I've updated my patch on top of this. Mainly cosmetic, please > review. > > Name: Minor Tweaks To Jamie Lokier's Futex Patch > Author: Rusty Russell > Status: Booted on 2.6.0-test4-bk9 > Depends: Misc/futex-hugh.patch.gz > > D: Minor changes to Jamie's excellent futex patch. > D: 1) Remove obsolete comment above hash array decl. > D: 2) Semantics of futex on read-only pages unclear: require write perm. > D: 3) Clarify comment about TASK_INTERRUPTIBLE. > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it. > D: 5) Use Jenkins hash. Most of it (the futex_wait tweaks) looked fine to me - though I look forward to the first report of that BUG(). Part 2, requiring VM_WRITE and removing the comment on VM_MAYSHARE, seems a regression to me. Perhaps I misinterpreted Linus' action in taking Jamie's patch: I took that to mean he relented a little on his hardline position about VM_SHARED, and now accepts that in this context VM_MAYSHARE is more appropriate (easier to document). I know I argued that readonly futices are pointless, but I thought Jamie gave a good picture of how a readonly view could still be used. I'd rather that part were a separate patch, so Linus can merge or not as he wishes. In part 5, the Jenkins hashing, I was puzzled by the "/4" in + (sizeof(key->both.word)+sizeof(key->both.ptr))/4, both.ptr would be a multiple of 32 (? seems to be that way on PIII and P4, though I gave up trying to work out quite how slab.c aligns), and both.word would be a multiple of 1 in the shared.pgoff case, or a multiple of PAGE_SIZE in the private.uaddr case (private.uaddr = uaddr >> PAGE_SHIFT might make more sense). Whereas both.offset would be a multiple of 4. I don't suppose Mr Jenkins will mind, but I did find the "/4" puzzling in that context. Hugh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 17:33 ` Hugh Dickins @ 2003-09-08 17:51 ` Hugh Dickins 2003-09-09 1:37 ` Rusty Russell 1 sibling, 0 replies; 23+ messages in thread From: Hugh Dickins @ 2003-09-08 17:51 UTC (permalink / raw) To: Rusty Russell Cc: Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, Linus Torvalds, Linux Kernel On Mon, 8 Sep 2003, Hugh Dickins wrote: > > In part 5, the Jenkins hashing, I was puzzled by the "/4" in > + (sizeof(key->both.word)+sizeof(key->both.ptr))/4, > [much blabbering snipped] Sorry, I've just read that and noticed "sizeof": rosy dawn of enlightenment. Hugh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 17:33 ` Hugh Dickins 2003-09-08 17:51 ` Hugh Dickins @ 2003-09-09 1:37 ` Rusty Russell 1 sibling, 0 replies; 23+ messages in thread From: Rusty Russell @ 2003-09-09 1:37 UTC (permalink / raw) To: Hugh Dickins Cc: Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, Linus Torvalds, Linux Kernel In message <Pine.LNX.4.44.0309081746180.7008-100000@localhost.localdomain> you write: > Most of it (the futex_wait tweaks) looked fine to me - > though I look forward to the first report of that BUG(). Me too. But at least we'll *get* a report if someone does spurious wakeups. > Part 2, requiring VM_WRITE and removing the comment on VM_MAYSHARE, > seems a regression to me. Perhaps I misinterpreted Linus' action in > taking Jamie's patch: I took that to mean he relented a little on his > hardline position about VM_SHARED, and now accepts that in this context > VM_MAYSHARE is more appropriate (easier to document). I know I argued > that readonly futices are pointless, but I thought Jamie gave a good > picture of how a readonly view could still be used. I'd rather that > part were a separate patch, so Linus can merge or not as he wishes. Sure, I jumboed them together for feedback from you guys. All users I am currently aware of won't care either way. Current test-5 is: Sees Changes Sees FUTEX_WAKE from another MAP_SHARED from another MAP_SHARED MAP_PRIVATE read-only: Y N MAP_PRIVATE read-write: Y* N MAP_SHARED read-only: Y Y MAP_SHARED read-write: Y Y [* Only until page is written to, after which COW splits them ] Previously, the FUTEX_WAKE column was identical to the first column. Now, IMHO, this new semantic is more sensible, but I don't really mind. But I don't recall anything about believable use of RO futexes: Jamie? Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 6:45 ` Rusty Russell 2003-09-08 17:33 ` Hugh Dickins @ 2003-09-08 17:51 ` Jamie Lokier 2003-09-08 18:26 ` Jamie Lokier 2003-09-09 3:58 ` Rusty Russell 2003-09-08 19:02 ` Andrew Morton 2 siblings, 2 replies; 23+ messages in thread From: Jamie Lokier @ 2003-09-08 17:51 UTC (permalink / raw) To: Rusty Russell Cc: Hugh Dickins, Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel Rusty Russell wrote: > + u32 hash = jhash2((u32*)&key->both.word, Have you checked the code size? That does more work and has more code than is needed, especially on 32-bit archs. On 32-bit, jhash_3words() is much better because it reduces to a single call to __jhash_mix(), instead of the two done by jhash2 (only one is required for good hashing afaict). It is probably worth adding a jhash_3longs() to jhash.h, which does one call to __hash_mix() on 32-bit, two calls on 64-bit, and avoids the loop in both cases. [ Aside: For hashing individual integers, I prefer to use Thomas Wang's: http://www.concentric.net/~Ttwang/tech/inthash.htm He mentions Jenkin's function, and derived an integer mixing function from correspondence with Jenkins. For hashing 3 words together, Jenkins' hash does seem a bit more compact - if you don't call __jhash_mix() multiple times that is! ] > - if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ)) > - return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES; > + if (unlikely(vma->vm_flags & VM_IO)) > + return -EPERM; > + if (unlikely(vma->vm_flags & (VM_READ|VM_WRITE)) != (VM_READ|VM_WRITE)) > + return -EACCES; Is there a good reason to disallow read-only waiters? I agree with Hugh that it seems like a regression. > + /* A spurious wakeup. Should never happen. */ > + BUG(); :) The rest of your changes seem fine. I particularly appreciate your grammatical improvements to my comment :) -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 17:51 ` Jamie Lokier @ 2003-09-08 18:26 ` Jamie Lokier 2003-09-09 3:58 ` Rusty Russell 1 sibling, 0 replies; 23+ messages in thread From: Jamie Lokier @ 2003-09-08 18:26 UTC (permalink / raw) To: Rusty Russell Cc: Hugh Dickins, Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel Jamie Lokier wrote: > It is probably worth adding a jhash_3longs() to jhash.h, which does > one call to __hash_mix() on 32-bit, two calls on 64-bit, and avoids > the loop in both cases. Bob Jenkins has a 64-bit version of the mixing function, which we aren't using. It would be better to use a single iteration of that in jhash_3longs(). See: http://burtleburtle.net/bob/hash/evahash.html#hash64 -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 17:51 ` Jamie Lokier 2003-09-08 18:26 ` Jamie Lokier @ 2003-09-09 3:58 ` Rusty Russell 2003-09-10 11:39 ` Jamie Lokier 1 sibling, 1 reply; 23+ messages in thread From: Rusty Russell @ 2003-09-09 3:58 UTC (permalink / raw) To: Jamie Lokier Cc: Hugh Dickins, Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel, davem In message <20030908175140.GC27097@mail.jlokier.co.uk> you write: > Rusty Russell wrote: > > + u32 hash = jhash2((u32*)&key->both.word, > > Have you checked the code size? Good point: it's bigger, and there are 4 call sites, so it should be inlined. Done. > That does more work and has more code than is needed, especially on > 32-bit archs. On 32-bit, jhash_3words() is much better because it > reduces to a single call to __jhash_mix(), instead of the two done by > jhash2 (only one is required for good hashing afaict). > > It is probably worth adding a jhash_3longs() to jhash.h, which does > one call to __hash_mix() on 32-bit, two calls on 64-bit, and avoids > the loop in both cases. Well, I'm happy to let the compiler do this work 8) In fact, it does it quite well: the jhash_2words version is still 4 bytes shorter though, gcc 3.2.3, but then the hashes are slightly different (length isn't added in jhash_2words). > [ Aside: For hashing individual integers, I prefer to use Thomas Wang's: > > http://www.concentric.net/~Ttwang/tech/inthash.htm > > He mentions Jenkin's function, and derived an integer mixing function > from correspondence with Jenkins. Interesting: we could sub this for the specific jhash_x functions if someone wants to do the analysis. > > - if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ)) > > - return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES; > > + if (unlikely(vma->vm_flags & VM_IO)) > > + return -EPERM; > > + if (unlikely(vma->vm_flags & (VM_READ|VM_WRITE)) != (VM_READ|VM_WRITE)) > > + return -EACCES; > > Is there a good reason to disallow read-only waiters? > I agree with Hugh that it seems like a regression. Yes, I've reverted this part. > > + /* A spurious wakeup. Should never happen. */ > > + BUG(); > > :) > > The rest of your changes seem fine. I particularly appreciate your > grammatical improvements to my comment :) These days my biggest contribution to the futex code 8) BTW, I'm guessing from your preference for multi-line comments that you don't use a color-coding editor for source? I must say that once I got used to it, I really prefer comments in green. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. Name: Minor Tweaks To Jamie Lokier's Futex Patch Author: Rusty Russell Status: Booted on 2.6.0-test5 D: Minor changes to Jamie's excellent futex patch. D: 1) Remove obsolete comment above hash array decl. D: 2) Clarify comment about TASK_INTERRUPTIBLE. D: 3) Andrew Morton says spurious wakeup is a bug. Catch it. D: 4) Try Jenkins hash. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .24731-linux-2.6.0-test4-bk9/kernel/futex.c .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c --- .24731-linux-2.6.0-test4-bk9/kernel/futex.c 2003-09-08 10:44:26.000000000 +1000 +++ .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c 2003-09-08 12:01:23.000000000 +1000 @@ -33,7 +33,7 @@ #include <linux/poll.h> #include <linux/fs.h> #include <linux/file.h> -#include <linux/hash.h> +#include <linux/jhash.h> #include <linux/init.h> #include <linux/futex.h> #include <linux/mount.h> @@ -44,6 +44,7 @@ /* * Futexes are matched on equal values of this key. * The key type depends on whether it's a shared or private mapping. + * Don't rearrange members without looking at hash_futex(). */ union futex_key { struct { @@ -79,7 +80,6 @@ struct futex_q { struct file *filp; }; -/* The key for the hash is the address + index + offset within page */ static struct list_head futex_queues[1<<FUTEX_HASHBITS]; static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED; @@ -89,11 +89,12 @@ static struct vfsmount *futex_mnt; /* * We hash on the keys returned from get_futex_key (see below). */ -static inline struct list_head *hash_futex(union futex_key *key) +static struct list_head *hash_futex(const union futex_key *key) { - return &futex_queues[hash_long(key->both.word - + (unsigned long) key->both.ptr - + key->both.offset, FUTEX_HASHBITS)]; + u32 hash = jhash2((u32*)&key->both.word, + (sizeof(key->both.word)+sizeof(key->both.ptr))/4, + key->both.offset); + return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)]; } /* @@ -333,7 +330,6 @@ static int futex_wait(unsigned long uadd union futex_key key; struct futex_q q; - try_again: init_waitqueue_head(&q.waiters); down_read(¤t->mm->mmap_sem); @@ -367,10 +363,10 @@ static int futex_wait(unsigned long uadd /* * There might have been scheduling since the queue_me(), as we * cannot hold a spinlock across the get_user() in case it - * faults. So we cannot just set TASK_INTERRUPTIBLE state when + * faults, and we cannot just set TASK_INTERRUPTIBLE state when * queueing ourselves into the futex hash. This code thus has to - * rely on the futex_wake() code doing a wakeup after removing - * the waiter from the list. + * rely on the futex_wake() code removing us from hash when it + * wakes us up. */ add_wait_queue(&q.waiters, &wait); spin_lock(&futex_lock); @@ -394,26 +390,19 @@ static int futex_wait(unsigned long uadd * we are the only user of it. */ - /* - * Were we woken or interrupted for a valid reason? - */ - ret = unqueue_me(&q); - if (ret == 0) + /* If we were woken (and unqueued), we succeeded, whatever. */ + if (!unqueue_me(&q)) return 0; if (time == 0) return -ETIMEDOUT; if (signal_pending(current)) return -EINTR; - /* - * No, it was a spurious wakeup. Try again. Should never happen. :) - */ - goto try_again; + /* A spurious wakeup. Should never happen. */ + BUG(); out_unqueue: - /* - * Were we unqueued anyway? - */ + /* If we were woken (and unqueued), we succeeded, whatever. */ if (!unqueue_me(&q)) ret = 0; out_release_sem: ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-09 3:58 ` Rusty Russell @ 2003-09-10 11:39 ` Jamie Lokier 2003-09-10 22:00 ` Rusty Russell 0 siblings, 1 reply; 23+ messages in thread From: Jamie Lokier @ 2003-09-10 11:39 UTC (permalink / raw) To: Rusty Russell Cc: Hugh Dickins, Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel, davem Rusty Russell wrote: > BTW, I'm guessing from your preference for multi-line comments that > you don't use a color-coding editor for source? I must say that once > I got used to it, I really prefer comments in green. You mean the /* * */ style? No, I never use that style, except in futex.c to go along with the style that was already there! I use Emacs with comments in red-orange :) -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-10 11:39 ` Jamie Lokier @ 2003-09-10 22:00 ` Rusty Russell 2003-09-11 16:29 ` Jamie Lokier 0 siblings, 1 reply; 23+ messages in thread From: Rusty Russell @ 2003-09-10 22:00 UTC (permalink / raw) To: Jamie Lokier Cc: Hugh Dickins, Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel, davem In message <20030910113929.GA21945@mail.jlokier.co.uk> you write: > Rusty Russell wrote: > > BTW, I'm guessing from your preference for multi-line comments that > > you don't use a color-coding editor for source? I must say that once > > I got used to it, I really prefer comments in green. > > You mean the /* > * > */ style? > > No, I never use that style, except in futex.c to go along with the > style that was already there! I use Emacs with comments in red-orange :) I'll submit a trivial patch to condense them again: it's an Ingo-ism. Comments in my parts of the code are less, um, verbose 8) Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-10 22:00 ` Rusty Russell @ 2003-09-11 16:29 ` Jamie Lokier 0 siblings, 0 replies; 23+ messages in thread From: Jamie Lokier @ 2003-09-11 16:29 UTC (permalink / raw) To: Rusty Russell Cc: Hugh Dickins, Ulrich Drepper, Jamie Lokier, Andrew Morton, Stephen Hemminger, torvalds, Linux Kernel, davem Rusty Russell wrote: > > No, I never use that style, except in futex.c to go along with the > > style that was already there! I use Emacs with comments in red-orange :) > > I'll submit a trivial patch to condense them again: it's an Ingo-ism. > Comments in my parts of the code are less, um, verbose 8) Oh, the verbosity is definitely me, only the formatting is Ingo-ism :) -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 6:45 ` Rusty Russell 2003-09-08 17:33 ` Hugh Dickins 2003-09-08 17:51 ` Jamie Lokier @ 2003-09-08 19:02 ` Andrew Morton 2003-09-08 20:07 ` Jamie Lokier 2003-09-09 4:12 ` Rusty Russell 2 siblings, 2 replies; 23+ messages in thread From: Andrew Morton @ 2003-09-08 19:02 UTC (permalink / raw) To: Rusty Russell; +Cc: hugh, drepper, lk, shemminger, torvalds, linux-kernel Rusty Russell <rusty@rustcorp.com.au> wrote: > > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it. Yes, but going BUG() is a bit rude. We can detect the error, we can recover from it and it doesn't cause any user data corruption or anything. A rude printk is all that is needed here. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 19:02 ` Andrew Morton @ 2003-09-08 20:07 ` Jamie Lokier 2003-09-08 19:59 ` Andrew Morton 2003-09-08 20:11 ` Linus Torvalds 2003-09-09 4:12 ` Rusty Russell 1 sibling, 2 replies; 23+ messages in thread From: Jamie Lokier @ 2003-09-08 20:07 UTC (permalink / raw) To: Andrew Morton Cc: Rusty Russell, hugh, drepper, lk, shemminger, torvalds, linux-kernel Andrew Morton wrote: > Rusty Russell <rusty@rustcorp.com.au> wrote: > > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it. > > Yes, but going BUG() is a bit rude. We can detect the error, we can > recover from it and it doesn't cause any user data corruption or anything. > A rude printk is all that is needed here. Well, it should really _never_ happen. We are very careful. Is there something like BUG() which doesn't terminate the process? -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 20:07 ` Jamie Lokier @ 2003-09-08 19:59 ` Andrew Morton 2003-09-08 20:11 ` Linus Torvalds 1 sibling, 0 replies; 23+ messages in thread From: Andrew Morton @ 2003-09-08 19:59 UTC (permalink / raw) To: Jamie Lokier; +Cc: rusty, hugh, drepper, lk, shemminger, torvalds, linux-kernel Jamie Lokier <jamie@shareable.org> wrote: > > Andrew Morton wrote: > > Rusty Russell <rusty@rustcorp.com.au> wrote: > > > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it. > > > > Yes, but going BUG() is a bit rude. We can detect the error, we can > > recover from it and it doesn't cause any user data corruption or anything. > > A rude printk is all that is needed here. > > Well, it should really _never_ happen. We are very careful. Is > there something like BUG() which doesn't terminate the process? > A WARN_ON(1) is good enough. It will spit a stack dump and we will get to hear about it. Going BUG merely reduces the chances of the info hitting the logs and irritates our testers. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 20:07 ` Jamie Lokier 2003-09-08 19:59 ` Andrew Morton @ 2003-09-08 20:11 ` Linus Torvalds 1 sibling, 0 replies; 23+ messages in thread From: Linus Torvalds @ 2003-09-08 20:11 UTC (permalink / raw) To: Jamie Lokier Cc: Andrew Morton, Rusty Russell, hugh, drepper, lk, shemminger, Kernel Mailing List On Mon, 8 Sep 2003, Jamie Lokier wrote: > > Well, it should really _never_ happen. We are very careful. Is > there something like BUG() which doesn't terminate the process? That's exactly what WARN() does. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-08 19:02 ` Andrew Morton 2003-09-08 20:07 ` Jamie Lokier @ 2003-09-09 4:12 ` Rusty Russell 1 sibling, 0 replies; 23+ messages in thread From: Rusty Russell @ 2003-09-09 4:12 UTC (permalink / raw) To: Andrew Morton Cc: hugh, drepper, Jamie Lokier, lk, shemminger, torvalds, linux-kernel In message <20030908120234.5d05cda9.akpm@osdl.org> you write: > Rusty Russell <rusty@rustcorp.com.au> wrote: > > > > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it. > > Yes, but going BUG() is a bit rude. We can detect the error, we can > recover from it and it doesn't cause any user data corruption or anything. > A rude printk is all that is needed here. OK. Changed. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. Name: Minor Tweaks To Jamie Lokier's Futex Patch Author: Rusty Russell Status: Booted on 2.6.0-test5 D: Minor changes to Jamie's excellent futex patch. D: 1) Remove obsolete comment above hash array decl. D: 2) Clarify comment about TASK_INTERRUPTIBLE. D: 3) Andrew Morton says spurious wakeup is a bug. Catch it. D: 4) Try Jenkins hash. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9333-linux-2.6.0-test5/kernel/futex.c .9333-linux-2.6.0-test5.updated/kernel/futex.c --- .9333-linux-2.6.0-test5/kernel/futex.c 2003-09-09 10:35:05.000000000 +1000 +++ .9333-linux-2.6.0-test5.updated/kernel/futex.c 2003-09-09 14:06:06.000000000 +1000 @@ -33,7 +33,7 @@ #include <linux/poll.h> #include <linux/fs.h> #include <linux/file.h> -#include <linux/hash.h> +#include <linux/jhash.h> #include <linux/init.h> #include <linux/futex.h> #include <linux/mount.h> @@ -44,6 +44,7 @@ /* * Futexes are matched on equal values of this key. * The key type depends on whether it's a shared or private mapping. + * Don't rearrange members without looking at hash_futex(). */ union futex_key { struct { @@ -79,7 +80,6 @@ struct futex_q { struct file *filp; }; -/* The key for the hash is the address + index + offset within page */ static struct list_head futex_queues[1<<FUTEX_HASHBITS]; static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED; @@ -89,11 +89,12 @@ static struct vfsmount *futex_mnt; /* * We hash on the keys returned from get_futex_key (see below). */ -static inline struct list_head *hash_futex(union futex_key *key) +static struct list_head *hash_futex(const union futex_key *key) { - return &futex_queues[hash_long(key->both.word - + (unsigned long) key->both.ptr - + key->both.offset, FUTEX_HASHBITS)]; + u32 hash = jhash2((u32*)&key->both.word, + (sizeof(key->both.word)+sizeof(key->both.ptr))/4, + key->both.offset); + return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)]; } /* @@ -333,7 +334,6 @@ static int futex_wait(unsigned long uadd union futex_key key; struct futex_q q; - try_again: init_waitqueue_head(&q.waiters); down_read(¤t->mm->mmap_sem); @@ -367,10 +367,10 @@ static int futex_wait(unsigned long uadd /* * There might have been scheduling since the queue_me(), as we * cannot hold a spinlock across the get_user() in case it - * faults. So we cannot just set TASK_INTERRUPTIBLE state when + * faults, and we cannot just set TASK_INTERRUPTIBLE state when * queueing ourselves into the futex hash. This code thus has to - * rely on the futex_wake() code doing a wakeup after removing - * the waiter from the list. + * rely on the futex_wake() code removing us from hash when it + * wakes us up. */ add_wait_queue(&q.waiters, &wait); spin_lock(&futex_lock); @@ -394,26 +394,17 @@ static int futex_wait(unsigned long uadd * we are the only user of it. */ - /* - * Were we woken or interrupted for a valid reason? - */ - ret = unqueue_me(&q); - if (ret == 0) + /* If we were woken (and unqueued), we succeeded, whatever. */ + if (!unqueue_me(&q)) return 0; if (time == 0) return -ETIMEDOUT; - if (signal_pending(current)) - return -EINTR; - - /* - * No, it was a spurious wakeup. Try again. Should never happen. :) - */ - goto try_again; + /* A spurious wakeup should never happen. */ + WARN_ON(!signal_pending(current)); + return -EINTR; out_unqueue: - /* - * Were we unqueued anyway? - */ + /* If we were woken (and unqueued), we succeeded, whatever. */ if (!unqueue_me(&q)) ret = 0; out_release_sem: ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Re: today's futex changes 2003-09-06 16:28 ` [PATCH] " Hugh Dickins ` (2 preceding siblings ...) 2003-09-08 6:45 ` Rusty Russell @ 2003-09-08 17:04 ` Stephen Hemminger 3 siblings, 0 replies; 23+ messages in thread From: Stephen Hemminger @ 2003-09-08 17:04 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Ulrich Drepper, Jamie Lokier, Andrew Morton, Rusty Russell, Linux Kernel On Sat, 6 Sep 2003 17:28:44 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > On Fri, 5 Sep 2003, Ulrich Drepper wrote: > > ... broke NPTL. Tests which worked with previous kernels fail now. One > > test eventually succeeded, but the process somehow got stuck for about > > 30-40 seconds. Then it finished. Running strace showed a call to > > clone() as the last operation but there were other threads running at > > that time. > >.... > > What I can offer are statically linked versions of the tests. > > One is here: http://people.redhat.com/drepper/tst-cond2.bz2 > > Very helpful, thank you: it showed two bugs, one new and one old. > Does the patch below work for you, Ulrich? > > The new bug is that "offset" has been declared as an alternative in > the union, instead of as an element in the structures comprising it, > effectively eliminating it from the key: keys match which should not. > > The old bug is that if futex_requeue were called with identical > key1 and key2 (sensible? tended to happen given the first bug), > it was liable to loop for a long time holding futex_lock: guard > against that, still respecting the semantics of futex_requeue. > > While here, please let's also fix the get_futex_key VM_NONLINEAR > case, which was returning the 1 from get_user_pages, taken as an > error by its callers. And save a few bytes and improve debuggability > by uninlining the top-level futex_wake, futex_requeue, futex_wait. > > Hugh Everything is working fine for me now. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2003-09-11 16:29 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-09-05 20:24 today's futex changes Ulrich Drepper 2003-09-05 22:24 ` Stephen Hemminger 2003-09-06 16:28 ` [PATCH] " Hugh Dickins 2003-09-06 17:32 ` Ulrich Drepper 2003-09-06 17:44 ` Jamie Lokier 2003-09-06 17:46 ` Jamie Lokier 2003-09-06 18:21 ` Hugh Dickins 2003-09-08 6:45 ` Rusty Russell 2003-09-08 17:33 ` Hugh Dickins 2003-09-08 17:51 ` Hugh Dickins 2003-09-09 1:37 ` Rusty Russell 2003-09-08 17:51 ` Jamie Lokier 2003-09-08 18:26 ` Jamie Lokier 2003-09-09 3:58 ` Rusty Russell 2003-09-10 11:39 ` Jamie Lokier 2003-09-10 22:00 ` Rusty Russell 2003-09-11 16:29 ` Jamie Lokier 2003-09-08 19:02 ` Andrew Morton 2003-09-08 20:07 ` Jamie Lokier 2003-09-08 19:59 ` Andrew Morton 2003-09-08 20:11 ` Linus Torvalds 2003-09-09 4:12 ` Rusty Russell 2003-09-08 17:04 ` Stephen Hemminger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).