linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] futex requeueing feature, futex-requeue-2.5.69-D3
@ 2003-05-19  9:31 Ingo Molnar
  2003-05-19 10:10 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19  9:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Rusty Russell, Ulrich Drepper


the attached patch addresses a futex related SMP scalability problem of
glibc. A number of regressions have been reported to the NTPL mailing list
when going to many CPUs, for applications that use condition variables and
the pthread_cond_broadcast() API call. Using this functionality, testcode
shows a slowdown from 0.12 seconds runtime to over 237 seconds (!)  
runtime, on 4-CPU systems.

pthread condition variables use two futex-backed mutex-alike locks: an
internal one for the glibc CV state itself, and a user-supplied mutex
which the API guarantees to take in certain codepaths. (Unfortunately the
user-supplied mutex cannot be used to protect the CV state, so we've got
to deal with two locks.)

The cause of the slowdown is a 'swarm effect': if lots of threads are
blocked on a condition variable, and pthread_cond_broadcast() is done,
then glibc first does a FUTEX_WAKE on the cv-internal mutex, then down a
mutex_down() on the user-supplied mutex. Ie. a swarm of threads is created
which all race to serialize on the user-supplied mutex. The more threads
are used, the more likely it becomes that the scheduler will balance them
over to other CPUs - where they just schedule, try to lock the mutex, and
go to sleep. This 'swarm effect' is purely technical, a side-effect of
glibc's use of futexes, and the imperfect coupling of the two locks.

the solution to this problem is to not wake up the swarm of threads, but
'requeue' them from the CV-internal mutex to the user-supplied mutex. The
attached patch adds the FUTEX_REQUEUE feature FUTEX_REQUEUE requeues N
threads from futex address A to futex address B:

	sys_futex(uaddr, FUTEX_REQUEUE, nr_wake, NULL, uaddr2);

the 'val' parameter to sys_futex (nr_wake) is the # of woken up threads.  
This way glibc can wake up a single thread (which will take the
user-mutex), and can requeue the rest, with a single system-call.

Ulrich Drepper has implemented FUTEX_REQUEUE support in glibc, and a
number of people have tested it over the past couple of weeks. Here are
the measurements done by Saurabh Desai:

System: 4xPIII 700MHz

 ./cond-perf -r 100 -n 200:        1p       2p         4p
 Default NPTL:                 0.120s   0.211s   237.407s
 requeue NPTL:                 0.124s   0.156s     0.040s

 ./cond-perf -r 1000 -n 100:
 Default NPTL:                 0.276s   0.412s     0.530s
 requeue NPTL:                 0.349s   0.503s     0.550s

 ./pp -v -n 128 -i 1000 -S 32768:
 Default NPTL: 128 games in    1.111s   1.270s    16.894s
 requeue NPTL: 128 games in    1.111s   1.959s     2.426s

 ./pp -v -n 1024 -i 10 -S 32768:
 Default NPTL: 1024 games in   0.181s   0.394s     incompleted 2m+
 requeue NPTL: 1024 games in   0.166s   0.254s     0.341s

the speedup with increasing number of threads is quite significant, in the
128 threads, case it's more than 8 times. In the cond-perf test, on 4 CPUs
it's almost infinitely faster than the 'swarm of threads' catastrophy
triggered by the old code.

there's a slowdown on UP, which is expected: on UP the O(1) scheduler
implicitly serializes all active threads on the runqueue, and doesnt
degrade under lots of threads. On SMP the 'point of breakdown' depends on
the precise amount of time needed for the threads to become rated as
'cache-cold' by the load-balancer.

(the patch adds a new futex syscall parameter (uaddr2), which is a
compatible extension of sys_futex. Old NPTL applications will continue to
work without any impact, only the FUTEX_REQUEUE codepath uses the new
parameter.)

	Ingo

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -5,7 +5,8 @@
 #define FUTEX_WAIT (0)
 #define FUTEX_WAKE (1)
 #define FUTEX_FD (2)
+#define FUTEX_REQUEUE (3)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime);
+extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2);
 
 #endif
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL);
+		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL);
 	}
 }
 
--- linux/kernel/compat.c.orig	
+++ linux/kernel/compat.c	
@@ -214,7 +214,7 @@ asmlinkage long compat_sys_sigprocmask(i
 extern long do_futex(unsigned long, int, int, unsigned long);
 
 asmlinkage long compat_sys_futex(u32 *uaddr, int op, int val,
-		struct compat_timespec *utime)
+		struct compat_timespec *utime, u32 *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
@@ -224,7 +224,7 @@ asmlinkage long compat_sys_futex(u32 *ua
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
 }
 
 asmlinkage long sys_setrlimit(unsigned int resource, struct rlimit *rlim);
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -2,6 +2,9 @@
  *  Fast Userspace Mutexes (which I call "Futexes!").
  *  (C) Rusty Russell, IBM 2002
  *
+ *  Generalized futexes, futex requeueing, misc fixes by Ingo Molnar
+ *  (C) Copyright 2003 Red Hat Inc, All Rights Reserved
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -9,9 +12,6 @@
  *  "The futexes are also cursed."
  *  "But they come in a choice of three flavours!"
  *
- *  Generalized futexes for every mapping type, Ingo Molnar, 2002
- *
- *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
  *  the Free Software Foundation; either version 2 of the License, or
@@ -216,6 +216,61 @@ static void futex_vcache_callback(vcache
 	spin_unlock(&futex_lock);
 }
 
+/*
+ * Requeue all waiters hashed on one physical page to another
+ * physical page.
+ */
+static int futex_requeue(unsigned long uaddr1, int offset1, unsigned long uaddr2, int offset2, int num)
+{
+	struct list_head *i, *next, *head1, *head2;
+	struct page *page1, *page2;
+	int ret = 0;
+
+	lock_futex_mm();
+
+	page1 = __pin_page(uaddr1 - offset1);
+	if (!page1) {
+		unlock_futex_mm();
+		return -EFAULT;
+	}
+	page2 = __pin_page(uaddr2 - offset2);
+	if (!page2) {
+		unlock_futex_mm();
+		return -EFAULT;
+	}
+
+	head1 = hash_futex(page1, offset1);
+	head2 = hash_futex(page2, offset2);
+
+	list_for_each_safe(i, next, head1) {
+		struct futex_q *this = list_entry(i, struct futex_q, list);
+
+		if (this->page == page1 && this->offset == offset1) {
+			list_del_init(i);
+			__detach_vcache(&this->vcache);
+			if (++ret <= num) {
+				wake_up_all(&this->waiters);
+				if (this->filp)
+					send_sigio(&this->filp->f_owner, this->fd, POLL_IN);
+			} else {
+				unpin_page(this->page);
+				__pin_page_atomic (page2);
+				list_add_tail(i, head2);
+				__attach_vcache(&this->vcache, uaddr2, current->mm, futex_vcache_callback);
+				this->offset = offset2;
+				this->page = page2;
+			}
+		}
+	}
+
+	unlock_futex_mm();
+
+	unpin_page(page1);
+	unpin_page(page2);
+
+	return ret;
+}
+
 static inline void __queue_me(struct futex_q *q, struct page *page,
 				unsigned long uaddr, int offset,
 				int fd, struct file *filp)
@@ -425,9 +480,9 @@ out:
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout)
+long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
 {
-	unsigned long pos_in_page;
+	unsigned long pos_in_page, pos_in_page2;
 	int ret;
 
 	pos_in_page = uaddr % PAGE_SIZE;
@@ -443,6 +498,14 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
+	case FUTEX_REQUEUE:
+		pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+		/* Must be "naturally" aligned */
+		if (pos_in_page2 % sizeof(u32))
+			return -EINVAL;
+		ret = futex_requeue(uaddr, pos_in_page, uaddr2, pos_in_page2, val);
+		break;
 	case FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
@@ -453,7 +516,7 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime)
+asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
@@ -463,7 +526,7 @@ asmlinkage long sys_futex(u32 __user *ua
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
 }
 
 static struct super_block *



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19  9:31 [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Ingo Molnar
@ 2003-05-19 10:10 ` Christoph Hellwig
  2003-05-19 10:16   ` Ingo Molnar
  2003-05-20  0:08   ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
  2003-05-19 10:23 ` Andrew Morton
  2003-05-20  0:04 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
  2 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-19 10:10 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper

On Mon, May 19, 2003 at 11:31:51AM +0200, Ingo Molnar wrote:
> the solution to this problem is to not wake up the swarm of threads, but
> 'requeue' them from the CV-internal mutex to the user-supplied mutex. The
> attached patch adds the FUTEX_REQUEUE feature FUTEX_REQUEUE requeues N
> threads from futex address A to futex address B:
> 
> 	sys_futex(uaddr, FUTEX_REQUEUE, nr_wake, NULL, uaddr2);
> 
> the 'val' parameter to sys_futex (nr_wake) is the # of woken up threads.  
> This way glibc can wake up a single thread (which will take the
> user-mutex), and can requeue the rest, with a single system-call.

Urgg, yet another sys_futex extension.  Could you please split all
these totally different cases into separate syscalls instead?

> +				wake_up_all(&this->waiters);
> +				if (this->filp)
> +					send_sigio(&this->filp->f_owner, this->fd, POLL_IN);
> +			} else {
> +				unpin_page(this->page);
> +				__pin_page_atomic (page2);
> +				list_add_tail(i, head2);
> +				__attach_vcache(&this->vcache, uaddr2, current->mm, futex_vcache_callback);

Please linewrap after 80 lines, thanks.

> +	case FUTEX_REQUEUE:
> +		pos_in_page2 = uaddr2 % PAGE_SIZE;
> +
> +		/* Must be "naturally" aligned */
> +		if (pos_in_page2 % sizeof(u32))
> +			return -EINVAL;

Who guarantess that the alignment of u32 is always the same as it's size?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19 10:10 ` Christoph Hellwig
@ 2003-05-19 10:16   ` Ingo Molnar
  2003-05-19 11:49     ` Christoph Hellwig
  2003-05-20  0:08   ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
  1 sibling, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19 10:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper


On Mon, 19 May 2003, Christoph Hellwig wrote:

> [...] Could you please split all these totally different cases into
> separate syscalls instead?

sure, i'm all for it, but in a different pass, and after syncing up with
glibc. An API cleanup like this should have been done upon the
introduction of futexes, why didnt you comment on this then? Splitting off
FUTEX_REQUEUE in isolation is quite pointless.

> > +	case FUTEX_REQUEUE:
> > +		pos_in_page2 = uaddr2 % PAGE_SIZE;
> > +
> > +		/* Must be "naturally" aligned */
> > +		if (pos_in_page2 % sizeof(u32))
> > +			return -EINVAL;
> 
> Who guarantess that the alignment of u32 is always the same as it's size?

glibc. We do not want to handle all the misaligned cases for obvious
reasons. The use of u32 (instead of a native word) is a bit unfortunate on
64-bit systems but now a reality.

	Ingo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19  9:31 [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Ingo Molnar
  2003-05-19 10:10 ` Christoph Hellwig
@ 2003-05-19 10:23 ` Andrew Morton
  2003-05-19 10:30   ` [patch] futex requeueing feature, futex-requeue-2.5.69-D4 Ingo Molnar
  2003-05-20  0:04 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
  2 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2003-05-19 10:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, linux-kernel, rusty, drepper

Ingo Molnar <mingo@elte.hu> wrote:
>
>  +	page1 = __pin_page(uaddr1 - offset1);
>  +	if (!page1) {
>  +		unlock_futex_mm();
>  +		return -EFAULT;
>  +	}
>  +	page2 = __pin_page(uaddr2 - offset2);
>  +	if (!page2) {
>  +		unlock_futex_mm();
>  +		return -EFAULT;
>  +	}

page1 is leaked.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D4
  2003-05-19 10:23 ` Andrew Morton
@ 2003-05-19 10:30   ` Ingo Molnar
  0 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19 10:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper


On Mon, 19 May 2003, Andrew Morton wrote:

> page1 is leaked.

doh, indeed. -D4 patch attached. It also fixes the line-wrap noticed by
Christoph Hellwig. Patch applies, compiles and boots fine.

	Ingo

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -5,7 +5,8 @@
 #define FUTEX_WAIT (0)
 #define FUTEX_WAKE (1)
 #define FUTEX_FD (2)
+#define FUTEX_REQUEUE (3)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime);
+extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2);
 
 #endif
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL);
+		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL);
 	}
 }
 
--- linux/kernel/compat.c.orig	
+++ linux/kernel/compat.c	
@@ -214,7 +214,7 @@ asmlinkage long compat_sys_sigprocmask(i
 extern long do_futex(unsigned long, int, int, unsigned long);
 
 asmlinkage long compat_sys_futex(u32 *uaddr, int op, int val,
-		struct compat_timespec *utime)
+		struct compat_timespec *utime, u32 *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
@@ -224,7 +224,7 @@ asmlinkage long compat_sys_futex(u32 *ua
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
 }
 
 asmlinkage long sys_setrlimit(unsigned int resource, struct rlimit *rlim);
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -2,6 +2,9 @@
  *  Fast Userspace Mutexes (which I call "Futexes!").
  *  (C) Rusty Russell, IBM 2002
  *
+ *  Generalized futexes, futex requeueing, misc fixes by Ingo Molnar
+ *  (C) Copyright 2003 Red Hat Inc, All Rights Reserved
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -9,9 +12,6 @@
  *  "The futexes are also cursed."
  *  "But they come in a choice of three flavours!"
  *
- *  Generalized futexes for every mapping type, Ingo Molnar, 2002
- *
- *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
  *  the Free Software Foundation; either version 2 of the License, or
@@ -216,6 +216,62 @@ static void futex_vcache_callback(vcache
 	spin_unlock(&futex_lock);
 }
 
+/*
+ * Requeue all waiters hashed on one physical page to another
+ * physical page.
+ */
+static int futex_requeue(unsigned long uaddr1, int offset1, unsigned long uaddr2, int offset2, int num)
+{
+	struct list_head *i, *next, *head1, *head2;
+	struct page *page1 = NULL, *page2 = NULL;
+	int ret = 0;
+
+	lock_futex_mm();
+
+	page1 = __pin_page(uaddr1 - offset1);
+	if (!page1)
+		goto out;
+	page2 = __pin_page(uaddr2 - offset2);
+	if (!page2)
+		goto out;
+
+	head1 = hash_futex(page1, offset1);
+	head2 = hash_futex(page2, offset2);
+
+	list_for_each_safe(i, next, head1) {
+		struct futex_q *this = list_entry(i, struct futex_q, list);
+
+		if (this->page == page1 && this->offset == offset1) {
+			list_del_init(i);
+			__detach_vcache(&this->vcache);
+			if (++ret <= num) {
+				wake_up_all(&this->waiters);
+				if (this->filp)
+					send_sigio(&this->filp->f_owner,
+							this->fd, POLL_IN);
+			} else {
+				unpin_page(this->page);
+				__pin_page_atomic (page2);
+				list_add_tail(i, head2);
+				__attach_vcache(&this->vcache, uaddr2,
+					current->mm, futex_vcache_callback);
+				this->offset = offset2;
+				this->page = page2;
+			}
+		}
+	}
+
+out:
+	unlock_futex_mm();
+
+	if (page1)
+		unpin_page(page1);
+	if (page2)
+		unpin_page(page2);
+
+	return ret;
+}
+
 static inline void __queue_me(struct futex_q *q, struct page *page,
 				unsigned long uaddr, int offset,
 				int fd, struct file *filp)
@@ -425,9 +481,9 @@ out:
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout)
+long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
 {
-	unsigned long pos_in_page;
+	unsigned long pos_in_page, pos_in_page2;
 	int ret;
 
 	pos_in_page = uaddr % PAGE_SIZE;
@@ -443,6 +499,14 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
+	case FUTEX_REQUEUE:
+		pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+		/* Must be "naturally" aligned */
+		if (pos_in_page2 % sizeof(u32))
+			return -EINVAL;
+		ret = futex_requeue(uaddr, pos_in_page, uaddr2, pos_in_page2, val);
+		break;
 	case FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
@@ -453,7 +517,7 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime)
+asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
@@ -463,7 +527,7 @@ asmlinkage long sys_futex(u32 __user *ua
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
 }
 
 static struct super_block *


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19 10:16   ` Ingo Molnar
@ 2003-05-19 11:49     ` Christoph Hellwig
  2003-05-19 12:33       ` [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2 Ingo Molnar
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-19 11:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, Rusty Russell,
	Ulrich Drepper

On Mon, May 19, 2003 at 12:16:02PM +0200, Ingo Molnar wrote:
> sure, i'm all for it, but in a different pass, and after syncing up with
> glibc. An API cleanup like this should have been done upon the
> introduction of futexes, why didnt you comment on this then? Splitting off
> FUTEX_REQUEUE in isolation is quite pointless.

Maybe I don't spend all my time watching the futex API? :)  Okay,
let's make a deal, you add a new syscall for this case and I'll fix
up the older ones in a patch that's ontop of yours?

> > > +	case FUTEX_REQUEUE:
> > > +		pos_in_page2 = uaddr2 % PAGE_SIZE;
> > > +
> > > +		/* Must be "naturally" aligned */
> > > +		if (pos_in_page2 % sizeof(u32))
> > > +			return -EINVAL;
> > 
> > Who guarantess that the alignment of u32 is always the same as it's size?
> 
> glibc. We do not want to handle all the misaligned cases for obvious
> reasons. The use of u32 (instead of a native word) is a bit unfortunate on
> 64-bit systems but now a reality.

Sorry if the question wasn't clear, but who guarantess that the alignment
of u32 is the same as it's size?  You test of the size of u32, not it's
alignment even if they usually are the same.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 11:49     ` Christoph Hellwig
@ 2003-05-19 12:33       ` Ingo Molnar
  2003-05-19 12:51         ` Ingo Molnar
                           ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19 12:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper


On Mon, 19 May 2003, Christoph Hellwig wrote:

> Maybe I don't spend all my time watching the futex API? :)  Okay, let's
> make a deal, you add a new syscall for this case and I'll fix up the
> older ones in a patch that's ontop of yours?

well, my point was that it's not an issue introduced by the FUTEX_REQUEUE
change. Here's a patch ontop of my last futex patch, which adds all the 
interface cleanups. Changes:

 - separate out sys_futex_wait, sys_futex_wake and sys_futex_requeue.

 - inline the futex_wait/wake/requeue functionality, as it should be.

 - start the phasing out of FUTEX_FD. This i believe is quite unclean and
   unrobust, because it attaches a new concept (futexes) to a very old
   (polling) concept. We want futex support in kernel-AIO, not in the
   polling APIs. AFAIK only NGPT uses FUTEX_FD.

> > > Who guarantess that the alignment of u32 is always the same as it's size?
> > 
> > glibc. We do not want to handle all the misaligned cases for obvious
> > reasons. The use of u32 (instead of a native word) is a bit unfortunate on
> > 64-bit systems but now a reality.
> 
> Sorry if the question wasn't clear, but who guarantess that the
> alignment of u32 is the same as it's size?  You test of the size of u32,
> not it's alignment even if they usually are the same.

glibc (the main user and API-multiplexer of futexes) ensures that futex
variables are aligned on sizeof(u32).

	Ingo

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -2,11 +2,13 @@
 #define _LINUX_FUTEX_H
 
 /* Second argument to futex syscall */
-#define FUTEX_WAIT (0)
-#define FUTEX_WAKE (1)
-#define FUTEX_FD (2)
-#define FUTEX_REQUEUE (3)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2);
+asmlinkage long sys_futex_wait(u32 __user *__uaddr,int val,
+				struct timespec __user *utime);
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int val);
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1, u32 __user *__uaddr2,
+					int nr_wake);
 
 #endif
--- linux/arch/i386/kernel/entry.S.orig	
+++ linux/arch/i386/kernel/entry.S	
@@ -837,7 +837,7 @@ ENTRY(sys_call_table)
 	.long sys_fremovexattr
 	.long sys_tkill
 	.long sys_sendfile64
-	.long sys_futex		/* 240 */
+	.long old_futex		/* 240 */
 	.long sys_sched_setaffinity
 	.long sys_sched_getaffinity
 	.long sys_set_thread_area
@@ -865,6 +865,8 @@ ENTRY(sys_call_table)
  	.long sys_clock_gettime		/* 265 */
  	.long sys_clock_getres
  	.long sys_clock_nanosleep
- 
+	.long sys_futex_wait
+	.long sys_futex_wake
+	.long sys_futex_requeue		/* 270 */
  
 nr_syscalls=(.-sys_call_table)/4
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -347,7 +347,7 @@ static int futex_wait(unsigned long uadd
 	 * The get_user() above might fault and schedule so we
 	 * cannot just set TASK_INTERRUPTIBLE state when queueing
 	 * ourselves into the futex hash. This code thus has to
-	 * rely on the FUTEX_WAKE code doing a wakeup after removing
+	 * rely on the futex_wake() code doing a wakeup after removing
 	 * the waiter from the list.
 	 */
 	add_wait_queue(&q.waiters, &wait);
@@ -481,9 +481,13 @@ out:
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
+#define OLD_FUTEX_WAIT (0)
+#define OLD_FUTEX_WAKE (1)
+#define OLD_FUTEX_FD (2)
+
+static long do_old_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
 {
-	unsigned long pos_in_page, pos_in_page2;
+	unsigned long pos_in_page;
 	int ret;
 
 	pos_in_page = uaddr % PAGE_SIZE;
@@ -493,21 +497,13 @@ long do_futex(unsigned long uaddr, int o
 		return -EINVAL;
 
 	switch (op) {
-	case FUTEX_WAIT:
+	case OLD_FUTEX_WAIT:
 		ret = futex_wait(uaddr, pos_in_page, val, timeout);
 		break;
-	case FUTEX_WAKE:
+	case OLD_FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
-	case FUTEX_REQUEUE:
-		pos_in_page2 = uaddr2 % PAGE_SIZE;
-
-		/* Must be "naturally" aligned */
-		if (pos_in_page2 % sizeof(u32))
-			return -EINVAL;
-		ret = futex_requeue(uaddr, pos_in_page, uaddr2, pos_in_page2, val);
-		break;
-	case FUTEX_FD:
+	case OLD_FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
 		break;
@@ -517,17 +513,63 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
+asmlinkage long old_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
 
-	if ((op == FUTEX_WAIT) && utime) {
+	if ((op == OLD_FUTEX_WAIT) && utime) {
+		if (copy_from_user(&t, utime, sizeof(t)) != 0)
+			return -EFAULT;
+		timeout = timespec_to_jiffies(&t) + 1;
+	}
+	return do_old_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
+}
+
+asmlinkage long sys_futex_wait(u32 __user *__uaddr, int val, struct timespec __user *utime)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
+	struct timespec t;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	if (utime) {
 		if (copy_from_user(&t, utime, sizeof(t)) != 0)
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
+	return futex_wait(uaddr, pos_in_page, val, timeout);
+}
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int val)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	return futex_wake(uaddr, pos_in_page, val);
+}
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1,
+				u32 __user *__uaddr2, int nr_wake)
+{
+	unsigned long uaddr1 = (unsigned long)__uaddr1,
+			uaddr2 = (unsigned long)__uaddr2;
+	unsigned long pos_in_page1 = uaddr1 % PAGE_SIZE,
+			pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if ((pos_in_page1 | pos_in_page2) % sizeof(u32))
+		return -EINVAL;
+
+	return futex_requeue(uaddr1, pos_in_page1, uaddr2, pos_in_page2, nr_wake);
 }
 
 static struct super_block *
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL);
+		sys_futex_wake(tidptr, 1);
 	}
 }
 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 12:33       ` [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2 Ingo Molnar
@ 2003-05-19 12:51         ` Ingo Molnar
  2003-05-19 14:47         ` bert hubert
  2003-05-19 15:18         ` Linus Torvalds
  2 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19 12:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper


updated patch (the inlining changes were missing):

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -2,11 +2,13 @@
 #define _LINUX_FUTEX_H
 
 /* Second argument to futex syscall */
-#define FUTEX_WAIT (0)
-#define FUTEX_WAKE (1)
-#define FUTEX_FD (2)
-#define FUTEX_REQUEUE (3)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2);
+asmlinkage long sys_futex_wait(u32 __user *__uaddr,int val,
+				struct timespec __user *utime);
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int val);
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1, u32 __user *__uaddr2,
+					int nr_wake);
 
 #endif
--- linux/arch/i386/kernel/entry.S.orig	
+++ linux/arch/i386/kernel/entry.S	
@@ -837,7 +837,7 @@ ENTRY(sys_call_table)
 	.long sys_fremovexattr
 	.long sys_tkill
 	.long sys_sendfile64
-	.long sys_futex		/* 240 */
+	.long old_futex		/* 240 */
 	.long sys_sched_setaffinity
 	.long sys_sched_getaffinity
 	.long sys_set_thread_area
@@ -865,6 +865,8 @@ ENTRY(sys_call_table)
  	.long sys_clock_gettime		/* 265 */
  	.long sys_clock_getres
  	.long sys_clock_nanosleep
- 
+	.long sys_futex_wait
+	.long sys_futex_wake
+	.long sys_futex_requeue		/* 270 */
  
 nr_syscalls=(.-sys_call_table)/4
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -98,13 +98,13 @@ static inline struct list_head *hash_fut
  *
  * Must be called with (and returns with) all futex-MM locks held.
  */
-static inline
-struct page *__pin_page_atomic (struct page *page)
+static inline struct page *__pin_page_atomic (struct page *page)
 {
 	if (!PageReserved(page))
 		get_page(page);
 	return page;
 }
+
 static struct page *__pin_page(unsigned long addr)
 {
 	struct mm_struct *mm = current->mm;
@@ -155,7 +155,7 @@ static inline void unpin_page(struct pag
  * Wake up all waiters hashed on the physical page that is mapped
  * to this virtual address:
  */
-static int futex_wake(unsigned long uaddr, int offset, int num)
+static inline int futex_wake(unsigned long uaddr, int offset, int num)
 {
 	struct list_head *i, *next, *head;
 	struct page *page;
@@ -220,7 +220,8 @@ static void futex_vcache_callback(vcache
  * Requeue all waiters hashed on one physical page to another
  * physical page.
  */
-static int futex_requeue(unsigned long uaddr1, int offset1, unsigned long uaddr2, int offset2, int num)
+static inline int futex_requeue(unsigned long uaddr1, int offset1,
+		unsigned long uaddr2, int offset2, int num)
 {
 	struct list_head *i, *next, *head1, *head2;
 	struct page *page1 = NULL, *page2 = NULL;
@@ -308,7 +309,7 @@ static inline int unqueue_me(struct fute
 	return ret;
 }
 
-static int futex_wait(unsigned long uaddr,
+static inline int futex_wait(unsigned long uaddr,
 		      int offset,
 		      int val,
 		      unsigned long time)
@@ -347,7 +348,7 @@ static int futex_wait(unsigned long uadd
 	 * The get_user() above might fault and schedule so we
 	 * cannot just set TASK_INTERRUPTIBLE state when queueing
 	 * ourselves into the futex hash. This code thus has to
-	 * rely on the FUTEX_WAKE code doing a wakeup after removing
+	 * rely on the futex_wake() code doing a wakeup after removing
 	 * the waiter from the list.
 	 */
 	add_wait_queue(&q.waiters, &wait);
@@ -481,9 +482,13 @@ out:
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
+#define OLD_FUTEX_WAIT (0)
+#define OLD_FUTEX_WAKE (1)
+#define OLD_FUTEX_FD (2)
+
+static long do_old_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
 {
-	unsigned long pos_in_page, pos_in_page2;
+	unsigned long pos_in_page;
 	int ret;
 
 	pos_in_page = uaddr % PAGE_SIZE;
@@ -493,21 +498,13 @@ long do_futex(unsigned long uaddr, int o
 		return -EINVAL;
 
 	switch (op) {
-	case FUTEX_WAIT:
+	case OLD_FUTEX_WAIT:
 		ret = futex_wait(uaddr, pos_in_page, val, timeout);
 		break;
-	case FUTEX_WAKE:
+	case OLD_FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
-	case FUTEX_REQUEUE:
-		pos_in_page2 = uaddr2 % PAGE_SIZE;
-
-		/* Must be "naturally" aligned */
-		if (pos_in_page2 % sizeof(u32))
-			return -EINVAL;
-		ret = futex_requeue(uaddr, pos_in_page, uaddr2, pos_in_page2, val);
-		break;
-	case FUTEX_FD:
+	case OLD_FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
 		break;
@@ -517,17 +514,63 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
+asmlinkage long old_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
 
-	if ((op == FUTEX_WAIT) && utime) {
+	if ((op == OLD_FUTEX_WAIT) && utime) {
 		if (copy_from_user(&t, utime, sizeof(t)) != 0)
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
+	return do_old_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
+}
+
+asmlinkage long sys_futex_wait(u32 __user *__uaddr, int val, struct timespec __user *utime)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
+	struct timespec t;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	if (utime) {
+		if (copy_from_user(&t, utime, sizeof(t)) != 0)
+			return -EFAULT;
+		timeout = timespec_to_jiffies(&t) + 1;
+	}
+	return futex_wait(uaddr, pos_in_page, val, timeout);
+}
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int val)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	return futex_wake(uaddr, pos_in_page, val);
+}
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1,
+				u32 __user *__uaddr2, int nr_wake)
+{
+	unsigned long uaddr1 = (unsigned long)__uaddr1,
+			uaddr2 = (unsigned long)__uaddr2;
+	unsigned long pos_in_page1 = uaddr1 % PAGE_SIZE,
+			pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if ((pos_in_page1 | pos_in_page2) % sizeof(u32))
+		return -EINVAL;
+
+	return futex_requeue(uaddr1, pos_in_page1, uaddr2, pos_in_page2, nr_wake);
 }
 
 static struct super_block *
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL);
+		sys_futex_wake(tidptr, 1);
 	}
 }
 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 12:33       ` [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2 Ingo Molnar
  2003-05-19 12:51         ` Ingo Molnar
@ 2003-05-19 14:47         ` bert hubert
  2003-05-19 16:02           ` Ingo Molnar
  2003-05-19 15:18         ` Linus Torvalds
  2 siblings, 1 reply; 43+ messages in thread
From: bert hubert @ 2003-05-19 14:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, Rusty Russell,
	Ulrich Drepper

On Mon, May 19, 2003 at 02:33:01PM +0200, Ingo Molnar wrote:
>  - start the phasing out of FUTEX_FD. This i believe is quite unclean and
>    unrobust, because it attaches a new concept (futexes) to a very old
>    (polling) concept. We want futex support in kernel-AIO, not in the
>    polling APIs. AFAIK only NGPT uses FUTEX_FD.

I for one would want the ability to select, poll and epoll on a futex while
also being notified of availability of data on sockets. I see no alternative
even, except for messing with signals or running select with a small
timeout, introducing needless latency.

It may be weird, but it does work in practice. 'Unrobust' would be a problem
but I fail to see how this is unclean.

Thanks.


-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 12:33       ` [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2 Ingo Molnar
  2003-05-19 12:51         ` Ingo Molnar
  2003-05-19 14:47         ` bert hubert
@ 2003-05-19 15:18         ` Linus Torvalds
  2003-05-19 16:06           ` Ingo Molnar
  2 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2003-05-19 15:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, linux-kernel, Rusty Russell, Ulrich Drepper


On Mon, 19 May 2003, Ingo Molnar wrote:
> 
>  - start the phasing out of FUTEX_FD. This i believe is quite unclean and
>    unrobust, because it attaches a new concept (futexes) to a very old
>    (polling) concept. We want futex support in kernel-AIO, not in the
>    polling APIs. AFAIK only NGPT uses FUTEX_FD.

This sounds like a bad idea. 

Expecting "select()" and "poll()" to go away, and calling them "unclean
and unrobust" is just silly and stupid. There are a hell of a lot more 
programs using select-loops out there than there are AIO versions, and I'd 
argue that AIO is likely to be the much more "unrobust" solution, and 
probably doesn't even scale any better than using epoll.

In fact, it's hard to see any real advantages of aio over a sane polling
loop, as long as the polling doesn't have some O(n) overhead (in other
words, as long as you use epoll).

So stop pushing your own agenda and broken morals down other peoples 
throats, and re-do this patch properly.

		Linus



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 14:47         ` bert hubert
@ 2003-05-19 16:02           ` Ingo Molnar
  0 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19 16:02 UTC (permalink / raw)
  To: bert hubert
  Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, Rusty Russell,
	Ulrich Drepper


On Mon, 19 May 2003, bert hubert wrote:

> I for one would want the ability to select, poll and epoll on a futex
> while also being notified of availability of data on sockets. I see no
> alternative even, except for messing with signals or running select with
> a small timeout, introducing needless latency.
> 
> It may be weird, but it does work in practice. 'Unrobust' would be a
> problem but I fail to see how this is unclean.

ok, i was flaming away mindlessly. New patch on the way.

	Ingo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 15:18         ` Linus Torvalds
@ 2003-05-19 16:06           ` Ingo Molnar
  2003-05-19 23:33             ` Jamie Lokier
  2003-05-20  0:15             ` Rusty Russell
  0 siblings, 2 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-19 16:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, linux-kernel, Rusty Russell, Ulrich Drepper


> >  - start the phasing out of FUTEX_FD. This i believe is quite unclean and
> >    unrobust, [...]

FUTEX_FD is an instant DoS, it allows the pinning of one page per file
descriptor, per thread. With a default limit of 1024 open files per
thread, and 256 threads (on a sane/conservative setup), this means 1 GB of
RAM can be pinned down by a normal unprivileged user.

Any suggestions how to fix it - or am i missing something why it should
not be considered a problem?

updated patch attached - it now has proper sys_futex_fd() support as well.

	Ingo

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -2,11 +2,15 @@
 #define _LINUX_FUTEX_H
 
 /* Second argument to futex syscall */
-#define FUTEX_WAIT (0)
-#define FUTEX_WAKE (1)
-#define FUTEX_FD (2)
-#define FUTEX_REQUEUE (3)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2);
+asmlinkage long sys_futex_wait(u32 __user *__uaddr,int val,
+				struct timespec __user *utime);
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int nr_wake);
+
+asmlinkage long sys_futex_fd(u32 __user *__uaddr, int signal);
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1, u32 __user *__uaddr2,
+					int nr_wake);
 
 #endif
--- linux/arch/i386/kernel/entry.S.orig	
+++ linux/arch/i386/kernel/entry.S	
@@ -837,7 +837,7 @@ ENTRY(sys_call_table)
 	.long sys_fremovexattr
 	.long sys_tkill
 	.long sys_sendfile64
-	.long sys_futex		/* 240 */
+	.long old_futex		/* 240 */
 	.long sys_sched_setaffinity
 	.long sys_sched_getaffinity
 	.long sys_set_thread_area
@@ -865,6 +865,9 @@ ENTRY(sys_call_table)
  	.long sys_clock_gettime		/* 265 */
  	.long sys_clock_getres
  	.long sys_clock_nanosleep
- 
+	.long sys_futex_wait
+	.long sys_futex_wake
+	.long sys_futex_fd		/* 270 */
+	.long sys_futex_requeue
  
 nr_syscalls=(.-sys_call_table)/4
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -98,13 +98,13 @@ static inline struct list_head *hash_fut
  *
  * Must be called with (and returns with) all futex-MM locks held.
  */
-static inline
-struct page *__pin_page_atomic (struct page *page)
+static inline struct page *__pin_page_atomic (struct page *page)
 {
 	if (!PageReserved(page))
 		get_page(page);
 	return page;
 }
+
 static struct page *__pin_page(unsigned long addr)
 {
 	struct mm_struct *mm = current->mm;
@@ -155,7 +155,7 @@ static inline void unpin_page(struct pag
  * Wake up all waiters hashed on the physical page that is mapped
  * to this virtual address:
  */
-static int futex_wake(unsigned long uaddr, int offset, int num)
+static inline int futex_wake(unsigned long uaddr, int offset, int num)
 {
 	struct list_head *i, *next, *head;
 	struct page *page;
@@ -220,7 +220,8 @@ static void futex_vcache_callback(vcache
  * Requeue all waiters hashed on one physical page to another
  * physical page.
  */
-static int futex_requeue(unsigned long uaddr1, int offset1, unsigned long uaddr2, int offset2, int num)
+static inline int futex_requeue(unsigned long uaddr1, int offset1,
+		unsigned long uaddr2, int offset2, int num)
 {
 	struct list_head *i, *next, *head1, *head2;
 	struct page *page1 = NULL, *page2 = NULL;
@@ -308,7 +309,7 @@ static inline int unqueue_me(struct fute
 	return ret;
 }
 
-static int futex_wait(unsigned long uaddr,
+static inline int futex_wait(unsigned long uaddr,
 		      int offset,
 		      int val,
 		      unsigned long time)
@@ -347,7 +348,7 @@ static int futex_wait(unsigned long uadd
 	 * The get_user() above might fault and schedule so we
 	 * cannot just set TASK_INTERRUPTIBLE state when queueing
 	 * ourselves into the futex hash. This code thus has to
-	 * rely on the FUTEX_WAKE code doing a wakeup after removing
+	 * rely on the futex_wake() code doing a wakeup after removing
 	 * the waiter from the list.
 	 */
 	add_wait_queue(&q.waiters, &wait);
@@ -481,9 +482,13 @@ out:
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
+#define OLD_FUTEX_WAIT (0)
+#define OLD_FUTEX_WAKE (1)
+#define OLD_FUTEX_FD (2)
+
+static long do_old_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
 {
-	unsigned long pos_in_page, pos_in_page2;
+	unsigned long pos_in_page;
 	int ret;
 
 	pos_in_page = uaddr % PAGE_SIZE;
@@ -493,21 +498,13 @@ long do_futex(unsigned long uaddr, int o
 		return -EINVAL;
 
 	switch (op) {
-	case FUTEX_WAIT:
+	case OLD_FUTEX_WAIT:
 		ret = futex_wait(uaddr, pos_in_page, val, timeout);
 		break;
-	case FUTEX_WAKE:
+	case OLD_FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
-	case FUTEX_REQUEUE:
-		pos_in_page2 = uaddr2 % PAGE_SIZE;
-
-		/* Must be "naturally" aligned */
-		if (pos_in_page2 % sizeof(u32))
-			return -EINVAL;
-		ret = futex_requeue(uaddr, pos_in_page, uaddr2, pos_in_page2, val);
-		break;
-	case FUTEX_FD:
+	case OLD_FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
 		break;
@@ -517,17 +514,75 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
+asmlinkage long old_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
 
-	if ((op == FUTEX_WAIT) && utime) {
+	if ((op == OLD_FUTEX_WAIT) && utime) {
 		if (copy_from_user(&t, utime, sizeof(t)) != 0)
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
+	return do_old_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
+}
+
+asmlinkage long sys_futex_wait(u32 __user *__uaddr, int val, struct timespec __user *utime)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
+	struct timespec t;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	if (utime) {
+		if (copy_from_user(&t, utime, sizeof(t)) != 0)
+			return -EFAULT;
+		timeout = timespec_to_jiffies(&t) + 1;
+	}
+	return futex_wait(uaddr, pos_in_page, val, timeout);
+}
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int nr_wake)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	return futex_wake(uaddr, pos_in_page, nr_wake);
+}
+
+asmlinkage long sys_futex_fd(u32 __user *__uaddr, int signal)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	return futex_fd(uaddr, pos_in_page, signal);
+}
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1,
+				u32 __user *__uaddr2, int nr_wake)
+{
+	unsigned long uaddr1 = (unsigned long)__uaddr1,
+			uaddr2 = (unsigned long)__uaddr2;
+	unsigned long pos_in_page1 = uaddr1 % PAGE_SIZE,
+			pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if ((pos_in_page1 | pos_in_page2) % sizeof(u32))
+		return -EINVAL;
+
+	return futex_requeue(uaddr1, pos_in_page1, uaddr2, pos_in_page2, nr_wake);
 }
 
 static struct super_block *
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL);
+		sys_futex_wake(tidptr, 1);
 	}
 }
 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 16:06           ` Ingo Molnar
@ 2003-05-19 23:33             ` Jamie Lokier
  2003-05-20  0:39               ` Rusty Russell
  2003-05-20  0:15             ` Rusty Russell
  1 sibling, 1 reply; 43+ messages in thread
From: Jamie Lokier @ 2003-05-19 23:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Christoph Hellwig, linux-kernel, Rusty Russell,
	Ulrich Drepper

Ingo Molnar wrote:
> > >  - start the phasing out of FUTEX_FD. This i believe is quite unclean and
> > >    unrobust, [...]
> 
> FUTEX_FD is an instant DoS, it allows the pinning of one page per file
> descriptor, per thread. With a default limit of 1024 open files per
> thread, and 256 threads (on a sane/conservative setup), this means 1 GB of
> RAM can be pinned down by a normal unprivileged user.

The correct solution [;)] is EP_FUTEX - allow a futex to be specified
as the source of an epoll event.

-- Jamie

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19  9:31 [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Ingo Molnar
  2003-05-19 10:10 ` Christoph Hellwig
  2003-05-19 10:23 ` Andrew Morton
@ 2003-05-20  0:04 ` Rusty Russell
  2003-05-20  0:40   ` Ulrich Drepper
  2003-05-20  6:19   ` Ingo Molnar
  2 siblings, 2 replies; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  0:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper

In message <Pine.LNX.4.44.0305191103500.5653-100000@localhost.localdomain> you write:
> 
> the attached patch addresses a futex related SMP scalability problem of
> glibc. A number of regressions have been reported to the NTPL mailing list
> when going to many CPUs, for applications that use condition variables and
> the pthread_cond_broadcast() API call. Using this functionality, testcode
> shows a slowdown from 0.12 seconds runtime to over 237 seconds (!)  
> runtime, on 4-CPU systems.

I gave feedback on this before, but didn't get a response.

1) Overload the last futex arg (change from timeval * to void *),
   don't add YA arg at the end.

2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
   you really mean __alignof__ here.

I'm glad you finally put your name at the top of the file...

Thanks!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19 10:10 ` Christoph Hellwig
  2003-05-19 10:16   ` Ingo Molnar
@ 2003-05-20  0:08   ` Rusty Russell
  2003-05-20  0:31     ` Valdis.Kletnieks
  1 sibling, 1 reply; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  0:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper

In message <20030519111028.B8663@infradead.org> you write:
> Urgg, yet another sys_futex extension.  Could you please split all
> these totally different cases into separate syscalls instead?

I don't mind either way, but wouldn't that be pointless incompatible
churn?

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 16:06           ` Ingo Molnar
  2003-05-19 23:33             ` Jamie Lokier
@ 2003-05-20  0:15             ` Rusty Russell
  1 sibling, 0 replies; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  0:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Christoph Hellwig, linux-kernel, Ulrich Drepper

In message <Pine.LNX.4.44.0305191752130.13233-100000@localhost.localdomain> you
 write:
> 
> > >  - start the phasing out of FUTEX_FD. This i believe is quite unclean and
> > >    unrobust, [...]
> 
> FUTEX_FD is an instant DoS, it allows the pinning of one page per file
> descriptor, per thread. With a default limit of 1024 open files per
> thread, and 256 threads (on a sane/conservative setup), this means 1 GB of
> RAM can be pinned down by a normal unprivileged user.

Yes.  There was a patch which limited it, never got applied.

The real solution is not to pin the page: I pinned the page originally
to prevent dealing with addresses changing due to swap out, but you
found the COW bug and that blew away that theory anyway 8)

I think the vcache callbacks or similar could be extended to cover the
swap out/swap in case.

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:08   ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
@ 2003-05-20  0:31     ` Valdis.Kletnieks
  0 siblings, 0 replies; 43+ messages in thread
From: Valdis.Kletnieks @ 2003-05-20  0:31 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On Tue, 20 May 2003 10:08:46 +1000, Rusty Russell said:
> In message <20030519111028.B8663@infradead.org> you write:
> > Urgg, yet another sys_futex extension.  Could you please split all
> > these totally different cases into separate syscalls instead?
> 
> I don't mind either way, but wouldn't that be pointless incompatible
> churn?

Doesn't matter, the churn hasn't been reflected in glibc-kernheaders yet
either way.  Once kernheaders gets updated to match, you're stuck with whatever
it is at that point, so churn now while you have a chance. ;)

(Sorry, I couldn't resist)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-19 23:33             ` Jamie Lokier
@ 2003-05-20  0:39               ` Rusty Russell
  2003-05-20  1:14                 ` Davide Libenzi
  2003-05-20  1:50                 ` Jamie Lokier
  0 siblings, 2 replies; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  0:39 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Linus Torvalds, Christoph Hellwig, linux-kernel, Ulrich Drepper

In message <20030519233353.GD13706@mail.jlokier.co.uk> you write:
> Ingo Molnar wrote:
> > FUTEX_FD is an instant DoS, it allows the pinning of one page per file
> > descriptor, per thread. With a default limit of 1024 open files per
> > thread, and 256 threads (on a sane/conservative setup), this means 1 GB of
> > RAM can be pinned down by a normal unprivileged user.
> 
> The correct solution [;)] is EP_FUTEX - allow a futex to be specified
> as the source of an epoll event.

Hi Jamie!

No, you don't understand (or maybe you were being facetious 8).

A normal futex blocks: ie. you can only have one futex being slept on
per process, so pinning a page is probably fine.

The NGPT guys wanted to allow waiting on more than one futex per
process.  If you want to allow this through *any* mechanism, you have to:

1) Allocate something for each futex.  The normal case uses the stack,
   so never fails.

2) Control the number which can be used at once.

The current implementation needs a tighter #2.

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:04 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
@ 2003-05-20  0:40   ` Ulrich Drepper
  2003-05-20  1:46     ` Rusty Russell
  2003-05-20  6:19   ` Ingo Molnar
  1 sibling, 1 reply; 43+ messages in thread
From: Ulrich Drepper @ 2003-05-20  0:40 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rusty Russell wrote:

> 1) Overload the last futex arg (change from timeval * to void *),
>    don't add YA arg at the end.

It wasn't Ingo's idea.  I suggested it.  Overloading parameter types is
evil.  This isn't an issue anymore if the extension is implemented as a
new syscall which certainly is better.


> 2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
>    you really mean __alignof__ here.

I would always use sizeof() in this situation.  alignof is a property
the compiler can potentially relax.  On x86 it could easily be defined
to 1 in case someone wants to save memory.  OK, not really useful for
x86, but some embedded archs might be in such a situation.  Using
sizeof() makes sure that the variable is always 4-byte aligned.  For
atomic operations this certainly is a good minimal requirement.

If you want to really correct you might have to introduce something like
atomic_alignof().

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+yXmW2ijCOnn/RHQRAnXcAKCeYFvfkKIO/bbwqX1vUvbLkBvHKwCeN6zB
99qZZOVLMckvh6lxieUfVpY=
=LvwT
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-20  0:39               ` Rusty Russell
@ 2003-05-20  1:14                 ` Davide Libenzi
  2003-05-20  1:44                   ` Jamie Lokier
  2003-05-20  1:50                 ` Jamie Lokier
  1 sibling, 1 reply; 43+ messages in thread
From: Davide Libenzi @ 2003-05-20  1:14 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Jamie Lokier, Linus Torvalds, Christoph Hellwig,
	Linux Kernel Mailing List, Ulrich Drepper

On Tue, 20 May 2003, Rusty Russell wrote:

> In message <20030519233353.GD13706@mail.jlokier.co.uk> you write:
> > Ingo Molnar wrote:
> > > FUTEX_FD is an instant DoS, it allows the pinning of one page per file
> > > descriptor, per thread. With a default limit of 1024 open files per
> > > thread, and 256 threads (on a sane/conservative setup), this means 1 GB of
> > > RAM can be pinned down by a normal unprivileged user.
> >
> > The correct solution [;)] is EP_FUTEX - allow a futex to be specified
> > as the source of an epoll event.

Futexes do support f_op->poll(), isn't it Rusty ? If so, they're supported
by epoll ...



- Davide


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-20  1:14                 ` Davide Libenzi
@ 2003-05-20  1:44                   ` Jamie Lokier
  2003-05-20 16:54                     ` Davide Libenzi
  0 siblings, 1 reply; 43+ messages in thread
From: Jamie Lokier @ 2003-05-20  1:44 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Rusty Russell, Linus Torvalds, Christoph Hellwig,
	Linux Kernel Mailing List, Ulrich Drepper

Davide Libenzi wrote:
> > In message <20030519233353.GD13706@mail.jlokier.co.uk> you write:
> > > Ingo Molnar wrote:
> > > > FUTEX_FD is an instant DoS, it allows the pinning of one page per file
> > > > descriptor, per thread. With a default limit of 1024 open files per
> > > > thread, and 256 threads (on a sane/conservative setup), this means 1 GB of
> > > > RAM can be pinned down by a normal unprivileged user.
> > >
> > > The correct solution [;)] is EP_FUTEX - allow a futex to be specified
> > > as the source of an epoll event.
> 
> Futexes do support f_op->poll(), isn't it Rusty ? If so, they're supported
> by epoll ...

Yes, they do and it should work (I haven't tried, though).

There is a practical problem when waiting on a futex in multiple
threads using epoll: you need a separate fd per waiter, rather than an
fd per waited-on futex.  This is because some uses of futexes depend
on waiters being woken in the exact order that they were queued.

To get this ordering, every waiter must allocate its own fd.  The only
practical way to do this is allocate the fd just prior to waiting, and
release it afterwards.

As you can imagine, with many threads this implies a lot of fds, which
are in limited supply, and a high rate of allocation and deallocation,
which may be relatively slow.

-- Jamie

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:40   ` Ulrich Drepper
@ 2003-05-20  1:46     ` Rusty Russell
  2003-05-20  2:11       ` Ulrich Drepper
  2003-05-20  6:27       ` Ingo Molnar
  0 siblings, 2 replies; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  1:46 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

In message <3EC97996.1080800@redhat.com> you write:
> Rusty Russell wrote:
> 
> > 1) Overload the last futex arg (change from timeval * to void *),
> >    don't add YA arg at the end.
> 
> It wasn't Ingo's idea.  I suggested it.  Overloading parameter types is
> evil.  This isn't an issue anymore if the extension is implemented as a
> new syscall which certainly is better.

People are using the interface, so I don't think changing it because
"it's nicer this way" is worthwhile: Ingo's "new syscall" patch has
backwards compat code for the old syscalls.  That's fugly 8(

So I'd say if we're going to multiplex, then overloading is easiest,
and in fact already done for FUTEX_FD.

But it's an issue over which sane people can disagree, IMHO.

> > 2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
> >    you really mean __alignof__ here.
> 
> I would always use sizeof() in this situation.

Comment says: /* Must be "naturally" aligned */.  This used to be true
in a much earlier version of the code, now AFAICT the requirement test
should be:

	/* Handling futexes on multiple pages?  -ETOOHARD */
	if (pos_in_page + sizeof(u32) > PAGE_SIZE)
		return -EINVAL;

Ingo?

Thanks for the comments!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-20  0:39               ` Rusty Russell
  2003-05-20  1:14                 ` Davide Libenzi
@ 2003-05-20  1:50                 ` Jamie Lokier
  1 sibling, 0 replies; 43+ messages in thread
From: Jamie Lokier @ 2003-05-20  1:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Christoph Hellwig, linux-kernel, Ulrich Drepper

Rusty Russell wrote:
> 2) Control the number which can be used at once.
> The current implementation needs a tighter #2.

Alternatively, make a page swappable even while it's being waited on
somehow.  Can the vcache help with that?

-- Jamie


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  1:46     ` Rusty Russell
@ 2003-05-20  2:11       ` Ulrich Drepper
  2003-05-20  6:27       ` Ingo Molnar
  1 sibling, 0 replies; 43+ messages in thread
From: Ulrich Drepper @ 2003-05-20  2:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rusty Russell wrote:

> People are using the interface, so I don't think changing it because
> "it's nicer this way" is worthwhile: Ingo's "new syscall" patch has
> backwards compat code for the old syscalls.

But not the new requeue variant and this is what needs the extra
parameter.  So the parameter question is not issue.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+yY7n2ijCOnn/RHQRAuwkAJ93BCAOgeebWmWvClkYLCiE7CHAIwCgmu6N
9xteUehAOkcGrOWnPL5Wi5U=
=lHNT
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:04 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
  2003-05-20  0:40   ` Ulrich Drepper
@ 2003-05-20  6:19   ` Ingo Molnar
  1 sibling, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-20  6:19 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper


On Tue, 20 May 2003, Rusty Russell wrote:

> 1) Overload the last futex arg (change from timeval * to void *),
>    don't add YA arg at the end.

the right approach is the splitup of the arguments, done in the later
cleanup patch. But i'd rather add yet another argument than some nasty
overload of arguments ...

> 2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
>    you really mean __alignof__ here.

ok. Could they ever be realistically different?

	Ingo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  1:46     ` Rusty Russell
  2003-05-20  2:11       ` Ulrich Drepper
@ 2003-05-20  6:27       ` Ingo Molnar
  2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:55         ` Rusty Russell
  1 sibling, 2 replies; 43+ messages in thread
From: Ingo Molnar @ 2003-05-20  6:27 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel


On Tue, 20 May 2003, Rusty Russell wrote:

> > It wasn't Ingo's idea.  I suggested it.  Overloading parameter types is
> > evil.  This isn't an issue anymore if the extension is implemented as a
> > new syscall which certainly is better.
> 
> People are using the interface, [...]

it's a _new_ argument, only used by new glibc. Like i mentioned it in the
original announcement, it's fully backwards compatible with any older app.
This is a minimum requirement for a live syscall!

> [...] so I don't think changing it because "it's nicer this way" is
> worthwhile: Ingo's "new syscall" patch has backwards compat code for the
> old syscalls.  That's fugly 8(

yes, but the damage has been done already, and now we've got to start the
slow wait for the old syscall to flush out of our tree. It will a few
years to get rid of the compat code, but we better start now. hch is
perfectly right that the old futex multiplexer interface is quite ugly,
the requeue op only made this even more prominent.

> Comment says: /* Must be "naturally" aligned */.  This used to be true
> in a much earlier version of the code, now AFAICT the requirement test
> should be:
> 
> 	/* Handling futexes on multiple pages?  -ETOOHARD */
> 	if (pos_in_page + sizeof(u32) > PAGE_SIZE)
> 		return -EINVAL;

yes - but i'd rather enforce this for every futex, than to hit it in every
1000th app that manages to misalign a futex _and_ lay it across two pages.  

Also, it's only x86 that guarantees atomic instructions on misaligned
futexes (even then it comes with a cycle penalty), are you sure this also
works on other architectures? So i'd rather be a bit more strict with this
requirement.

	Ingo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:27       ` Ingo Molnar
@ 2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:57           ` Rusty Russell
                             ` (2 more replies)
  2003-05-20  8:55         ` Rusty Russell
  1 sibling, 3 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-20  6:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Tue, May 20, 2003 at 08:27:03AM +0200, Ingo Molnar wrote:
> yes, but the damage has been done already, and now we've got to start the
> slow wait for the old syscall to flush out of our tree.

Actually it should go away before 2.6.0.  sys_futex never was part of a
released stable kernel so having the old_ version around is silly.  I
Think it's enough time until 2.6 hits the roads for people to have those
vendor libc flushed out that use it.  (and sys_futex still isn't used
in the glibc CVS, only in the addon nptl package with pre-1 release
numbers.)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:27       ` Ingo Molnar
  2003-05-20  6:56         ` Christoph Hellwig
@ 2003-05-20  8:55         ` Rusty Russell
  1 sibling, 0 replies; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  8:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel

In message <Pine.LNX.4.44.0305200821010.2445-100000@localhost.localdomain> you write:
> yes, but the damage has been done already, and now we've got to start the
> slow wait for the old syscall to flush out of our tree. It will a few
> years to get rid of the compat code, but we better start now. hch is
> perfectly right that the old futex multiplexer interface is quite ugly,
> the requeue op only made this even more prominent.

It's a judgement call: how simple it is to change vs. the amount of
damage done by not changing it.

I don't think it's worth changing, but I don't think we're going to
convince each other.

> > Comment says: /* Must be "naturally" aligned */.  This used to be true
> > in a much earlier version of the code, now AFAICT the requirement test
> > should be:
> > 

> > 	/* Handling futexes on multiple pages?  -ETOOHARD */
> > 	if (pos_in_page + sizeof(u32) > PAGE_SIZE)
> > 		return -EINVAL;
> 
> yes - but i'd rather enforce this for every futex, than to hit it in every
> 1000th app that manages to misalign a futex _and_ lay it across two pages.  

Good point.  I'd prefer to fix the comment though, since it's not
true.  How about changing it to something like:

      	/* We can't handle futexes across multiple pages: best to reject
	   any crazy alignment to save the users from themselves. */

> Also, it's only x86 that guarantees atomic instructions on misaligned
> futexes (even then it comes with a cycle penalty), are you sure this also
> works on other architectures? So i'd rather be a bit more strict with this
> requirement.

Sure.  My point was that this comment is actually from a v. early futex
version where the kernel actually did the atomic ops itself.

Hope that clarifies!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:56         ` Christoph Hellwig
@ 2003-05-20  8:57           ` Rusty Russell
  2003-05-20  9:03             ` Ingo Molnar
  2003-05-20  9:12           ` Ingo Oeser
  2003-05-20 15:41           ` Linus Torvalds
  2 siblings, 1 reply; 43+ messages in thread
From: Rusty Russell @ 2003-05-20  8:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ulrich Drepper, mingo, Linus Torvalds, linux-kernel

In message <20030520075627.A28002@infradead.org> you write:
> On Tue, May 20, 2003 at 08:27:03AM +0200, Ingo Molnar wrote:
> > yes, but the damage has been done already, and now we've got to start the
> > slow wait for the old syscall to flush out of our tree.
> 
> Actually it should go away before 2.6.0.  sys_futex never was part of a
> released stable kernel so having the old_ version around is silly.

Hmm, in that case I'd say "just break it", and I'd be all in favour of
demuxing the syscall.

But I think vendors have backported and released futexes, which is why
Ingo did this...

(Although you might be right about "by the time 2.6 is out" 8)
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  8:57           ` Rusty Russell
@ 2003-05-20  9:03             ` Ingo Molnar
  2003-05-20  9:51               ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2003-05-20  9:03 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoph Hellwig, Ulrich Drepper, Linus Torvalds, linux-kernel


On Tue, 20 May 2003, Rusty Russell wrote:

> > Actually it should go away before 2.6.0.  sys_futex never was part of a
> > released stable kernel so having the old_ version around is silly.
> 
> Hmm, in that case I'd say "just break it", and I'd be all in favour of
> demuxing the syscall.

have you all gone nuts??? It's not an option to break perfectly working
binaries out there. Hell, we didnt even reorder the new NPTL
syscalls/extensions 1-2 kernel releases after the fact. Please grow up!

the interface should have been gotten right initially. We are all guilty
of it - now lets face the consequences. It's only a couple of lines of
code in a well isolated place of the file so i dont know what the fuss is
about. I havent even added FUTEX_REQUEUE to the old API.

> But I think vendors have backported and released futexes, which is why
> Ingo did this...

of course. And which brought the productization of futexes in the first
place.

	Ingo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:57           ` Rusty Russell
@ 2003-05-20  9:12           ` Ingo Oeser
  2003-05-20 15:41           ` Linus Torvalds
  2 siblings, 0 replies; 43+ messages in thread
From: Ingo Oeser @ 2003-05-20  9:12 UTC (permalink / raw)
  To: Christoph Hellwig, Ingo Molnar, Rusty Russell, Ulrich Drepper,
	Linus Torvalds, linux-kernel

Hi,

On Tue, May 20, 2003 at 07:56:27AM +0100, Christoph Hellwig wrote:
> On Tue, May 20, 2003 at 08:27:03AM +0200, Ingo Molnar wrote:
> > yes, but the damage has been done already, and now we've got to start the
> > slow wait for the old syscall to flush out of our tree.
> 
> Actually it should go away before 2.6.0.  sys_futex never was part of a
> released stable kernel so having the old_ version around is silly.  I
> Think it's enough time until 2.6 hits the roads for people to have those
> vendor libc flushed out that use it.  (and sys_futex still isn't used
> in the glibc CVS, only in the addon nptl package with pre-1 release
> numbers.)

This sounds reasonable. The people who shipped that already to
customers have so heavily patched kernels, that this simple patch
for the old sys_futex shouldn't really matter.

But cluttering the kernel with an API/ABI that was born in the
same development cycle, where it has been obsoleted sounds not
worth the bytes it consumes. And so we have a syscall slot
available for the next development cycle.

Or did anybody promise that this was final already? Usally this
promise comes with x.even.y not with x.odd.y .

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  9:03             ` Ingo Molnar
@ 2003-05-20  9:51               ` Christoph Hellwig
  2003-05-20 12:56                 ` [patch] futex patches, futex-2.5.69-A2 Ingo Molnar
  2003-05-20 15:46                 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Linus Torvalds
  0 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-20  9:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Tue, May 20, 2003 at 11:03:36AM +0200, Ingo Molnar wrote:
> have you all gone nuts??? It's not an option to break perfectly working
> binaries out there.

Of course it is.  Linux has enough problem problems due to past mainline
stupidities, now we don't need to codify vendor braindamages aswell.  E.g
mainline doesn't have the RH AS aio vsyscall crap or suse's get dev_t behind
/dev/console stuff either.  If Red Hat thinks it needs to live with more interfaces
than what the stable kernel release provide their on their own luck.

And it's a lot of time to 2.6 anyway, don't tell me Jakub isn't smart enough to
get a glibc rpm out by then that works with the new and the old futex stuff.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch] futex patches, futex-2.5.69-A2
  2003-05-20  9:51               ` Christoph Hellwig
@ 2003-05-20 12:56                 ` Ingo Molnar
  2003-05-20 14:08                   ` Christoph Hellwig
  2003-05-20 15:46                 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2003-05-20 12:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel


On Tue, 20 May 2003, Christoph Hellwig wrote:

> On Tue, May 20, 2003 at 11:03:36AM +0200, Ingo Molnar wrote:
> > have you all gone nuts??? It's not an option to break perfectly working
> > binaries out there.
> 
> Of course it is.  Linux has enough problem problems due to past mainline
> stupidities, now we don't need to codify vendor braindamages aswell.  

it's not vendor braindamage. First, sys_futex() was pretty much given as
an API, contributed by IBM (Rusty) around 2.5.8, mainly for NGPT. The
sys_futex() API would have gone into 2.6.0 with flying colors as-is if
NPTL hadnt been around.

You probably wouldnt even have noticed this uncleanliness in the API, if
not for my requeue patch from yesterday. (which is another feature to make
futexes more useful, also driven by NPTL, not some evil ploy.)

face it, there will always be mistakes when creating APIs. But removing
them arbitrarily just makes the mistake worse, two mistakes do not make a
correct move.

most of the time we catch crap before it gets into the mainline kernel.  
Sometimes it takes months to notice, sometimes it takes years. I certainly
saw no-one complaining since sys_futex() went into 2.5.8 or so, more than
a year ago!

i'd be the last one to complain about some system-utility API being
changed, but here we are talking about tons of user installations.

> And it's a lot of time to 2.6 anyway, don't tell me Jakub isn't smart
> enough to get a glibc rpm out by then that works with the new and the
> old futex stuff.

you dont understand, do you? There are very valid and perfectly working
glibc installations [and maybe NGPT installations, futex users, etc.] out
there that will break if you remove sys_futex(). No amount of rpm hacking
will fix them up.

what do you think the reaction would have been, if half a year ago, when i
sent the first NPTL patches, i had started off by rewriting the
sys_futex() API and break the hell out of NGPT installations, just for the
fun of having a slightly cleaner API. We'd still be stuck with
LinuxThreads, i'm quite sure.

hell, this is 20 lines of code and no strings attached, and i've done all
the work already, it's not a big issue. So stop this purist crap ...

full futex patch against BK-curr attached: it fixes the page count leaks,
cleans up the code, splits up the futex ops into separate syscalls, keeps
the compatibility syscall and adds the requeueing feature. FUTEX_FD is
still fully supported. (and there's a sys_futex_fd syscall) (The patch
also fixes some leftover uaddr2 parameter passing present in the previous
patches.)

(the patch compiles, boots & works on x86/SMP, with NPTL enabled and
disabled as well.)

i think it might be more productive to put all this energy into comments
directed at the new APIs this patch adds.

Eg. is this alignment check the best one we can get? Is this particular
requeue API really the one we want? Plus now is the right time to add or
remove any functionality from the new syscalls. Next time around it will
be too late.

	Ingo

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -2,10 +2,15 @@
 #define _LINUX_FUTEX_H
 
 /* Second argument to futex syscall */
-#define FUTEX_WAIT (0)
-#define FUTEX_WAKE (1)
-#define FUTEX_FD (2)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime);
+asmlinkage long sys_futex_wait(u32 __user *__uaddr,int val,
+				struct timespec __user *utime);
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int nr_wake);
+
+asmlinkage long sys_futex_fd(u32 __user *__uaddr, int signal);
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1, u32 __user *__uaddr2,
+					int nr_wake);
 
 #endif
--- linux/arch/i386/kernel/entry.S.orig	
+++ linux/arch/i386/kernel/entry.S	
@@ -837,7 +837,7 @@ ENTRY(sys_call_table)
 	.long sys_fremovexattr
 	.long sys_tkill
 	.long sys_sendfile64
-	.long sys_futex		/* 240 */
+	.long old_futex		/* 240 */
 	.long sys_sched_setaffinity
 	.long sys_sched_getaffinity
 	.long sys_set_thread_area
@@ -865,6 +865,9 @@ ENTRY(sys_call_table)
  	.long sys_clock_gettime		/* 265 */
  	.long sys_clock_getres
  	.long sys_clock_nanosleep
- 
+	.long sys_futex_wait
+	.long sys_futex_wake
+	.long sys_futex_fd		/* 270 */
+	.long sys_futex_requeue
  
 nr_syscalls=(.-sys_call_table)/4
--- linux/kernel/compat.c.orig	
+++ linux/kernel/compat.c	
@@ -211,7 +211,7 @@ asmlinkage long compat_sys_sigprocmask(i
 	return ret;
 }
 
-extern long do_futex(unsigned long, int, int, unsigned long);
+extern long do_old_futex(unsigned long uaddr, int op, int val, unsigned long timeout);
 
 asmlinkage long compat_sys_futex(u32 *uaddr, int op, int val,
 		struct compat_timespec *utime)
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL);
+		sys_futex_wake(tidptr, 1);
 	}
 }
 
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -2,6 +2,9 @@
  *  Fast Userspace Mutexes (which I call "Futexes!").
  *  (C) Rusty Russell, IBM 2002
  *
+ *  Generalized futexes, futex requeueing, misc fixes by Ingo Molnar
+ *  (C) Copyright 2003 Red Hat Inc, All Rights Reserved
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -9,9 +12,6 @@
  *  "The futexes are also cursed."
  *  "But they come in a choice of three flavours!"
  *
- *  Generalized futexes for every mapping type, Ingo Molnar, 2002
- *
- *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
  *  the Free Software Foundation; either version 2 of the License, or
@@ -93,19 +93,18 @@ static inline struct list_head *hash_fut
 							FUTEX_HASHBITS)];
 }
 
-/* Waiter either waiting in FUTEX_WAIT or poll(), or expecting signal */
-static inline void tell_waiter(struct futex_q *q)
-{
-	wake_up_all(&q->waiters);
-	if (q->filp)
-		send_sigio(&q->filp->f_owner, q->fd, POLL_IN);
-}
-
 /*
  * Get kernel address of the user page and pin it.
  *
  * Must be called with (and returns with) all futex-MM locks held.
  */
+static inline struct page *__pin_page_atomic (struct page *page)
+{
+	if (!PageReserved(page))
+		get_page(page);
+	return page;
+}
+
 static struct page *__pin_page(unsigned long addr)
 {
 	struct mm_struct *mm = current->mm;
@@ -116,11 +115,8 @@ static struct page *__pin_page(unsigned 
 	 * Do a quick atomic lookup first - this is the fastpath.
 	 */
 	page = follow_page(mm, addr, 0);
-	if (likely(page != NULL)) {	
-		if (!PageReserved(page))
-			get_page(page);
-		return page;
-	}
+	if (likely(page != NULL))
+		return __pin_page_atomic(page);
 
 	/*
 	 * No luck - need to fault in the page:
@@ -150,16 +146,11 @@ repeat_lookup:
 	return page;
 }
 
-static inline void unpin_page(struct page *page)
-{
-	put_page(page);
-}
-
 /*
  * Wake up all waiters hashed on the physical page that is mapped
  * to this virtual address:
  */
-static int futex_wake(unsigned long uaddr, int offset, int num)
+static inline int futex_wake(unsigned long uaddr, int offset, int num)
 {
 	struct list_head *i, *next, *head;
 	struct page *page;
@@ -181,7 +172,9 @@ static int futex_wake(unsigned long uadd
 		if (this->page == page && this->offset == offset) {
 			list_del_init(i);
 			__detach_vcache(&this->vcache);
-			tell_waiter(this);
+			wake_up_all(&this->waiters);
+			if (this->filp)
+				send_sigio(&this->filp->f_owner, this->fd, POLL_IN);
 			ret++;
 			if (ret >= num)
 				break;
@@ -189,7 +182,7 @@ static int futex_wake(unsigned long uadd
 	}
 
 	unlock_futex_mm();
-	unpin_page(page);
+	put_page(page);
 
 	return ret;
 }
@@ -208,7 +201,9 @@ static void futex_vcache_callback(vcache
 	spin_lock(&futex_lock);
 
 	if (!list_empty(&q->list)) {
+		put_page(q->page);
 		q->page = new_page;
+		__pin_page_atomic(new_page);
 		list_del(&q->list);
 		list_add_tail(&q->list, head);
 	}
@@ -216,6 +211,63 @@ static void futex_vcache_callback(vcache
 	spin_unlock(&futex_lock);
 }
 
+/*
+ * Requeue all waiters hashed on one physical page to another
+ * physical page.
+ */
+static inline int futex_requeue(unsigned long uaddr1, int offset1,
+		unsigned long uaddr2, int offset2, int num)
+{
+	struct list_head *i, *next, *head1, *head2;
+	struct page *page1 = NULL, *page2 = NULL;
+	int ret = 0;
+
+	lock_futex_mm();
+
+	page1 = __pin_page(uaddr1 - offset1);
+	if (!page1)
+		goto out;
+	page2 = __pin_page(uaddr2 - offset2);
+	if (!page2)
+		goto out;
+
+	head1 = hash_futex(page1, offset1);
+	head2 = hash_futex(page2, offset2);
+
+	list_for_each_safe(i, next, head1) {
+		struct futex_q *this = list_entry(i, struct futex_q, list);
+
+		if (this->page == page1 && this->offset == offset1) {
+			list_del_init(i);
+			__detach_vcache(&this->vcache);
+			if (++ret <= num) {
+				wake_up_all(&this->waiters);
+				if (this->filp)
+					send_sigio(&this->filp->f_owner,
+							this->fd, POLL_IN);
+			} else {
+				put_page(this->page);
+				__pin_page_atomic (page2);
+				list_add_tail(i, head2);
+				__attach_vcache(&this->vcache, uaddr2,
+					current->mm, futex_vcache_callback);
+				this->offset = offset2;
+				this->page = page2;
+			}
+		}
+	}
+
+out:
+	unlock_futex_mm();
+
+	if (page1)
+		put_page(page1);
+	if (page2)
+		put_page(page2);
+
+	return ret;
+}
+
 static inline void __queue_me(struct futex_q *q, struct page *page,
 				unsigned long uaddr, int offset,
 				int fd, struct file *filp)
@@ -252,7 +304,7 @@ static inline int unqueue_me(struct fute
 	return ret;
 }
 
-static int futex_wait(unsigned long uaddr,
+static inline int futex_wait(unsigned long uaddr,
 		      int offset,
 		      int val,
 		      unsigned long time)
@@ -273,14 +325,17 @@ static int futex_wait(unsigned long uadd
 	}
 	__queue_me(&q, page, uaddr, offset, -1, NULL);
 
-	unlock_futex_mm();
-
-	/* Page is pinned, but may no longer be in this address space. */
+	/*
+	 * Page is pinned, but may no longer be in this address space.
+	 * It cannot schedule, so we access it with the spinlock held.
+	 */
 	if (get_user(curval, (int *)uaddr) != 0) {
+		unlock_futex_mm();
 		ret = -EFAULT;
 		goto out;
 	}
 	if (curval != val) {
+		unlock_futex_mm();
 		ret = -EWOULDBLOCK;
 		goto out;
 	}
@@ -288,13 +343,15 @@ static int futex_wait(unsigned long uadd
 	 * The get_user() above might fault and schedule so we
 	 * cannot just set TASK_INTERRUPTIBLE state when queueing
 	 * ourselves into the futex hash. This code thus has to
-	 * rely on the FUTEX_WAKE code doing a wakeup after removing
+	 * rely on the futex_wake() code doing a wakeup after removing
 	 * the waiter from the list.
 	 */
 	add_wait_queue(&q.waiters, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
-	if (!list_empty(&q.list))
+	if (!list_empty(&q.list)) {
+		unlock_futex_mm();
 		time = schedule_timeout(time);
+	}
 	set_current_state(TASK_RUNNING);
 	/*
 	 * NOTE: we don't remove ourselves from the waitqueue because
@@ -310,7 +367,7 @@ out:
 	/* Were we woken up anyway? */
 	if (!unqueue_me(&q))
 		ret = 0;
-	unpin_page(page);
+	put_page(q.page);
 
 	return ret;
 }
@@ -320,7 +377,7 @@ static int futex_close(struct inode *ino
 	struct futex_q *q = filp->private_data;
 
 	unqueue_me(q);
-	unpin_page(q->page);
+	put_page(q->page);
 	kfree(filp->private_data);
 	return 0;
 }
@@ -416,11 +473,76 @@ static int futex_fd(unsigned long uaddr,
 	page = NULL;
 out:
 	if (page)
-		unpin_page(page);
+		put_page(page);
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout)
+asmlinkage long sys_futex_wait(u32 __user *__uaddr, int val, struct timespec __user *utime)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
+	struct timespec t;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	if (utime) {
+		if (copy_from_user(&t, utime, sizeof(t)) != 0)
+			return -EFAULT;
+		timeout = timespec_to_jiffies(&t) + 1;
+	}
+	return futex_wait(uaddr, pos_in_page, val, timeout);
+}
+
+asmlinkage long sys_futex_wake(u32 __user *__uaddr, int nr_wake)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	return futex_wake(uaddr, pos_in_page, nr_wake);
+}
+
+asmlinkage long sys_futex_fd(u32 __user *__uaddr, int signal)
+{
+	unsigned long uaddr = (unsigned long)__uaddr;
+	unsigned long pos_in_page = uaddr % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if (pos_in_page % sizeof(u32))
+		return -EINVAL;
+
+	return futex_fd(uaddr, pos_in_page, signal);
+}
+
+asmlinkage long sys_futex_requeue(u32 __user *__uaddr1,
+				u32 __user *__uaddr2, int nr_wake)
+{
+	unsigned long uaddr1 = (unsigned long)__uaddr1,
+			uaddr2 = (unsigned long)__uaddr2;
+	unsigned long pos_in_page1 = uaddr1 % PAGE_SIZE,
+			pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+	/* Must be "naturally" aligned */
+	if ((pos_in_page1 | pos_in_page2) % sizeof(u32))
+		return -EINVAL;
+
+	return futex_requeue(uaddr1, pos_in_page1, uaddr2, pos_in_page2, nr_wake);
+}
+
+/*
+ * Compatibility code:
+ */
+#define OLD_FUTEX_WAIT (0)
+#define OLD_FUTEX_WAKE (1)
+#define OLD_FUTEX_FD (2)
+
+long do_old_futex(unsigned long uaddr, int op, int val, unsigned long timeout)
 {
 	unsigned long pos_in_page;
 	int ret;
@@ -432,13 +554,13 @@ long do_futex(unsigned long uaddr, int o
 		return -EINVAL;
 
 	switch (op) {
-	case FUTEX_WAIT:
+	case OLD_FUTEX_WAIT:
 		ret = futex_wait(uaddr, pos_in_page, val, timeout);
 		break;
-	case FUTEX_WAKE:
+	case OLD_FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
-	case FUTEX_FD:
+	case OLD_FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
 		break;
@@ -448,17 +570,17 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime)
+asmlinkage long old_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
 
-	if ((op == FUTEX_WAIT) && utime) {
+	if ((op == OLD_FUTEX_WAIT) && utime) {
 		if (copy_from_user(&t, utime, sizeof(t)) != 0)
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_old_futex((unsigned long)uaddr, op, val, timeout);
 }
 
 static struct super_block *


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex patches, futex-2.5.69-A2
  2003-05-20 12:56                 ` [patch] futex patches, futex-2.5.69-A2 Ingo Molnar
@ 2003-05-20 14:08                   ` Christoph Hellwig
  2003-05-20 16:02                     ` Ingo Molnar
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-20 14:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Tue, May 20, 2003 at 02:56:42PM +0200, Ingo Molnar wrote:
> it's not vendor braindamage.

I didn't sayd sys_futex is vendor braindamage.  The problem is that vendors
think they need to put all the unstable features into their release tree.
There's a reaaon those aren';t in the released stable kernels.

> you dont understand, do you? There are very valid and perfectly working
> glibc installations [and maybe NGPT installations, futex users, etc.] out
> there that will break if you remove sys_futex(). No amount of rpm hacking
> will fix them up.

And they'll break if you run _any_ released stable kernel, so what?
Let the vendors fix the mess they introduced by backporting unstable features
instead of burdening it up on mainline.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:57           ` Rusty Russell
  2003-05-20  9:12           ` Ingo Oeser
@ 2003-05-20 15:41           ` Linus Torvalds
  2 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2003-05-20 15:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Rusty Russell, Ulrich Drepper, linux-kernel


On Tue, 20 May 2003, Christoph Hellwig wrote:
> 
> Actually it should go away before 2.6.0.  sys_futex never was part of a
> released stable kernel so having the old_ version around is silly.

That's not a valid argument. That's an _excuse_.

Guys, binary compatibility is important. It's important enough that if 
something got extensively used on development kernels, it's _still_ a hell 
of a lot more important than most other things around. The _only_ things 
that trump binary compatibility are

 - developer sanity (ie it has to be truly mindbogglingly hard to support 
   the old interface)
 - stability (ie if the old interface was so badly designed that it can't 
   be done right - mmap on /proc/<pid>/mem was one of these).
 - it's been deprecated over at least one full stable release.

Something like "it's only been in the development kernels" is simply not
an issue. The only thing that matters is whether it is used by various
binaries or not. And I think futexes are used a lot by glibc..

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  9:51               ` Christoph Hellwig
  2003-05-20 12:56                 ` [patch] futex patches, futex-2.5.69-A2 Ingo Molnar
@ 2003-05-20 15:46                 ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2003-05-20 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Rusty Russell, Ulrich Drepper, linux-kernel


On Tue, 20 May 2003, Christoph Hellwig wrote:
>
> On Tue, May 20, 2003 at 11:03:36AM +0200, Ingo Molnar wrote:
> > have you all gone nuts??? It's not an option to break perfectly working
> > binaries out there.
> 
> Of course it is. 

NO. 

Christoph, get a grip. Ingo is 100% right.

IT IS NEVER ACCEPTABLE TO BREAK USER LEVEL BINARIES! In particular, we do 
_not_ do it just because of some sense of "aesthetics".

If you want "aesthetics", go play with microkernels, or other academic
projects. They don't care about their users, they care about their ideas. 
The end result is, in my opinion, CRAP.

Linux has never been that way. The _founding_ principle of Linux was
"compatibility". Don't ever forget that. The user comes first. ALWAYS.  
Pretty code is a desirable feature, but if prettifying the code breaks
user apps, it doesn't get prettified.

Repeat after me: the goodness of an operating system is not in how pretty 
it is, but in how well it supports the user.

Make it your mantra.

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex patches, futex-2.5.69-A2
  2003-05-20 14:08                   ` Christoph Hellwig
@ 2003-05-20 16:02                     ` Ingo Molnar
  2003-05-20 19:55                       ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2003-05-20 16:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel


On Tue, 20 May 2003, Christoph Hellwig wrote:

> > you dont understand, do you? There are very valid and perfectly working
> > glibc installations [and maybe NGPT installations, futex users, etc.] out
> > there that will break if you remove sys_futex(). No amount of rpm hacking
> > will fix them up.
>
> And they'll break if you run _any_ released stable kernel, so what?

you havent ever used Ulrich's nptl-enabled glibc, have you? It will boot
on any 2.4.1+ kernel, with and without nptl/tls support. It switches the
threading implementation depending on the kernel features it detects.

	Ingo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-20  1:44                   ` Jamie Lokier
@ 2003-05-20 16:54                     ` Davide Libenzi
  2003-05-20 17:59                       ` Jamie Lokier
  0 siblings, 1 reply; 43+ messages in thread
From: Davide Libenzi @ 2003-05-20 16:54 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List

On Tue, 20 May 2003, Jamie Lokier wrote:

> Yes, they do and it should work (I haven't tried, though).
>
> There is a practical problem when waiting on a futex in multiple
> threads using epoll: you need a separate fd per waiter, rather than an
> fd per waited-on futex.  This is because some uses of futexes depend
> on waiters being woken in the exact order that they were queued.

Not really Jamie. See a Futex event is not much different from a network
one. When the event shows up, one thread will pick it up (epoll_wait) and
will handle it. A futex event will very likely be a green light for some
sort of resource usage that whatever thread can pick up and handle.



- Davide


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-20 16:54                     ` Davide Libenzi
@ 2003-05-20 17:59                       ` Jamie Lokier
  2003-05-21 23:56                         ` Davide Libenzi
  0 siblings, 1 reply; 43+ messages in thread
From: Jamie Lokier @ 2003-05-20 17:59 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

Davide Libenzi wrote:
> > Yes, they do and it should work (I haven't tried, though).
> >
> > There is a practical problem when waiting on a futex in multiple
> > threads using epoll: you need a separate fd per waiter, rather than an
> > fd per waited-on futex.  This is because some uses of futexes depend
> > on waiters being woken in the exact order that they were queued.
> 
> Not really Jamie. See a Futex event is not much different from a network
> one. When the event shows up, one thread will pick it up (epoll_wait) and
> will handle it. A futex event will very likely be a green light for some
> sort of resource usage that whatever thread can pick up and handle.

That is true when a futex represents some item of work to do, or
readiness such as data coming in or going it.  Then it is quite
reasonable to think of it like a pollable fd.

But futexes are also used to represent contention for shared
resources, and the properties needed for this are quite different.

See futex_up_fair() in Rusty's futex-2.2 library.  That depends
crucially on getting exactly the right number of wakeup tokens passed,
and in the order the waiters blocked.

To pick an example, consider a dynamic web server which is
occasionally asked to render complex images, but where most of the
content is easily generated.  You might have 30 concurrent page
serving threads, but want to limit the number of threads which are
generating a particularly complex response to 3 at a time because of
memory constraints.

You cannot program this as putting things on a work queue and having
arbitrary threads pick them off, unless you are prepared to represent
the problem as an explicit state machine, where the intermediate
states can be represented as a data structure.  If the state is too
complex for that, which is often a reason for using threads in the
first place, that is not an option.

So it is important that _those_ kind of waiting threads are woken in
something approximating the order they went to sleep.  Otherwise there
is a danger of certain requests being starved unfairly.

To implement this you can use a simple futex, or a significantly more
complex, and slower, queue structure.

Using the futex, you either need an fd per waiter to get this
fairness, when the waiters are using epoll to listen for more than one
event, _or_ you need epoll to be clever and preserve the sleep-to-wake
ordering over a single futex.  I favour that last solution.

-- Jamie


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex patches, futex-2.5.69-A2
  2003-05-20 16:02                     ` Ingo Molnar
@ 2003-05-20 19:55                       ` Christoph Hellwig
  2003-05-21  5:06                         ` Martin Schlemmer
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-20 19:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Tue, May 20, 2003 at 06:02:07PM +0200, Ingo Molnar wrote:
> you havent ever used Ulrich's nptl-enabled glibc, have you? It will boot
> on any 2.4.1+ kernel, with and without nptl/tls support. It switches the
> threading implementation depending on the kernel features it detects.

I have built a nptl-enabled glibc and no, it's doesn't work on 2.4 at all.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex patches, futex-2.5.69-A2
  2003-05-20 19:55                       ` Christoph Hellwig
@ 2003-05-21  5:06                         ` Martin Schlemmer
  2003-05-21  6:31                           ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Martin Schlemmer @ 2003-05-21  5:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Rusty Russell, Ulrich Drepper, Linus Torvalds, KML

On Tue, 2003-05-20 at 21:55, Christoph Hellwig wrote:
> On Tue, May 20, 2003 at 06:02:07PM +0200, Ingo Molnar wrote:
> > you havent ever used Ulrich's nptl-enabled glibc, have you? It will boot
> > on any 2.4.1+ kernel, with and without nptl/tls support. It switches the
> > threading implementation depending on the kernel features it detects.
> 
> I have built a nptl-enabled glibc and no, it's doesn't work on 2.4 at all.
> 

It is because you only compiled it with nptl support.

In recent (nptl enabled) Redhat glibc's glibc is build two times.
1) without nptl
2) with nptl

The version without nptl support is then installed into the 'default'
location.  The other is installed into /lib/tls/ and /usr/lib/tls/.
ld.so is then hacked (maybe in mainline glibc now?) to load the tls
enabled versions of the libraries only if the kernel/hardware support
it.


Regards,

-- 
Martin Schlemmer



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex patches, futex-2.5.69-A2
  2003-05-21  5:06                         ` Martin Schlemmer
@ 2003-05-21  6:31                           ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-05-21  6:31 UTC (permalink / raw)
  To: Martin Schlemmer
  Cc: Christoph Hellwig, Ingo Molnar, Rusty Russell, Ulrich Drepper,
	Linus Torvalds, KML

On Wed, May 21, 2003 at 07:06:57AM +0200, Martin Schlemmer wrote:
> On Tue, 2003-05-20 at 21:55, Christoph Hellwig wrote:
> > On Tue, May 20, 2003 at 06:02:07PM +0200, Ingo Molnar wrote:
> > > you havent ever used Ulrich's nptl-enabled glibc, have you? It will boot
> > > on any 2.4.1+ kernel, with and without nptl/tls support. It switches the
> > > threading implementation depending on the kernel features it detects.
> > 
> > I have built a nptl-enabled glibc and no, it's doesn't work on 2.4 at all.
> > 
> 
> It is because you only compiled it with nptl support.

I know.  I didn't ever claim you can install multiple glibc version.

> In recent (nptl enabled) Redhat glibc's glibc is build two times.

For x86 it's even built three times..


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2
  2003-05-20 17:59                       ` Jamie Lokier
@ 2003-05-21 23:56                         ` Davide Libenzi
  0 siblings, 0 replies; 43+ messages in thread
From: Davide Libenzi @ 2003-05-21 23:56 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List

On Tue, 20 May 2003, Jamie Lokier wrote:

> > Not really Jamie. See a Futex event is not much different from a network
> > one. When the event shows up, one thread will pick it up (epoll_wait) and
> > will handle it. A futex event will very likely be a green light for some
> > sort of resource usage that whatever thread can pick up and handle.
>
> That is true when a futex represents some item of work to do, or
> readiness such as data coming in or going it.  Then it is quite
> reasonable to think of it like a pollable fd.
>
> But futexes are also used to represent contention for shared
> resources, and the properties needed for this are quite different.
>
> See futex_up_fair() in Rusty's futex-2.2 library.  That depends
> crucially on getting exactly the right number of wakeup tokens passed,
> and in the order the waiters blocked.
>
> To pick an example, consider a dynamic web server which is
> occasionally asked to render complex images, but where most of the
> content is easily generated.  You might have 30 concurrent page
> serving threads, but want to limit the number of threads which are
> generating a particularly complex response to 3 at a time because of
> memory constraints.
>
> You cannot program this as putting things on a work queue and having
> arbitrary threads pick them off, unless you are prepared to represent
> the problem as an explicit state machine, where the intermediate
> states can be represented as a data structure.  If the state is too
> complex for that, which is often a reason for using threads in the
> first place, that is not an option.

Jamie, if you use the one thread per connection model you don't need epoll
at all. The std poll() scale very well with one fd :) If you're using a
N:1 (N connections over one thread/task) you definitely need either a
state machine or something like coroutines. With futex->poll() (under
epoll) you can easily create the object you need to solve your server
problem. You create an object (structure) that contain the futex and a
list of coroutines/state-structures. You drop the pointer to this
structure inside data.ptr of the epoll fd data. When you get an event you
invoke/proceed with a FIFO order. When an object wait on this "mutex" you
drop the calling coroutine/state-structure inside the list tail. All those
ops (getting data from epoll event, removing list head, adding to tail)
are lightning fast. Actually, strictly speaking, you don't even need a
futex to do this with a N:1 model :) But you do need it for a N:M (M > 1) model.



- Davide


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2003-05-21 23:44 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-19  9:31 [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Ingo Molnar
2003-05-19 10:10 ` Christoph Hellwig
2003-05-19 10:16   ` Ingo Molnar
2003-05-19 11:49     ` Christoph Hellwig
2003-05-19 12:33       ` [patch] futex API cleanups, futex-api-cleanup-2.5.69-A2 Ingo Molnar
2003-05-19 12:51         ` Ingo Molnar
2003-05-19 14:47         ` bert hubert
2003-05-19 16:02           ` Ingo Molnar
2003-05-19 15:18         ` Linus Torvalds
2003-05-19 16:06           ` Ingo Molnar
2003-05-19 23:33             ` Jamie Lokier
2003-05-20  0:39               ` Rusty Russell
2003-05-20  1:14                 ` Davide Libenzi
2003-05-20  1:44                   ` Jamie Lokier
2003-05-20 16:54                     ` Davide Libenzi
2003-05-20 17:59                       ` Jamie Lokier
2003-05-21 23:56                         ` Davide Libenzi
2003-05-20  1:50                 ` Jamie Lokier
2003-05-20  0:15             ` Rusty Russell
2003-05-20  0:08   ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
2003-05-20  0:31     ` Valdis.Kletnieks
2003-05-19 10:23 ` Andrew Morton
2003-05-19 10:30   ` [patch] futex requeueing feature, futex-requeue-2.5.69-D4 Ingo Molnar
2003-05-20  0:04 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Rusty Russell
2003-05-20  0:40   ` Ulrich Drepper
2003-05-20  1:46     ` Rusty Russell
2003-05-20  2:11       ` Ulrich Drepper
2003-05-20  6:27       ` Ingo Molnar
2003-05-20  6:56         ` Christoph Hellwig
2003-05-20  8:57           ` Rusty Russell
2003-05-20  9:03             ` Ingo Molnar
2003-05-20  9:51               ` Christoph Hellwig
2003-05-20 12:56                 ` [patch] futex patches, futex-2.5.69-A2 Ingo Molnar
2003-05-20 14:08                   ` Christoph Hellwig
2003-05-20 16:02                     ` Ingo Molnar
2003-05-20 19:55                       ` Christoph Hellwig
2003-05-21  5:06                         ` Martin Schlemmer
2003-05-21  6:31                           ` Christoph Hellwig
2003-05-20 15:46                 ` [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Linus Torvalds
2003-05-20  9:12           ` Ingo Oeser
2003-05-20 15:41           ` Linus Torvalds
2003-05-20  8:55         ` Rusty Russell
2003-05-20  6:19   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).