linux-kernel.vger.kernel.org archive mirror
* [robust-futex-1] futex: robust futex support
@ 2006-01-14  1:00 David Singleton
  2006-01-15  0:02 ` Ulrich Drepper
  0 siblings, 1 reply; 15+ messages in thread
From: David Singleton @ 2006-01-14  1:00 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: mingo, drepper

[-- Attachment #1: Type: text/plain, Size: 17 bytes --]

Andrew,

      


[-- Attachment #2: robust-futex-1 --]
[-- Type: text/plain, Size: 21842 bytes --]

Signed-off-by: David Singleton <dsingleton@mvista.com>

	The original code for this patch was supplied by Todd Kneisel.

	Robust futexes provide a locking mechanism that can be shared between
	user mode processes. The major difference between robust futexes and
	regular futexes is that when the owner of a robust futex dies, the
	next task waiting on the futex will be awakened, will get ownership
	of the futex lock, and will receive the error status EOWNERDEAD.

	Robust futexes allow the system to gracefully continue if an application
	dies while holding a futex, without leaving waiting threads hung or
	requiring the system to be rebooted to clear the hang.

	Robust futexes are structures hung on a linked list on the inode.
	The list is scanned at exit time to find any futexes still held
	by the dying thread.  The structures backing the futex are removed
	when a dentry reference count drops to zero.  The exit path cleans
	up any locked futexes and the dentry reference count handles cleaning
	up robust futex resources.

	This implementation supports shared pthread_mutexes, i.e. pthread_mutexes
	mmapped either in a file or in mmapped anonymous memory.

	Ulrich Drepper has a glibc that handles non-shared (malloc'd)
	pthread_mutexes entirely in user space.  This patch returns
	-ENOTSHARED from futex registration if the futex is not shared, so his
	glibc can handle non-shared pthread_mutexes without kernel support.
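
	As an illustration (a sketch only, not part of the patch), a user-space
	caller could drive the new futex operations roughly as follows.  The
	constants mirror the values the patch adds, since they are not exported
	to user space here, and the lock/unlock fast paths themselves are
	assumed to live in glibc:

	#include <errno.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#define FUTEX_REGISTER		6
	#define FUTEX_RECOVER		8
	#define FUTEX_OWNER_DIED	0x40000000
	#define FUTEX_ATTR_SHARED	0x10000000
	#define ENOTSHARED		132

	/*
	 * Register the futex word of a process-shared robust mutex.  The
	 * word must live in a MAP_SHARED mapping; otherwise the call fails
	 * with errno set to ENOTSHARED and glibc keeps handling the mutex
	 * entirely in user space.
	 */
	static long robust_register(int *futex_word)
	{
		return syscall(SYS_futex, futex_word, FUTEX_REGISTER,
			       FUTEX_ATTR_SHARED, NULL, NULL, 0);
	}

	/*
	 * A waiter that obtains the lock after the previous owner died sees
	 * EOWNERDEAD; once it has repaired the protected state it can ask
	 * the kernel to clear the owner-died bit again.
	 */
	static void robust_recover(int *futex_word)
	{
		if (*futex_word & FUTEX_OWNER_DIED)
			syscall(SYS_futex, futex_word, FUTEX_RECOVER,
				0, NULL, NULL, 0);
	}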

	The structure for robustness has its own slab cache, making
	it easy to track how many robust futexes are in the system through
	/proc/slabinfo.
	
	Robust futexes have been tested fairly well in the community using
	the realtime-preempt patch, which supports both robust and
	priority-inheriting pthread_mutexes.  There is a simple test suite,
	also originally supplied by Todd Kneisel, that tests basic robustness
	using regular futexes with the robust attribute applied.  They've also
	been tested using the fusyn POSIX test suite, with minor modifications
	to make the fusyn tests run with the NPTL libraries.


 fs/dcache.c                 |    2 
 fs/inode.c                  |    4 
 include/asm-generic/errno.h |    1 
 include/linux/fs.h          |    2 
 include/linux/futex.h       |   33 +++
 init/Kconfig                |    9 
 kernel/exit.c               |    2 
 kernel/futex.c              |  426 +++++++++++++++++++++++++++++++++++++++++++-
 8 files changed, 471 insertions(+), 8 deletions(-)

Index: linux-2.6.15/fs/dcache.c
===================================================================
--- linux-2.6.15.orig/fs/dcache.c
+++ linux-2.6.15/fs/dcache.c
@@ -33,6 +33,7 @@
 #include <linux/seqlock.h>
 #include <linux/swap.h>
 #include <linux/bootmem.h>
+#include <linux/futex.h>
 
 /* #define DCACHE_DEBUG 1 */
 
@@ -161,6 +162,7 @@ repeat:
 		return;
 	}
 
+	futex_free_robust_list(dentry->d_inode);
 	/*
 	 * AV: ->d_delete() is _NOT_ allowed to block now.
 	 */
Index: linux-2.6.15/fs/inode.c
===================================================================
--- linux-2.6.15.orig/fs/inode.c
+++ linux-2.6.15/fs/inode.c
@@ -23,6 +23,7 @@
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
 #include <linux/mount.h>
+#include <linux/futex.h>
 
 /*
  * This is needed for the following functions:
@@ -208,6 +209,9 @@ void inode_init_once(struct inode *inode
 	INIT_LIST_HEAD(&inode->inotify_watches);
 	sema_init(&inode->inotify_sem, 1);
 #endif
+#ifdef CONFIG_ROBUST_FUTEX
+	futex_init_inode(inode);
+#endif
 }
 
 EXPORT_SYMBOL(inode_init_once);
Index: linux-2.6.15/include/linux/fs.h
===================================================================
--- linux-2.6.15.orig/include/linux/fs.h
+++ linux-2.6.15/include/linux/fs.h
@@ -383,6 +383,8 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+	struct list_head	robust_list;	/* list of robust futexes */
+	struct semaphore	robust_sem;	/* semaphore for robust */
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
Index: linux-2.6.15/include/linux/futex.h
===================================================================
--- linux-2.6.15.orig/include/linux/futex.h
+++ linux-2.6.15/include/linux/futex.h
@@ -10,6 +10,38 @@
 #define FUTEX_REQUEUE		3
 #define FUTEX_CMP_REQUEUE	4
 #define FUTEX_WAKE_OP		5
+#define FUTEX_REGISTER          6
+#define FUTEX_DEREGISTER        7
+#define FUTEX_RECOVER           8
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_ROBUST_FUTEX
+
+#define FUTEX_WAITERS				0x80000000
+#define FUTEX_OWNER_DIED			0x40000000
+#define FUTEX_NOT_RECOVERABLE			0x20000000
+#define FUTEX_FLAGS (FUTEX_WAITERS | FUTEX_OWNER_DIED | FUTEX_NOT_RECOVERABLE)
+#define FUTEX_PID                             ~(FUTEX_FLAGS)
+
+#define FUTEX_ATTR_SHARED                       0x10000000
+
+/*
+ * Used to track registered robust futexes. Attached to linked list in inodes.
+ */
+static inline void futex_init_inode(struct inode *inode)
+{
+        INIT_LIST_HEAD(&inode->i_data.robust_list);
+        init_MUTEX(&inode->i_data.robust_sem);
+}
+
+extern void futex_free_robust_list(struct inode *inode);
+extern void exit_futex(struct task_struct *tsk);
+#else
+# define futex_free_robust_list(a)      do { } while (0)
+# define exit_futex(b)                  do { } while (0)
+#define futex_init_inode(a) 		do { } while (0)
+#endif
 
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2,
@@ -41,3 +73,4 @@ long do_futex(unsigned long uaddr, int o
    | ((oparg & 0xfff) << 12) | (cmparg & 0xfff))
 
 #endif
+#endif
Index: linux-2.6.15/kernel/exit.c
===================================================================
--- linux-2.6.15.orig/kernel/exit.c
+++ linux-2.6.15/kernel/exit.c
@@ -31,6 +31,7 @@
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
+#include <linux/futex.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -847,6 +848,7 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
+	exit_futex(tsk);
 	exit_mm(tsk);
 
 	exit_sem(tsk);
Index: linux-2.6.15/kernel/futex.c
===================================================================
--- linux-2.6.15.orig/kernel/futex.c
+++ linux-2.6.15/kernel/futex.c
@@ -8,6 +8,9 @@
  *  Removed page pinning, fix privately mapped COW pages and other cleanups
  *  (C) Copyright 2003, 2004 Jamie Lokier
  *
+ *  Robust futexes added by Todd Kneisel
+ *  (C) Copyright 2005, Bull HN.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -107,6 +110,9 @@ static struct futex_hash_bucket futex_qu
 /* Futex-fs vfsmount entry: */
 static struct vfsmount *futex_mnt;
 
+#ifdef CONFIG_ROBUST_FUTEX
+static kmem_cache_t *robust_futex_cachep;
+#endif
 /*
  * We hash on the keys returned from get_futex_key (see below).
  */
@@ -140,7 +146,8 @@ static inline int match_futex(union fute
  *
  * Should be called with &current->mm->mmap_sem but NOT any spinlocks.
  */
-static int get_futex_key(unsigned long uaddr, union futex_key *key)
+static int get_futex_key(unsigned long uaddr, union futex_key *key,
+			struct list_head **list, struct semaphore **sem)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
@@ -163,6 +170,14 @@ static int get_futex_key(unsigned long u
 	if (unlikely(!vma))
 		return -EFAULT;
 
+	if (vma->vm_file && vma->vm_file->f_mapping) {
+		*list = &vma->vm_file->f_mapping->robust_list;
+		*sem = &vma->vm_file->f_mapping->robust_sem;
+	} else {
+		*sem = NULL;
+		*list = NULL;
+	}
+
 	/*
 	 * Permissions.
 	 */
@@ -290,11 +305,12 @@ static int futex_wake(unsigned long uadd
 	struct futex_hash_bucket *bh;
 	struct list_head *head;
 	struct futex_q *this, *next;
+	struct semaphore *sem;
 	int ret;
 
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr, &key);
+	ret = get_futex_key(uaddr, &key, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -325,16 +341,17 @@ static int futex_wake_op(unsigned long u
 	union futex_key key1, key2;
 	struct futex_hash_bucket *bh1, *bh2;
 	struct list_head *head;
+	struct semaphore *sem;
 	struct futex_q *this, *next;
 	int ret, op_ret, attempt = 0;
 
 retryfull:
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr1, &key1);
+	ret = get_futex_key(uaddr1, &key1, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, &key2);
+	ret = get_futex_key(uaddr2, &key2, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -450,16 +467,17 @@ static int futex_requeue(unsigned long u
 	union futex_key key1, key2;
 	struct futex_hash_bucket *bh1, *bh2;
 	struct list_head *head1;
+	struct semaphore *sem;
 	struct futex_q *this, *next;
 	int ret, drop_count = 0;
 
  retry:
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr1, &key1);
+	ret = get_futex_key(uaddr1, &key1, &head1, &sem);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, &key2);
+	ret = get_futex_key(uaddr2, &key2, &head1, &sem);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -624,11 +642,13 @@ static int futex_wait(unsigned long uadd
 	int ret, curval;
 	struct futex_q q;
 	struct futex_hash_bucket *bh;
+	struct list_head *head;
+	struct semaphore *sem;
 
  retry:
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr, &q.key);
+	ret = get_futex_key(uaddr, &q.key, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
 
@@ -766,6 +786,8 @@ static int futex_fd(unsigned long uaddr,
 {
 	struct futex_q *q;
 	struct file *filp;
+	struct list_head *head;
+	struct semaphore *sem;
 	int ret, err;
 
 	ret = -EINVAL;
@@ -801,7 +823,7 @@ static int futex_fd(unsigned long uaddr,
 	}
 
 	down_read(&current->mm->mmap_sem);
-	err = get_futex_key(uaddr, &q->key);
+	err = get_futex_key(uaddr, &q->key, &head, &sem);
 
 	if (unlikely(err != 0)) {
 		up_read(&current->mm->mmap_sem);
@@ -829,6 +851,380 @@ error:
 	goto out;
 }
 
+#ifdef CONFIG_ROBUST_FUTEX
+/*
+ * Robust futexes provide a locking mechanism that can be shared between
+ * user mode processes. The major difference between robust futexes and
+ * regular futexes is that when the owner of a robust futex dies, the
+ * next task waiting on the futex will be awakened, will get ownership
+ * of the futex lock, and will receive the error status EOWNERDEAD.
+ *
+ * A robust futex is a 32 bit integer stored in user mode shared memory.
+ * Bit 31 indicates that there are tasks waiting on the futex.
+ * Bit 30 indicates that the task that owned the futex has died.
+ * Bit 29 indicates that the futex is not recoverable and cannot be used.
+ * Bits 0-28 are the pid of the task that owns the futex lock, or zero if
+ * the futex is not locked.
+ */
+
+/*
+ * Used to track registered robust futexes. Attached to linked list in inodes.
+ */
+struct futex_robust {
+	struct list_head list;
+	union futex_key key;
+};
+
+/*
+ * there really isn't an atomic page fault, so we're going to
+ * put the burden on the user. If either futex_get_user or futex_put_user
+ * return -EFAULT, it really means it's avoiding a race condition
+ * and the user will have to try again.
+ */
+static int futex_put_user(int value, unsigned long uaddr)
+{
+	int ret = 0;
+
+	if ((put_user(value, (int __user *)uaddr)) != 0)
+		if ((put_user(value, (int __user *)uaddr)) != 0)
+			ret = -EFAULT;
+	return ret;
+}
+
+static int futex_get_user(unsigned long uaddr)
+{
+	int value = 0;
+
+	if (get_user(value, (int __user *)uaddr))
+		if (get_user(value, (int __user *)uaddr))
+			value = -EFAULT;
+	return value;
+}
+
+/**
+ * futex_free_robust_list - release the list of registered futexes.
+ * @inode: inode that may be a memory mapped file
+ *
+ * Called from dput() when a dentry reference count reaches zero.
+ * If the dentry is associated with a memory mapped file, then
+ * release the list of registered robust futexes that are contained
+ * in that mapping.
+ */
+void futex_free_robust_list(struct inode *inode)
+{
+	struct address_space *mapping;
+	struct list_head *head;
+ 	struct futex_robust *this, *next;
+
+	if (inode == NULL)
+		return;
+
+	mapping = inode->i_mapping;
+	if (mapping == NULL)
+		return;
+
+	if (list_empty(&mapping->robust_list))
+		return;
+
+	down(&mapping->robust_sem);
+
+	head = &mapping->robust_list;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		list_del(&this->list);
+		kmem_cache_free(robust_futex_cachep, this);
+	}
+
+	up(&mapping->robust_sem);
+	return;
+}
+
+/**
+ * get_private_uaddr - convert a private futex_key to a user addr
+ * @key: the futex_key that identifies a futex.
+ *
+ * Private futex_keys identify a futex that is in non-shared memory.
+ * Robust futexes should never result in private futex_keys, but keep
+ * this code for completeness.
+ * Returns zero if futex is not contained in current task's mm
+ */
+static unsigned long get_private_uaddr(union futex_key *key)
+{
+	unsigned long uaddr = 0;
+
+	if (key->private.mm == current->mm)
+		uaddr = key->private.uaddr;
+	return uaddr;
+}
+
+/**
+ * get_shared_uaddr - convert a shared futex_key to a user addr.
+ * @key: a futex_key that identifies a futex.
+ * @vma: a vma that may contain the futex
+ *
+ * Shared futex_keys identify a futex that is contained in a vma,
+ * and so may be shared.
+ * Returns zero if futex is not contained in @vma
+ */
+static unsigned long get_shared_uaddr(union futex_key *key,
+				      struct vm_area_struct *vma)
+{
+	unsigned long uaddr = 0;
+	unsigned long tmpaddr;
+	struct address_space *mapping;
+
+	mapping = vma->vm_file->f_mapping;
+	if (key->shared.inode == mapping->host) {
+		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
+				+ (key->shared.offset & ~0x1)
+				+ vma->vm_start;
+		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
+			uaddr = tmpaddr;
+	}
+
+	return uaddr;
+}
+
+/**
+ * get_futex_uaddr - convert a futex_key to a user addr.
+ * @key: futex_key that identifies a futex
+ * @vma: vma that may contain the futex
+ *
+ * Converts both shared and private futex_keys.
+ * Returns zero if futex is not contained in @vma or in the current
+ * task's mm.
+ */
+static unsigned long get_futex_uaddr(union futex_key *key,
+				     struct vm_area_struct *vma)
+{
+	unsigned long uaddr;
+
+	if ((key->both.offset & 0x1) == 0)
+		uaddr = get_private_uaddr(key);
+	else
+		uaddr = get_shared_uaddr(key,vma);
+
+	return uaddr;
+}
+
+/**
+ * find_owned_futex - find futexes owned by the current task
+ * @vma: the vma to search for futexes
+ * @head: list head for list of robust futexes
+ * @sem: semaphore that protects the list
+ *
+ * Walk the list of registered robust futexes for this @vma,
+ * setting the %FUTEX_OWNER_DIED flag on those futexes owned
+ * by the current, exiting task.
+ */
+static void find_owned_futex(struct vm_area_struct *vma, struct list_head *head,
+				struct semaphore *sem)
+{
+	struct futex_robust *this, *next;
+ 	unsigned long uaddr;
+	int value;
+
+	down(sem);
+
+	list_for_each_entry_safe(this, next, head, list) {
+
+		uaddr = get_futex_uaddr(&this->key, vma);
+		if (uaddr == 0)
+			continue;
+
+		up(sem);
+		up_read(&current->mm->mmap_sem);
+		value = futex_get_user(uaddr);
+		if ((value & FUTEX_PID) == current->pid) {
+			value |= FUTEX_OWNER_DIED;
+			futex_wake(uaddr, 1);
+			futex_put_user(value, uaddr);
+		}
+		down(sem);
+		down_read(&current->mm->mmap_sem);
+	}
+
+	up(sem);
+}
+
+/**
+ * exit_futex - futex processing when a task exits.
+ *
+ * Called from do_exit() when a task exits. Mark all robust futexes
+ * that are owned by the current terminating task as %FUTEX_OWNER_DIED.
+ */
+
+void exit_futex(struct task_struct *tsk)
+{
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	struct list_head *robust;
+	struct semaphore *sem;
+
+	if (tsk==NULL)
+		return;
+
+	mm = current->mm;
+	if (mm==NULL)
+		return;
+
+	down_read(&mm->mmap_sem);
+
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
+		if (vma->vm_file == NULL)
+			continue;
+
+		if (vma->vm_file->f_mapping == NULL)
+			continue;
+
+		robust = &vma->vm_file->f_mapping->robust_list;
+		sem = &vma->vm_file->f_mapping->robust_sem;
+		if (list_empty(robust))
+			continue;
+
+		find_owned_futex(vma, robust, sem);
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+/**
+ * futex_register - Record the existence of a robust futex in a vma.
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is created. Looks up the vma that contains the futex and
+ * adds an entry to the list of all robust futexes in the vma.
+ */
+static int futex_register(unsigned long uaddr, unsigned int attr)
+{
+	int ret;
+	struct futex_robust *robust;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	struct list_head *head = NULL;
+	struct semaphore *sem = NULL;
+
+	if ((attr & FUTEX_ATTR_SHARED) == 0)
+		return -ENOTSHARED;
+
+	robust = kmem_cache_alloc(robust_futex_cachep, GFP_KERNEL);
+	if (!robust) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	down_read(&current->mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &robust->key, &head, &sem);
+	if (unlikely(ret != 0))	{
+		kmem_cache_free(robust_futex_cachep, robust);
+		goto out_unlock;
+	}
+
+	vma = find_extend_vma(mm, uaddr);
+	if (unlikely(!vma)) {
+		ret = -EFAULT;
+		kmem_cache_free(robust_futex_cachep, robust);
+		goto out_unlock;
+	}
+
+	if (vma->vm_file && vma->vm_file->f_mapping) {
+		head = &vma->vm_file->f_mapping->robust_list;
+		sem = &vma->vm_file->f_mapping->robust_sem;
+	} else {
+		ret = -ENOTSHARED;
+		kmem_cache_free(robust_futex_cachep, robust);
+		goto out_unlock;
+	}
+
+	down(sem);
+	list_add_tail(&robust->list, head);
+	up(sem);
+
+out_unlock:
+	up_read(&current->mm->mmap_sem);
+out:
+	return ret;
+}
+
+/**
+ * futex_deregister - Delete robust futex registration from a vma
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is destroyed. Looks up the vma that contains the futex and
+ * removes the futex entry from the list of all robust futexes in
+ * the vma.
+ */
+static int futex_deregister(unsigned long uaddr)
+{
+	union futex_key key;
+	struct mm_struct *mm = current->mm;
+	struct list_head *head = NULL;
+	struct semaphore *sem = NULL;
+	struct futex_robust *this, *next;
+	int ret;
+
+	down_read(&mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &key, &head, &sem);
+	if (unlikely(ret != 0))
+		goto out;
+	if (head == NULL) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	down(sem);
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (match_futex (&this->key, &key)) {
+			futex_wake(uaddr, 1);
+			list_del(&this->list);
+			kmem_cache_free(robust_futex_cachep, this);
+			break;
+		}
+	}
+
+	up(sem);
+out:
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+
+/**
+ * futex_recover - Recover a futex after its owner died
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall).
+ * When a task dies while owning a robust futex, the futex is
+ * marked with %FUTEX_OWNER_DIED and ownership is transferred
+ * to the next waiting task. That task can choose to restore
+ * the futex to a useful state by calling this function.
+ */
+static int futex_recover(unsigned long uaddr)
+{
+	int ret = 0;
+	int value = 0;
+	union futex_key key;
+	struct list_head *head = NULL;
+	struct semaphore *sem = NULL;
+
+	down_read(&current->mm->mmap_sem);
+	ret = get_futex_key(uaddr, &key, &head, &sem);
+	up_read(&current->mm->mmap_sem);
+	if (ret != 0)
+		return ret;
+
+	if ((value = futex_get_user(uaddr)) == -EFAULT)
+		return ret;
+
+	value &= ~FUTEX_OWNER_DIED;
+	return futex_put_user(value, uaddr);
+}
+#endif
+
 long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
 		unsigned long uaddr2, int val2, int val3)
 {
@@ -854,6 +1250,17 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, uaddr2, val, val2, val3);
 		break;
+#ifdef CONFIG_ROBUST_FUTEX
+	case FUTEX_REGISTER:
+		ret = futex_register(uaddr, val);
+		break;
+	case FUTEX_DEREGISTER:
+		ret = futex_deregister(uaddr);
+		break;
+	case FUTEX_RECOVER:
+		ret = futex_recover(uaddr);
+		break;
+#endif
 	default:
 		ret = -ENOSYS;
 	}
@@ -901,6 +1308,9 @@ static int __init init(void)
 {
 	unsigned int i;
 
+#ifdef CONFIG_ROBUST_FUTEX
+	robust_futex_cachep = kmem_cache_create("robust_futex", sizeof(struct futex_robust), 0, 0, NULL, NULL);
+#endif
 	register_filesystem(&futex_fs_type);
 	futex_mnt = kern_mount(&futex_fs_type);
 
Index: linux-2.6.15/init/Kconfig
===================================================================
--- linux-2.6.15.orig/init/Kconfig
+++ linux-2.6.15/init/Kconfig
@@ -348,6 +348,15 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config ROBUST_FUTEX
+	bool "Enable robust futex support"
+	depends on FUTEX
+	default y
+	help
+	  Enable this option if you want to use robust user space mutexes.
+	  Enabling this option slows down the exit path of the kernel for
+	  all processes.  Robust futexes will run glibc-based applications correctly.
+
 config EPOLL
 	bool "Enable eventpoll support" if EMBEDDED
 	default y
Index: linux-2.6.15/include/asm-generic/errno.h
===================================================================
--- linux-2.6.15.orig/include/asm-generic/errno.h
+++ linux-2.6.15/include/asm-generic/errno.h
@@ -105,5 +105,6 @@
 /* for robust mutexes */
 #define	EOWNERDEAD	130	/* Owner died */
 #define	ENOTRECOVERABLE	131	/* State not recoverable */
+#define ENOTSHARED	132	/* no pshared attribute */
 
 #endif


* Re: [robust-futex-1] futex: robust futex support
  2006-01-14  1:00 [robust-futex-1] futex: robust futex support David Singleton
@ 2006-01-15  0:02 ` Ulrich Drepper
  2006-01-15  0:04   ` david singleton
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Drepper @ 2006-01-15  0:02 UTC (permalink / raw)
  To: David Singleton; +Cc: akpm, linux-kernel, mingo

On 1/13/06, David Singleton <dsingleton@mvista.com> wrote:
=============================================
> --- linux-2.6.15.orig/include/asm-generic/errno.h
> +++ linux-2.6.15/include/asm-generic/errno.h
> @@ -105,5 +105,6 @@
>  /* for robust mutexes */
>  #define        EOWNERDEAD      130     /* Owner died */
>  #define        ENOTRECOVERABLE 131     /* State not recoverable */
> +#define ENOTSHARED     132     /* no pshared attribute */

Do not introduce a new error code if at all possible.  Adding
userland-visible error codes means the ABI changes due to the stupid
_sys_errlist variable.  Just reuse an error code which cannot be
returned in the futex code so far and which has some kind of
resemblance with what the error means.
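
Seen from the application side the effect is visible with nothing more
than strerror(); a sketch, assuming an unpatched glibc:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* a glibc that does not know ENOTSHARED (132) can only
		 * report it generically, e.g. "Unknown error 132" */
		puts(strerror(132));
		return 0;
	}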


* Re: [robust-futex-1] futex: robust futex support
  2006-01-15  0:02 ` Ulrich Drepper
@ 2006-01-15  0:04   ` david singleton
  2006-01-15  5:18     ` Ulrich Drepper
  0 siblings, 1 reply; 15+ messages in thread
From: david singleton @ 2006-01-15  0:04 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: akpm, mingo, linux-kernel


On Jan 14, 2006, at 4:02 PM, Ulrich Drepper wrote:

> On 1/13/06, David Singleton <dsingleton@mvista.com> wrote:
> =============================================
>> --- linux-2.6.15.orig/include/asm-generic/errno.h
>> +++ linux-2.6.15/include/asm-generic/errno.h
>> @@ -105,5 +105,6 @@
>>  /* for robust mutexes */
>>  #define        EOWNERDEAD      130     /* Owner died */
>>  #define        ENOTRECOVERABLE 131     /* State not recoverable */
>> +#define ENOTSHARED     132     /* no pshared attribute */
>
> Do not introduce a new error code if at all possible.  Adding
> userland-visible error codes means the ABI changes due to the stupid
> _sys_errlist variable.  Just reuse an error code which cannot be
> returned in the futex code so far and which has some kind of
> resemblance with what the error means.

can you suggest some error codes you like to use?  I'll use
whatever you suggest.

thanks

David



* Re: [robust-futex-1] futex: robust futex support
  2006-01-15  0:04   ` david singleton
@ 2006-01-15  5:18     ` Ulrich Drepper
  2006-01-15 20:00       ` David Singleton
  2006-01-17  2:27       ` [robust-futex-3] " david singleton
  0 siblings, 2 replies; 15+ messages in thread
From: Ulrich Drepper @ 2006-01-15  5:18 UTC (permalink / raw)
  To: david singleton; +Cc: akpm, mingo, linux-kernel

On 1/14/06, david singleton <dsingleton@mvista.com> wrote:
> can you suggest some error codes you like to use?  I'll use
> whatever you suggest.

How about EADDRNOTAVAIL.  The error message kind of makes sense, even
though "address" is misused.  And there definitely won't be a clash
with other error codes because it's currently only used in network
contexts.


* Re: [robust-futex-1] futex: robust futex support
  2006-01-15  5:18     ` Ulrich Drepper
@ 2006-01-15 20:00       ` David Singleton
  2006-01-17  2:27       ` [robust-futex-3] " david singleton
  1 sibling, 0 replies; 15+ messages in thread
From: David Singleton @ 2006-01-15 20:00 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: akpm, mingo, linux-kernel

Ulrich Drepper wrote:

>On 1/14/06, david singleton <dsingleton@mvista.com> wrote:
>  
>
>>can you suggest some error codes you like to use?  I'll use
>>whatever you suggest.
>>    
>>
>
>How about EADDRNOTAVAIL.  The error message kind of makes sense, even
>though "address" is misused.  And there definitely won't be a clash
>with other error codes because it's currently only used in network
>contexts.
>  
>
Will do.  I'm testing a patch that addresses Andi's suggestions now and 
I'll add
the return code today.

thanks



* [robust-futex-3] futex: robust futex support
  2006-01-15  5:18     ` Ulrich Drepper
  2006-01-15 20:00       ` David Singleton
@ 2006-01-17  2:27       ` david singleton
  2006-01-17 17:32         ` Ulrich Drepper
  2006-01-17 17:50         ` Ulrich Drepper
  1 sibling, 2 replies; 15+ messages in thread
From: david singleton @ 2006-01-17  2:27 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: akpm, mingo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1101 bytes --]


On Jan 14, 2006, at 9:18 PM, Ulrich Drepper wrote:

> On 1/14/06, david singleton <dsingleton@mvista.com> wrote:
>> can you suggest some error codes you like to use?  I'll use
>> whatever you suggest.
>
> How about EADDRNOTAVAIL.  The error message kind of makes sense, even
> though "address" is misused.  And there definitely won't be a clash
> with other error codes because it's currently only used in network
> contexts.
>

Ulrich,
	I've fixed another memory leak in free_robust_list.   The entries in 
the slab caches
now look correct through the full test suite up to 7500 threads.   Does 
your glibc
work correctly with this patch?


diff -u linux-2.6.15/kernel/futex.c linux-2.6.15/kernel/futex.c
--- linux-2.6.15/kernel/futex.c
+++ linux-2.6.15/kernel/futex.c
@@ -917,6 +917,8 @@
         up(&mapping->robust_head->robust_sem);

         kmem_cache_free(file_futex_cachep, futex_head);
+       mapping->robust_head = NULL;
+
         return;
  }


The new full patch is attached, and at

http://source.mvista.com/~dsingleton/robust-futex-3

David



[-- Attachment #2: robust-futex-3 --]
[-- Type: application/octet-stream, Size: 21550 bytes --]

Signed-off-by: David Singleton <dsingleton@mvista.com>

	The original code for this patch was supplied by Todd Kneisel.

	Robust futexes provide a locking mechanism that can be shared between
	user mode processes. The major difference between robust futexes and
	regular futexes is that when the owner of a robust futex dies, the
	next task waiting on the futex will be awakened, will get ownership
	of the futex lock, and will receive the error status EOWNERDEAD.

	Robust futexes allow the system to gracefully continue if an application
	dies while holding a futex, without leaving waiting threads hung or
	requiring the system to be rebooted to clear the hang.

	Robust futexes are structures hung on a linked list on the inode.
	The list is scanned at exit time to find any futexes still held
	by the dying thread.  The structures backing the futex are removed
	when a dentry reference count drops to zero.  The exit path cleans
	up any locked futexes and the dentry reference count handles cleaning
	up robust futex resources.

	This implementation supports shared pthread_mutexes, i.e. pthread_mutexes
	mmapped either in a file or in mmapped anonymous memory.

	Ulrich Drepper has a glibc that handles non-shared (malloc'd)
	pthread_mutexes entirely in user space.  This patch returns
	-ENOTSHARED from futex registration if the futex is not shared, so his
	glibc can handle non-shared pthread_mutexes without kernel support.

	The structure for robustness has its own slab cache, making
	it easy to track how many robust futexes are in the system through
	/proc/slabinfo.
	
	Robust futexes have been tested fairly well in the community using
	the realtime-preempt patch, which supports both robust and
	priority-inheriting pthread_mutexes.  There is a simple test suite,
	also originally supplied by Todd Kneisel, that tests basic robustness
	using regular futexes with the robust attribute applied.  They've also
	been tested using the fusyn POSIX test suite, with minor modifications
	to make the fusyn tests run with the NPTL libraries.

	Fixed a memory leak of futex_head structures in register_futex.

	Fixed a memory leak in free_futex_list.

 fs/dcache.c           |    2 
 include/linux/fs.h    |    2 
 include/linux/futex.h |   33 +++
 init/Kconfig          |    9 +
 kernel/exit.c         |    2 
 kernel/futex.c        |  433 +++++++++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 473 insertions(+), 8 deletions(-)

Index: linux-2.6.15/fs/dcache.c
===================================================================
--- linux-2.6.15.orig/fs/dcache.c
+++ linux-2.6.15/fs/dcache.c
@@ -33,6 +33,7 @@
 #include <linux/seqlock.h>
 #include <linux/swap.h>
 #include <linux/bootmem.h>
+#include <linux/futex.h>
 
 /* #define DCACHE_DEBUG 1 */
 
@@ -161,6 +162,7 @@ repeat:
 		return;
 	}
 
+	futex_free_robust_list(dentry->d_inode);
 	/*
 	 * AV: ->d_delete() is _NOT_ allowed to block now.
 	 */
Index: linux-2.6.15/include/linux/fs.h
===================================================================
--- linux-2.6.15.orig/include/linux/fs.h
+++ linux-2.6.15/include/linux/fs.h
@@ -9,6 +9,7 @@
 #include <linux/config.h>
 #include <linux/limits.h>
 #include <linux/ioctl.h>
+#include <linux/futex.h>
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -383,6 +384,7 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+	struct futex_head	*robust_head;	/* list of robust futexes */
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
Index: linux-2.6.15/include/linux/futex.h
===================================================================
--- linux-2.6.15.orig/include/linux/futex.h
+++ linux-2.6.15/include/linux/futex.h
@@ -10,6 +10,38 @@
 #define FUTEX_REQUEUE		3
 #define FUTEX_CMP_REQUEUE	4
 #define FUTEX_WAKE_OP		5
+#define FUTEX_REGISTER          6
+#define FUTEX_DEREGISTER        7
+#define FUTEX_RECOVER           8
+
+#define FUTEX_WAITERS				0x80000000
+#define FUTEX_OWNER_DIED			0x40000000
+#define FUTEX_NOT_RECOVERABLE			0x20000000
+#define FUTEX_FLAGS (FUTEX_WAITERS | FUTEX_OWNER_DIED | FUTEX_NOT_RECOVERABLE)
+#define FUTEX_PID                             ~(FUTEX_FLAGS)
+
+#define FUTEX_ATTR_SHARED                       0x10000000
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <asm/semaphore.h>
+
+#ifdef CONFIG_ROBUST_FUTEX
+
+struct futex_head {
+	struct list_head robust_list;
+	struct semaphore robust_sem;
+};
+
+struct inode;
+struct task_struct;
+extern void futex_free_robust_list(struct inode *inode);
+extern void exit_futex(struct task_struct *tsk);
+#else
+# define futex_free_robust_list(a)      do { } while (0)
+# define exit_futex(b)                  do { } while (0)
+#define futex_init_inode(a) 		do { } while (0)
+#endif
 
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2,
@@ -41,3 +73,4 @@ long do_futex(unsigned long uaddr, int o
    | ((oparg & 0xfff) << 12) | (cmparg & 0xfff))
 
 #endif
+#endif
Index: linux-2.6.15/kernel/exit.c
===================================================================
--- linux-2.6.15.orig/kernel/exit.c
+++ linux-2.6.15/kernel/exit.c
@@ -31,6 +31,7 @@
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
+#include <linux/futex.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -847,6 +848,7 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
+	exit_futex(tsk);
 	exit_mm(tsk);
 
 	exit_sem(tsk);
Index: linux-2.6.15/kernel/futex.c
===================================================================
--- linux-2.6.15.orig/kernel/futex.c
+++ linux-2.6.15/kernel/futex.c
@@ -8,6 +8,9 @@
  *  Removed page pinning, fix privately mapped COW pages and other cleanups
  *  (C) Copyright 2003, 2004 Jamie Lokier
  *
+ *  Robust futexes added by Todd Kneisel
+ *  (C) Copyright 2005, Bull HN.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -41,6 +44,7 @@
 #include <linux/syscalls.h>
 #include <linux/signal.h>
 #include <asm/futex.h>
+#include <asm/uaccess.h>
 
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
@@ -140,7 +144,8 @@ static inline int match_futex(union fute
  *
  * Should be called with &current->mm->mmap_sem but NOT any spinlocks.
  */
-static int get_futex_key(unsigned long uaddr, union futex_key *key)
+static int get_futex_key(unsigned long uaddr, union futex_key *key,
+			struct list_head **list, struct semaphore **sem)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
@@ -163,6 +168,15 @@ static int get_futex_key(unsigned long u
 	if (unlikely(!vma))
 		return -EFAULT;
 
+	if (vma->vm_file && vma->vm_file->f_mapping &&
+	    vma->vm_file->f_mapping->robust_head) {
+		*list = &vma->vm_file->f_mapping->robust_head->robust_list;
+		*sem = &vma->vm_file->f_mapping->robust_head->robust_sem;
+	} else {
+		*sem = NULL;
+		*list = NULL;
+	}
+
 	/*
 	 * Permissions.
 	 */
@@ -290,11 +304,12 @@ static int futex_wake(unsigned long uadd
 	struct futex_hash_bucket *bh;
 	struct list_head *head;
 	struct futex_q *this, *next;
+	struct semaphore *sem;
 	int ret;
 
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr, &key);
+	ret = get_futex_key(uaddr, &key, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -325,16 +340,17 @@ static int futex_wake_op(unsigned long u
 	union futex_key key1, key2;
 	struct futex_hash_bucket *bh1, *bh2;
 	struct list_head *head;
+	struct semaphore *sem;
 	struct futex_q *this, *next;
 	int ret, op_ret, attempt = 0;
 
 retryfull:
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr1, &key1);
+	ret = get_futex_key(uaddr1, &key1, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, &key2);
+	ret = get_futex_key(uaddr2, &key2, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -450,16 +466,17 @@ static int futex_requeue(unsigned long u
 	union futex_key key1, key2;
 	struct futex_hash_bucket *bh1, *bh2;
 	struct list_head *head1;
+	struct semaphore *sem;
 	struct futex_q *this, *next;
 	int ret, drop_count = 0;
 
  retry:
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr1, &key1);
+	ret = get_futex_key(uaddr1, &key1, &head1, &sem);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, &key2);
+	ret = get_futex_key(uaddr2, &key2, &head1, &sem);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -624,11 +641,13 @@ static int futex_wait(unsigned long uadd
 	int ret, curval;
 	struct futex_q q;
 	struct futex_hash_bucket *bh;
+	struct list_head *head;
+	struct semaphore *sem;
 
  retry:
 	down_read(&current->mm->mmap_sem);
 
-	ret = get_futex_key(uaddr, &q.key);
+	ret = get_futex_key(uaddr, &q.key, &head, &sem);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
 
@@ -766,6 +785,8 @@ static int futex_fd(unsigned long uaddr,
 {
 	struct futex_q *q;
 	struct file *filp;
+	struct list_head *head;
+	struct semaphore *sem;
 	int ret, err;
 
 	ret = -EINVAL;
@@ -801,7 +822,7 @@ static int futex_fd(unsigned long uaddr,
 	}
 
 	down_read(&current->mm->mmap_sem);
-	err = get_futex_key(uaddr, &q->key);
+	err = get_futex_key(uaddr, &q->key, &head, &sem);
 
 	if (unlikely(err != 0)) {
 		up_read(&current->mm->mmap_sem);
@@ -829,6 +850,387 @@ error:
 	goto out;
 }
 
+#ifdef CONFIG_ROBUST_FUTEX
+/*
+ * Robust futexes provide a locking mechanism that can be shared between
+ * user mode processes. The major difference between robust futexes and
+ * regular futexes is that when the owner of a robust futex dies, the
+ * next task waiting on the futex will be awakened, will get ownership
+ * of the futex lock, and will receive the error status EOWNERDEAD.
+ *
+ * A robust futex is a 32 bit integer stored in user mode shared memory.
+ * Bit 31 indicates that there are tasks waiting on the futex.
+ * Bit 30 indicates that the task that owned the futex has died.
+ * Bit 29 indicates that the futex is not recoverable and cannot be used.
+ * Bits 0-28 are the pid of the task that owns the futex lock, or zero if
+ * the futex is not locked.
+ */
+
+static kmem_cache_t *robust_futex_cachep;
+static kmem_cache_t *file_futex_cachep;
+/*
+ * Used to track registered robust futexes. Attached to linked list in inodes.
+ */
+struct futex_robust {
+	struct list_head list;
+	union futex_key key;
+};
+
+/**
+ * futex_free_robust_list - release the list of registered futexes.
+ * @inode: inode that may be a memory mapped file
+ *
+ * Called from dput() when a dentry reference count reaches zero.
+ * If the dentry is associated with a memory mapped file, then
+ * release the list of registered robust futexes that are contained
+ * in that mapping.
+ */
+void futex_free_robust_list(struct inode *inode)
+{
+	struct address_space *mapping;
+	struct list_head *head;
+ 	struct futex_robust *this, *next;
+	struct futex_head *futex_head = NULL;
+
+	if (inode == NULL)
+		return;
+
+	mapping = inode->i_mapping;
+	if (mapping == NULL)
+		return;
+	if (mapping->robust_head == NULL)
+		return;
+
+	if (list_empty(&mapping->robust_head->robust_list))
+		return;
+
+	down(&mapping->robust_head->robust_sem);
+
+	head = &mapping->robust_head->robust_list;
+	futex_head = mapping->robust_head;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		list_del(&this->list);
+		kmem_cache_free(robust_futex_cachep, this);
+	}
+
+	up(&mapping->robust_head->robust_sem);
+
+	kmem_cache_free(file_futex_cachep, futex_head);
+	mapping->robust_head = NULL;
+
+	return;
+}
+
+/**
+ * get_private_uaddr - convert a private futex_key to a user addr
+ * @key: the futex_key that identifies a futex.
+ *
+ * Private futex_keys identify a futex that is in non-shared memory.
+ * Robust futexes should never result in private futex_keys, but keep
+ * this code for completeness.
+ * Returns zero if futex is not contained in current task's mm
+ */
+static unsigned long get_private_uaddr(union futex_key *key)
+{
+	unsigned long uaddr = 0;
+
+	if (key->private.mm == current->mm)
+		uaddr = key->private.uaddr;
+	return uaddr;
+}
+
+/**
+ * get_shared_uaddr - convert a shared futex_key to a user addr.
+ * @key: a futex_key that identifies a futex.
+ * @vma: a vma that may contain the futex
+ *
+ * Shared futex_keys identify a futex that is contained in a vma,
+ * and so may be shared.
+ * Returns zero if futex is not contained in @vma
+ */
+static unsigned long get_shared_uaddr(union futex_key *key,
+				      struct vm_area_struct *vma)
+{
+	unsigned long uaddr = 0;
+	unsigned long tmpaddr;
+	struct address_space *mapping;
+
+	mapping = vma->vm_file->f_mapping;
+	if (key->shared.inode == mapping->host) {
+		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
+				+ (key->shared.offset & ~0x1)
+				+ vma->vm_start;
+		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
+			uaddr = tmpaddr;
+	}
+
+	return uaddr;
+}
+
+/**
+ * get_futex_uaddr - convert a futex_key to a user addr.
+ * @key: futex_key that identifies a futex
+ * @vma: vma that may contain the futex
+ *
+ * Converts both shared and private futex_keys.
+ * Returns zero if futex is not contained in @vma or in the current
+ * task's mm.
+ */
+static unsigned long get_futex_uaddr(union futex_key *key,
+				     struct vm_area_struct *vma)
+{
+	unsigned long uaddr;
+
+	if ((key->both.offset & 0x1) == 0)
+		uaddr = get_private_uaddr(key);
+	else
+		uaddr = get_shared_uaddr(key,vma);
+
+	return uaddr;
+}
+
+/**
+ * find_owned_futex - find futexes owned by the current task
+ * @tsk: task that is exiting
+ * @vma: the vma to search for futexes
+ * @head: list head for list of robust futexes
+ * @sem: semaphore that protects the list
+ *
+ * Walk the list of registered robust futexes for this @vma,
+ * setting the %FUTEX_OWNER_DIED flag on those futexes owned
+ * by the current, exiting task.
+ */
+static void
+find_owned_futex(struct task_struct *tsk, struct vm_area_struct *vma,
+		 struct list_head *head, struct semaphore *sem)
+{
+	struct futex_robust *this, *next;
+ 	unsigned long uaddr;
+	int value;
+
+	down(sem);
+
+	list_for_each_entry_safe(this, next, head, list) {
+
+		uaddr = get_futex_uaddr(&this->key, vma);
+		if (uaddr == 0)
+			continue;
+
+		up(sem);
+		up_read(&current->mm->mmap_sem);
+		get_user(value, (int __user *)uaddr);
+		if ((value & FUTEX_PID) == tsk->pid) {
+			value |= FUTEX_OWNER_DIED;
+			futex_wake(uaddr, 1);
+			put_user(value, (int __user *)uaddr);
+		}
+		down_read(&current->mm->mmap_sem);
+		down(sem);
+	}
+
+	up(sem);
+}
+
+/**
+ * exit_futex - futex processing when a task exits.
+ *
+ * Called from do_exit() when a task exits. Mark all robust futexes
+ * that are owned by the current terminating task as %FUTEX_OWNER_DIED.
+ */
+
+void exit_futex(struct task_struct *tsk)
+{
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	struct list_head *robust;
+	struct semaphore *sem;
+
+	mm = current->mm;
+	if (mm==NULL)
+		return;
+
+	down_read(&mm->mmap_sem);
+
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
+		if (vma->vm_file == NULL)
+			continue;
+
+		if (vma->vm_file->f_mapping == NULL)
+			continue;
+
+		if (vma->vm_file->f_mapping->robust_head == NULL)
+			continue;
+
+		robust = &vma->vm_file->f_mapping->robust_head->robust_list;
+		sem = &vma->vm_file->f_mapping->robust_head->robust_sem;
+		if (list_empty(robust))
+			continue;
+
+		find_owned_futex(tsk, vma, robust, sem);
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+static void init_robust_list(struct address_space *f, struct futex_head *head)
+{
+	f->robust_head = head;
+	INIT_LIST_HEAD(&f->robust_head->robust_list);
+	init_MUTEX(&f->robust_head->robust_sem);
+}
+
+/**
+ * futex_register - Record the existence of a robust futex in a vma.
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is created. Looks up the vma that contains the futex and
+ * adds an entry to the list of all robust futexes in the vma.
+ */
+static int futex_register(unsigned long uaddr, unsigned int attr)
+{
+	int ret;
+	struct futex_robust *robust;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	struct list_head *head = NULL;
+	struct semaphore *sem = NULL;
+	struct futex_head *file_futex = NULL;
+
+	if ((attr & FUTEX_ATTR_SHARED) == 0)
+		return -EADDRNOTAVAIL;
+
+	robust = kmem_cache_alloc(robust_futex_cachep, GFP_KERNEL);
+	if (!robust) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	file_futex = kmem_cache_alloc(file_futex_cachep, GFP_KERNEL);
+	if (!file_futex) {
+		ret = -ENOMEM;
+		kmem_cache_free(robust_futex_cachep, robust);
+		goto out;
+	}
+
+	down_read(&current->mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &robust->key, &head, &sem);
+	if (unlikely(ret != 0))	{
+		kmem_cache_free(robust_futex_cachep, robust);
+		kmem_cache_free(file_futex_cachep, file_futex);
+		goto out_unlock;
+	}
+
+	vma = find_extend_vma(mm, uaddr);
+	if (unlikely(!vma)) {
+		ret = -EFAULT;
+		kmem_cache_free(robust_futex_cachep, robust);
+		kmem_cache_free(file_futex_cachep, file_futex);
+		goto out_unlock;
+	}
+
+	if (vma->vm_file && vma->vm_file->f_mapping) {
+		if (vma->vm_file->f_mapping->robust_head == NULL)
+			init_robust_list(vma->vm_file->f_mapping, file_futex);
+		else
+			kmem_cache_free(file_futex_cachep, file_futex);
+		head = &vma->vm_file->f_mapping->robust_head->robust_list;
+		sem = &vma->vm_file->f_mapping->robust_head->robust_sem;
+	} else {
+		ret = -EADDRNOTAVAIL;
+		kmem_cache_free(robust_futex_cachep, robust);
+		kmem_cache_free(file_futex_cachep, file_futex);
+		goto out_unlock;
+	}
+
+	down(sem);
+	list_add_tail(&robust->list, head);
+	up(sem);
+
+out_unlock:
+	up_read(&current->mm->mmap_sem);
+out:
+	return ret;
+}
+
+/**
+ * futex_deregister - Delete robust futex registration from a vma
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is destroyed. Looks up the vma that contains the futex and
+ * removes the futex entry from the list of all robust futexes in
+ * the vma.
+ */
+static int futex_deregister(unsigned long uaddr)
+{
+	union futex_key key;
+	struct mm_struct *mm = current->mm;
+	struct list_head *head = NULL;
+	struct semaphore *sem = NULL;
+	struct futex_robust *this, *next;
+	int ret;
+
+	down_read(&mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &key, &head, &sem);
+	if (unlikely(ret != 0))
+		goto out;
+	if (head == NULL) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	down(sem);
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (match_futex (&this->key, &key)) {
+			futex_wake(uaddr, 1);
+			list_del(&this->list);
+			kmem_cache_free(robust_futex_cachep, this);
+			break;
+		}
+	}
+
+	up(sem);
+out:
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+
+/**
+ * futex_recover - Recover a futex after its owner died
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall).
+ * When a task dies while owning a robust futex, the futex is
+ * marked with %FUTEX_OWNER_DIED and ownership is transferred
+ * to the next waiting task. That task can choose to restore
+ * the futex to a useful state by calling this function.
+ */
+static int futex_recover(unsigned long uaddr)
+{
+	int ret = 0;
+	int value = 0;
+	union futex_key key;
+	struct list_head *head = NULL;
+	struct semaphore *sem = NULL;
+
+	down_read(&current->mm->mmap_sem);
+	ret = get_futex_key(uaddr, &key, &head, &sem);
+	up_read(&current->mm->mmap_sem);
+	if (ret != 0)
+		return ret;
+
+	if (get_user(value, (int __user *)uaddr))
+		return ret;
+
+	value &= ~FUTEX_OWNER_DIED;
+	return put_user(value, (int __user *)uaddr);
+}
+#endif
+
 long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
 		unsigned long uaddr2, int val2, int val3)
 {
@@ -854,6 +1256,17 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, uaddr2, val, val2, val3);
 		break;
+#ifdef CONFIG_ROBUST_FUTEX
+	case FUTEX_REGISTER:
+		ret = futex_register(uaddr, val);
+		break;
+	case FUTEX_DEREGISTER:
+		ret = futex_deregister(uaddr);
+		break;
+	case FUTEX_RECOVER:
+		ret = futex_recover(uaddr);
+		break;
+#endif
 	default:
 		ret = -ENOSYS;
 	}
@@ -901,6 +1314,10 @@ static int __init init(void)
 {
 	unsigned int i;
 
+#ifdef CONFIG_ROBUST_FUTEX
+	robust_futex_cachep = kmem_cache_create("robust_futex", sizeof(struct futex_robust), 0, 0, NULL, NULL);
+	file_futex_cachep = kmem_cache_create("file_futex", sizeof(struct futex_head), 0, 0, NULL, NULL);
+#endif
 	register_filesystem(&futex_fs_type);
 	futex_mnt = kern_mount(&futex_fs_type);
 
Index: linux-2.6.15/init/Kconfig
===================================================================
--- linux-2.6.15.orig/init/Kconfig
+++ linux-2.6.15/init/Kconfig
@@ -348,6 +348,15 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config ROBUST_FUTEX
+	bool "Enable robust futex support"
+	depends on FUTEX
+	default y
+	help
+	  Enable this option if you want to use robust user space mutexes.
+	  Enabling this option slows down the exit path of the kernel for
+	  all processes.  Robust futexes will run glibc-based applications correctly.
+
 config EPOLL
 	bool "Enable eventpoll support" if EMBEDDED
 	default y


* Re: [robust-futex-3] futex: robust futex support
  2006-01-17  2:27       ` [robust-futex-3] " david singleton
@ 2006-01-17 17:32         ` Ulrich Drepper
  2006-01-17 17:50         ` Ulrich Drepper
  1 sibling, 0 replies; 15+ messages in thread
From: Ulrich Drepper @ 2006-01-17 17:32 UTC (permalink / raw)
  To: david singleton; +Cc: akpm, mingo, linux-kernel

On 1/16/06, david singleton <dsingleton@mvista.com> wrote:
>         I've fixed another memory leak in free_robust_list.   The entries in
> the slab caches
> now look correct through the full test suite up to 7500 threads.   Does
> your glibc
> work correctly with this patch?

I'll see shortly.

But looking at the patch, I don't understand the use of
FUTEX_ATTR_SHARED.  The EADDRNOTAVAIL error is something the kernel
has to return if the address is not that of an object in a shared
memory region.  It's not information directly provided by the user of
futex_register.

So, I suggest removing the attr parameter from futex_register and
after get_futex_key, when you know where the futex is actually
located, return -EADDRNOTAVAIL if the futex is in private memory.
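
In other words, the check can be driven off the key that get_futex_key
has already computed; a sketch only, not code from the posted patches:

	/*
	 * Sketch: decide shared vs. private from the futex_key itself
	 * instead of trusting a user-supplied attribute.
	 */
	static int futex_register(unsigned long uaddr)
	{
		union futex_key key;
		struct list_head *head = NULL;
		struct semaphore *sem = NULL;
		int ret;

		down_read(&current->mm->mmap_sem);

		ret = get_futex_key(uaddr, &key, &head, &sem);
		if (unlikely(ret != 0))
			goto out_unlock;

		/*
		 * Shared (inode-based) keys have bit 0 of the offset set,
		 * private keys do not -- the same test get_futex_uaddr uses.
		 */
		if ((key.both.offset & 0x1) == 0) {
			ret = -EADDRNOTAVAIL;
			goto out_unlock;
		}

		/* ... allocate the futex_robust entry and add it to head ... */

	out_unlock:
		up_read(&current->mm->mmap_sem);
		return ret;
	}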


* Re: [robust-futex-3] futex: robust futex support
  2006-01-17  2:27       ` [robust-futex-3] " david singleton
  2006-01-17 17:32         ` Ulrich Drepper
@ 2006-01-17 17:50         ` Ulrich Drepper
  2006-01-19  2:26           ` [robust-futex-4] " david singleton
  1 sibling, 1 reply; 15+ messages in thread
From: Ulrich Drepper @ 2006-01-17 17:50 UTC (permalink / raw)
  To: david singleton; +Cc: akpm, mingo, linux-kernel

And another thing: semaphores are on their way out.  So, in
futex_deregister and in futex_head, shouldn't you use mutexes?  I
don't see that you really need semaphores.

In futex_register, you define mm and initialize it with current->mm. 
That's OK.  But why then are you using

+       down_read(&current->mm->mmap_sem);

just a few lines below?

And finally (for now): in get_futex_key the VMA containing the futex
is determined.  And yet, in futex_register you have an identical
find_extend_vma call.  I don't know how expensive this function is. 
But I would assume that at least the error handling in futex_register
can go away since the VMA cannot be torn down while mmap_sem is taken,
right?  But perhaps this just points to more inconsistencies.  Why is
the list/sem lookup in get_futex_key?  Only to share the code with
futex_deregister.  But is that really worth it?  The majority of calls
to get_futex_key come from all the other call sites so the code you
added is only a cost without any gain.  Especially since you could in
futex_register do the whole thing without any additional cost and
because most of the new tests in get_futex_key are again tested in
futex_register (to determined shared vs non-shared) and do not have to
be tested in futex_deregister (we know the futex is in shared memory).

I suggest that if find_extend_vma is sufficiently expensive, pass a
pointer to a variable of that type to get_futex_key.  If it is cheap,
don't do anything.  Pull the new code in get_futex_key into
futex_register and futex_deregister, optimize out unnecessary code,
and merge with the rest of the functions.  It'll be much less
invasive.
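
For reference, the semaphore-to-mutex part of this is a small change; a
sketch only, assuming the list stays hung off the address_space:

	#include <linux/fs.h>
	#include <linux/list.h>
	#include <linux/mutex.h>

	struct futex_head {
		struct list_head	robust_list;
		struct mutex		robust_mutex;
	};

	static void init_robust_list(struct address_space *mapping,
				     struct futex_head *head)
	{
		mapping->robust_head = head;
		INIT_LIST_HEAD(&head->robust_list);
		mutex_init(&head->robust_mutex);
	}

	/*
	 * Registration, deregistration and the exit-time walk then take
	 * mutex_lock()/mutex_unlock() where the posted patch uses
	 * down()/up() on robust_sem.
	 */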


* [robust-futex-4] futex: robust futex support
  2006-01-17 17:50         ` Ulrich Drepper
@ 2006-01-19  2:26           ` david singleton
  2006-01-19  5:22             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: david singleton @ 2006-01-19  2:26 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: akpm, mingo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

Ulrich,
	I've incorporated your suggestions, with the exception of passing the
pthread mutex attribute into the kernel for futex_register.  I'd like to be
able to
pass the attributes associated with the pthread_mutex into the kernel
so the kernel can support whatever attributes are on the mutex without
having to change the interface between glibc and the kernel.

	If we pass the attributes in for robustness we don't have to change
the interface if/when the kernel supports other attributes, like
priority inheritance.
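
	As a sketch of that interface idea (FUTEX_ATTR_PI below is purely
hypothetical; only FUTEX_ATTR_SHARED exists in the patches), the val
argument of FUTEX_REGISTER stays an attribute bitmask, so new attributes
only add bits rather than new operations:

	#include <unistd.h>
	#include <sys/syscall.h>

	#define FUTEX_REGISTER		6
	#define FUTEX_ATTR_SHARED	0x10000000
	#define FUTEX_ATTR_PI		0x08000000	/* hypothetical */

	static long register_robust_mutex(int *futex_word, unsigned int attr)
	{
		/* attr is a bitmask; a kernel that does not know a bit can
		 * simply reject it without a new futex operation */
		return syscall(SYS_futex, futex_word, FUTEX_REGISTER, attr,
			       NULL, NULL, 0);
	}

	/* e.g. register_robust_mutex(word, FUTEX_ATTR_SHARED | FUTEX_ATTR_PI) */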

	Let me know what you think of the implementation, and thanks
for the suggestions, they made the patch much less intrusive.


David

The original code for this patch was supplied by Todd Kneisel.

  Incorporated changes suggested by Ulrich Drepper.

  fs/dcache.c           |    2
  include/linux/fs.h    |    2
  include/linux/futex.h |   33 ++++
  init/Kconfig          |    9 +
  kernel/exit.c         |    2
  kernel/futex.c        |  395 
++++++++++++++++++++++++++++++++++++++++++++++++++
  6 files changed, 443 insertions(+)


[-- Attachment #2: robust-futex-4 --]
[-- Type: application/octet-stream, Size: 16422 bytes --]

Signed-off-by: David Singleton <dsingleton@mvista.com>

	The original code for this patch was supplied by Todd Kneisel.

	Incorporated changes suggested by Ulrich Drepper.

 fs/dcache.c           |    2 
 include/linux/fs.h    |    2 
 include/linux/futex.h |   33 ++++
 init/Kconfig          |    9 +
 kernel/exit.c         |    2 
 kernel/futex.c        |  395 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 443 insertions(+)

Index: linux-2.6.15/fs/dcache.c
===================================================================
--- linux-2.6.15.orig/fs/dcache.c
+++ linux-2.6.15/fs/dcache.c
@@ -33,6 +33,7 @@
 #include <linux/seqlock.h>
 #include <linux/swap.h>
 #include <linux/bootmem.h>
+#include <linux/futex.h>
 
 /* #define DCACHE_DEBUG 1 */
 
@@ -161,6 +162,7 @@ repeat:
 		return;
 	}
 
+	futex_free_robust_list(dentry->d_inode);
 	/*
 	 * AV: ->d_delete() is _NOT_ allowed to block now.
 	 */
Index: linux-2.6.15/include/linux/fs.h
===================================================================
--- linux-2.6.15.orig/include/linux/fs.h
+++ linux-2.6.15/include/linux/fs.h
@@ -9,6 +9,7 @@
 #include <linux/config.h>
 #include <linux/limits.h>
 #include <linux/ioctl.h>
+#include <linux/futex.h>
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -383,6 +384,7 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+	struct futex_head	*robust_head;	/* list of robust futexes */
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
Index: linux-2.6.15/include/linux/futex.h
===================================================================
--- linux-2.6.15.orig/include/linux/futex.h
+++ linux-2.6.15/include/linux/futex.h
@@ -10,6 +10,38 @@
 #define FUTEX_REQUEUE		3
 #define FUTEX_CMP_REQUEUE	4
 #define FUTEX_WAKE_OP		5
+#define FUTEX_REGISTER          6
+#define FUTEX_DEREGISTER        7
+#define FUTEX_RECOVER           8
+
+#define FUTEX_WAITERS				0x80000000
+#define FUTEX_OWNER_DIED			0x40000000
+#define FUTEX_NOT_RECOVERABLE			0x20000000
+#define FUTEX_FLAGS (FUTEX_WAITERS | FUTEX_OWNER_DIED | FUTEX_NOT_RECOVERABLE)
+#define FUTEX_PID                             ~(FUTEX_FLAGS)
+
+#define FUTEX_ATTR_SHARED                       0x10000000
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#ifdef CONFIG_ROBUST_FUTEX
+
+struct futex_head {
+	struct list_head robust_list;
+	struct mutex robust_mutex;
+};
+
+struct inode;
+struct task_struct;
+extern void futex_free_robust_list(struct inode *inode);
+extern void exit_futex(struct task_struct *tsk);
+#else
+# define futex_free_robust_list(a)      do { } while (0)
+# define exit_futex(b)                  do { } while (0)
+#define futex_init_inode(a) 		do { } while (0)
+#endif
 
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2,
@@ -41,3 +73,4 @@ long do_futex(unsigned long uaddr, int o
    | ((oparg & 0xfff) << 12) | (cmparg & 0xfff))
 
 #endif
+#endif
Index: linux-2.6.15/kernel/exit.c
===================================================================
--- linux-2.6.15.orig/kernel/exit.c
+++ linux-2.6.15/kernel/exit.c
@@ -31,6 +31,7 @@
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
+#include <linux/futex.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -847,6 +848,7 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
+	exit_futex(tsk);
 	exit_mm(tsk);
 
 	exit_sem(tsk);
Index: linux-2.6.15/kernel/futex.c
===================================================================
--- linux-2.6.15.orig/kernel/futex.c
+++ linux-2.6.15/kernel/futex.c
@@ -8,6 +8,9 @@
  *  Removed page pinning, fix privately mapped COW pages and other cleanups
  *  (C) Copyright 2003, 2004 Jamie Lokier
  *
+ *  Robust futexes added by Todd Kneisel
+ *  (C) Copyright 2005, Bull HN.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -40,7 +43,9 @@
 #include <linux/pagemap.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
+#include <linux/mutex.h>
 #include <asm/futex.h>
+#include <asm/uaccess.h>
 
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
@@ -829,6 +834,381 @@ error:
 	goto out;
 }
 
+#ifdef CONFIG_ROBUST_FUTEX
+/*
+ * Robust futexes provide a locking mechanism that can be shared between
+ * user mode processes. The major difference between robust futexes and
+ * regular futexes is that when the owner of a robust futex dies, the
+ * next task waiting on the futex will be awakened, will get ownership
+ * of the futex lock, and will receive the error status EOWNERDEAD.
+ *
+ * A robust futex is a 32 bit integer stored in user mode shared memory.
+ * Bit 31 indicates that there are tasks waiting on the futex.
+ * Bit 30 indicates that the task that owned the futex has died.
+ * Bit 29 indicates that the futex is not recoverable and cannot be used.
+ * Bits 0-28 are the pid of the task that owns the futex lock, or zero if
+ * the futex is not locked.
+ */
+
+static kmem_cache_t *robust_futex_cachep;
+static kmem_cache_t *file_futex_cachep;
+/*
+ * Used to track registered robust futexes. Attached to linked list in inodes.
+ */
+struct futex_robust {
+	struct list_head list;
+	union futex_key key;
+};
+
+/**
+ * futex_free_robust_list - release the list of registered futexes.
+ * @inode: inode that may be a memory mapped file
+ *
+ * Called from dput() when a dentry reference count reaches zero.
+ * If the dentry is associated with a memory mapped file, then
+ * release the list of registered robust futexes that are contained
+ * in that mapping.
+ */
+void futex_free_robust_list(struct inode *inode)
+{
+	struct address_space *mapping;
+	struct list_head *head;
+ 	struct futex_robust *this, *next;
+	struct futex_head *futex_head = NULL;
+
+	if (inode == NULL)
+		return;
+
+	mapping = inode->i_mapping;
+	if (mapping == NULL)
+		return;
+	if (mapping->robust_head == NULL)
+		return;
+
+	if (list_empty(&mapping->robust_head->robust_list))
+		return;
+
+	mutex_lock(&mapping->robust_head->robust_mutex);
+
+	head = &mapping->robust_head->robust_list;
+	futex_head = mapping->robust_head;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		list_del(&this->list);
+		kmem_cache_free(robust_futex_cachep, this);
+	}
+
+	mutex_unlock(&mapping->robust_head->robust_mutex);
+
+	kmem_cache_free(file_futex_cachep, futex_head);
+	mapping->robust_head = NULL;
+
+	return;
+}
+
+/**
+ * get_private_uaddr - convert a private futex_key to a user addr
+ * @key: the futex_key that identifies a futex.
+ *
+ * Private futex_keys identify a futex that is in non-shared memory.
+ * Robust futexes should never result in private futex_keys, but keep
+ * this code for completeness.
+ * Returns zero if futex is not contained in current task's mm
+ */
+static unsigned long get_private_uaddr(union futex_key *key)
+{
+	unsigned long uaddr = 0;
+
+	if (key->private.mm == current->mm)
+		uaddr = key->private.uaddr;
+	return uaddr;
+}
+
+/**
+ * get_shared_uaddr - convert a shared futex_key to a user addr.
+ * @key: a futex_key that identifies a futex.
+ * @vma: a vma that may contain the futex
+ *
+ * Shared futex_keys identify a futex that is contained in a vma,
+ * and so may be shared.
+ * Returns zero if futex is not contained in @vma
+ */
+static unsigned long get_shared_uaddr(union futex_key *key,
+				      struct vm_area_struct *vma)
+{
+	unsigned long uaddr = 0;
+	unsigned long tmpaddr;
+	struct address_space *mapping;
+
+	mapping = vma->vm_file->f_mapping;
+	if (key->shared.inode == mapping->host) {
+		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
+				+ (key->shared.offset & ~0x1)
+				+ vma->vm_start;
+		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
+			uaddr = tmpaddr;
+	}
+
+	return uaddr;
+}
+
+/**
+ * get_futex_uaddr - convert a futex_key to a user addr.
+ * @key: futex_key that identifies a futex
+ * @vma: vma that may contain the futex
+ *
+ * Converts both shared and private futex_keys.
+ * Returns zero if futex is not contained in @vma or in the current
+ * task's mm.
+ */
+static unsigned long get_futex_uaddr(union futex_key *key,
+				     struct vm_area_struct *vma)
+{
+	unsigned long uaddr;
+
+	if ((key->both.offset & 0x1) == 0)
+		uaddr = get_private_uaddr(key);
+	else
+		uaddr = get_shared_uaddr(key,vma);
+
+	return uaddr;
+}
+
+/**
+ * find_owned_futex - find futexes owned by the current task
+ * @tsk: task that is exiting
+ * @vma: vma containing robust futexes
+ * @head: list head for list of robust futexes
+ * @mutex: mutex that protects the list
+ *
+ * Walk the list of registered robust futexes for this @vma,
+ * setting the %FUTEX_OWNER_DIED flag on those futexes owned
+ * by the current, exiting task.
+ */
+static void
+find_owned_futex(struct task_struct *tsk, struct vm_area_struct *vma,
+		 struct list_head *head, struct mutex *mutex)
+{
+	struct futex_robust *this, *next;
+ 	unsigned long uaddr;
+	int value;
+
+	mutex_lock(mutex);
+
+	list_for_each_entry_safe(this, next, head, list) {
+
+		uaddr = get_futex_uaddr(&this->key, vma);
+		if (uaddr == 0)
+			continue;
+
+		mutex_unlock(mutex);
+		up_read(&tsk->mm->mmap_sem);
+		get_user(value, (int __user *)uaddr);
+		if ((value & FUTEX_PID) == tsk->pid) {
+			value |= FUTEX_OWNER_DIED;
+			futex_wake(uaddr, 1);
+			put_user(value, (int *__user)uaddr);
+		}
+		down_read(&tsk->mm->mmap_sem);
+		mutex_lock(mutex);
+	}
+
+	mutex_unlock(mutex);
+}
+
+/**
+ * exit_futex - futex processing when a task exits.
+ *
+ * Called from do_exit() when a task exits. Mark all robust futexes
+ * that are owned by the current terminating task as %FUTEX_OWNER_DIED.
+ */
+
+void exit_futex(struct task_struct *tsk)
+{
+	struct mm_struct *mm;
+	struct list_head *robust;
+	struct vm_area_struct *vma;
+	struct mutex *mutex;
+
+	mm = current->mm;
+	if (mm==NULL)
+		return;
+
+	down_read(&mm->mmap_sem);
+
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
+		if (vma->vm_file == NULL)
+			continue;
+
+		if (vma->vm_file->f_mapping == NULL)
+			continue;
+
+		if (vma->vm_file->f_mapping->robust_head == NULL)
+			continue;
+
+		robust = &vma->vm_file->f_mapping->robust_head->robust_list;
+
+		if (list_empty(robust))
+			continue;
+
+		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
+
+		find_owned_futex(tsk, vma, robust, mutex);
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+static void init_robust_list(struct address_space *f, struct futex_head *head)
+{
+	f->robust_head = head;
+	INIT_LIST_HEAD(&f->robust_head->robust_list);
+	mutex_init(&f->robust_head->robust_mutex);
+}
+
+/**
+ * futex_register - Record the existence of a robust futex in a vma.
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is created. Looks up the vma that contains the futex and
+ * adds an entry to the list of all robust futexes in the vma.
+ */
+static int futex_register(unsigned long uaddr, unsigned int attr)
+{
+	int ret = 0;
+	struct futex_robust *robust;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	struct list_head *head = NULL;
+	struct mutex *mutex = NULL;
+	struct futex_head *file_futex = NULL;
+
+	if ((attr & FUTEX_ATTR_SHARED) == 0)
+		return -EADDRNOTAVAIL;
+
+	robust = kmem_cache_alloc(robust_futex_cachep, GFP_KERNEL);
+	if (!robust) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	file_futex = kmem_cache_alloc(file_futex_cachep, GFP_KERNEL);
+	if (!file_futex) {
+		ret = -ENOMEM;
+		kmem_cache_free(robust_futex_cachep, robust);
+		goto out;
+	}
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, uaddr);
+
+	if (vma->vm_file && vma->vm_file->f_mapping) {
+		if (vma->vm_file->f_mapping->robust_head == NULL)
+			init_robust_list(vma->vm_file->f_mapping, file_futex);
+		else
+			kmem_cache_free(file_futex_cachep, file_futex);
+		head = &vma->vm_file->f_mapping->robust_head->robust_list;
+		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
+	} else {
+		ret = -EADDRNOTAVAIL;
+		kmem_cache_free(robust_futex_cachep, robust);
+		kmem_cache_free(file_futex_cachep, file_futex);
+		goto out_unlock;
+	}
+
+	mutex_lock(mutex);
+	list_add_tail(&robust->list, head);
+	mutex_unlock(mutex);
+
+out_unlock:
+	up_read(&mm->mmap_sem);
+out:
+	return ret;
+}
+
+/**
+ * futex_deregister - Delete robust futex registration from a vma
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is destroyed. Looks up the vma that contains the futex and
+ * removes the futex entry from the list of all robust futexes in
+ * the vma.
+ */
+static int futex_deregister(unsigned long uaddr)
+{
+	union futex_key key;
+	struct mm_struct *mm = current->mm;
+	struct list_head *head = NULL;
+	struct mutex *mutex = NULL;
+	struct vm_area_struct *vma;
+	struct futex_robust *this, *next;
+	int ret;
+
+	down_read(&mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &key);
+	if (unlikely(ret != 0))
+		goto out;
+	vma = find_extend_vma(mm, uaddr);
+	if (vma->vm_file && vma->vm_file->f_mapping &&
+	    vma->vm_file->f_mapping->robust_head) {
+		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
+		head = &vma->vm_file->f_mapping->robust_head->robust_list;
+	}
+	if (head == NULL) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(mutex);
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (match_futex (&this->key, &key)) {
+			futex_wake(uaddr, 1);
+			list_del(&this->list);
+			kmem_cache_free(robust_futex_cachep, this);
+			break;
+		}
+	}
+
+	mutex_unlock(mutex);
+out:
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+
+/**
+ * futex_recover - Recover a futex after its owner died
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall).
+ * When a task dies while owning a robust futex, the futex is
+ * marked with %FUTEX_OWNER_DIED and ownership is transferred
+ * to the next waiting task. That task can choose to restore
+ * the futex to a useful state by calling this function.
+ */
+static int futex_recover(unsigned long uaddr)
+{
+	int ret = 0;
+	int value = 0;
+	union futex_key key;
+
+	down_read(&current->mm->mmap_sem);
+	ret = get_futex_key(uaddr, &key);
+	up_read(&current->mm->mmap_sem);
+	if (ret != 0)
+		return ret;
+
+	if (get_user(value, (int *__user)uaddr))
+		return ret;
+
+	value &= ~FUTEX_OWNER_DIED;
+	return put_user(value, (int *__user)uaddr);
+}
+#endif
+
 long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
 		unsigned long uaddr2, int val2, int val3)
 {
@@ -854,6 +1234,17 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, uaddr2, val, val2, val3);
 		break;
+#ifdef CONFIG_ROBUST_FUTEX
+	case FUTEX_REGISTER:
+		ret = futex_register(uaddr, val);
+		break;
+	case FUTEX_DEREGISTER:
+		ret = futex_deregister(uaddr);
+		break;
+	case FUTEX_RECOVER:
+		ret = futex_recover(uaddr);
+		break;
+#endif
 	default:
 		ret = -ENOSYS;
 	}
@@ -901,6 +1292,10 @@ static int __init init(void)
 {
 	unsigned int i;
 
+#ifdef CONFIG_ROBUST_FUTEX
+	robust_futex_cachep = kmem_cache_create("robust_futex", sizeof(struct futex_robust), 0, 0, NULL, NULL);
+	file_futex_cachep = kmem_cache_create("file_futex", sizeof(struct futex_head), 0, 0, NULL, NULL);
+#endif
 	register_filesystem(&futex_fs_type);
 	futex_mnt = kern_mount(&futex_fs_type);
 
Index: linux-2.6.15/init/Kconfig
===================================================================
--- linux-2.6.15.orig/init/Kconfig
+++ linux-2.6.15/init/Kconfig
@@ -348,6 +348,15 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config ROBUST_FUTEX
+	bool "Enable robust futex support"
+	depends on FUTEX
+	default y
+	help
+	  Enable this option if you want to use robust user space mutexes.
+	  Enabling this option slows down the exit path of the kernel for
+	  all processes.  Robust futexes will run glibc-based applications correctly.
+
 config EPOLL
 	bool "Enable eventpoll support" if EMBEDDED
 	default y

[-- Attachment #3: Type: text/plain, Size: 1763 bytes --]



On Jan 17, 2006, at 9:50 AM, Ulrich Drepper wrote:

> And another thing: semaphores are on their way out.  So, in
> futex_deregister and in futex_head, shouldn't you use mutexes?  I
> don't see that you really need semaphores.
>
> In futex_register, you define mm and initialize it with current->mm.
> That's OK.  But why then are you using
>
> +       down_read(&current->mm->mmap_sem);
>
> just a few lines below?
>
> And finally (for now): in get_futex_key the VMA containing the futex
> is determined.  And yet, in futex_register you have an identical
> find_extend_vma call.  I don't know how expensive this function is.
> But I would assume that at least the error handling in futex_register
> can go away since the VMA cannot be torn down while mmap_sem is taken,
> right?  But perhaps this just points to more inconsistencies.  Why is
> the list/sem lookup in get_futex_key?  Only to share the code with
> futex_deregister.  But is that really worth it?  The majority of calls
> to get_futex_key come from all the other call sites so the code you
> added is only a cost without any gain.  Especially since you could in
> futex_register do the whole thing without any additional cost and
> because most of the new tests in get_futex_key are again tested in
> futex_register (to determine shared vs non-shared) and do not have to
> be tested in futex_deregister (we know the futex is in shared memory).
>
> I suggest that if find_extend_vma is sufficiently expensive, pass a
> pointer to a variable of that type to get_futex_key.  If it is cheap,
> don't do anything.  Pull the new code in get_futex_key into
> futex_register and futex_deregister, optimize out unnecessary code,
> and merge with the rest of the functions.  It'll be much less
> invasive.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [robust-futex-4] futex: robust futex support
  2006-01-19  2:26           ` [robust-futex-4] " david singleton
@ 2006-01-19  5:22             ` Andrew Morton
  2006-01-20  0:47               ` [robust-futex-5] " david singleton
                                 ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2006-01-19  5:22 UTC (permalink / raw)
  To: david singleton; +Cc: drepper, mingo, linux-kernel

david singleton <dsingleton@mvista.com> wrote:
>
> [-ENOCHANGELOG]
> 

> @@ -383,6 +384,7 @@ struct address_space {
>  	spinlock_t		private_lock;	/* for use by the address_space */
>  	struct list_head	private_list;	/* ditto */
>  	struct address_space	*assoc_mapping;	/* ditto */
> +	struct futex_head	*robust_head;	/* list of robust futexes */
>  } __attribute__((aligned(sizeof(long))));

It's worth putting this field inside CONFIG_ROBUST_FUTEX

> +#else
> +# define futex_free_robust_list(a)      do { } while (0)
> +# define exit_futex(b)                  do { } while (0)
> +#define futex_init_inode(a) 		do { } while (0)
> +#endif

Indenting went wonky.

> +/*
> + * Robust futexes provide a locking mechanism that can be shared between
> + * user mode processes. The major difference between robust futexes and
> + * regular futexes is that when the owner of a robust futex dies, the
> + * next task waiting on the futex will be awakened, will get ownership
> + * of the futex lock, and will receive the error status EOWNERDEAD.
> + *
> + * A robust futex is a 32 bit integer stored in user mode shared memory.
> + * Bit 31 indicates that there are tasks waiting on the futex.
> + * Bit 30 indicates that the task that owned the futex has died.
> + * Bit 29 indicates that the futex is not recoverable and cannot be used.
> + * Bits 0-28 are the pid of the task that owns the futex lock, or zero if
> + * the futex is not locked.
> + */

Nice comment!

> +/**
> + * futex_free_robust_list - release the list of registered futexes.
> + * @inode: inode that may be a memory mapped file
> + *
> + * Called from dput() when a dentry reference count reaches zero.
> + * If the dentry is associated with a memory mapped file, then
> + * release the list of registered robust futexes that are contained
> + * in that mapping.
> + */
> +void futex_free_robust_list(struct inode *inode)
> +{
> +	struct address_space *mapping;
> +	struct list_head *head;
> + 	struct futex_robust *this, *next;
> +	struct futex_head *futex_head = NULL;
> +
> +	if (inode == NULL)
> +		return;

Is this test needed?

This function is called when a dentry's refcount falls to zero.  But there
could be other refs to this inode which might get upset at having their
robust futexes thrown away.  Shouldn't this be based on inode destruction
rather than dentry?

> +	mapping = inode->i_mapping;
> +	if (mapping == NULL)
> +		return;

inodes never have a null ->i_mapping

> +	if (mapping->robust_head == NULL)
> +		return;
> +
> +	if (list_empty(&mapping->robust_head->robust_list))
> +		return;
> +
> +	mutex_lock(&mapping->robust_head->robust_mutex);
> +
> +	head = &mapping->robust_head->robust_list;
> +	futex_head = mapping->robust_head;
> +
> +	list_for_each_entry_safe(this, next, head, list) {
> +		list_del(&this->list);
> +		kmem_cache_free(robust_futex_cachep, this);
> +	}

If we're throwing away the entire contents of the list, there's no need to
detach items as we go.

> +/**
> + * get_shared_uaddr - convert a shared futex_key to a user addr.
> + * @key: a futex_key that identifies a futex.
> + * @vma: a vma that may contain the futex
> + *
> + * Shared futex_keys identify a futex that is contained in a vma,
> + * and so may be shared.
> + * Returns zero if futex is not contained in @vma
> + */
> +static unsigned long get_shared_uaddr(union futex_key *key,
> +				      struct vm_area_struct *vma)
> +{
> +	unsigned long uaddr = 0;
> +	unsigned long tmpaddr;
> +	struct address_space *mapping;
> +
> +	mapping = vma->vm_file->f_mapping;
> +	if (key->shared.inode == mapping->host) {
> +		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> +				+ (key->shared.offset & ~0x1)
> +				+ vma->vm_start;
> +		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
> +			uaddr = tmpaddr;
> +	}
> +
> +	return uaddr;
> +}

What locking guarantees that vma->vm_file->f_mapping continues to exist? 
Bear in mind that another thread which shares this thread's files can close
fds, unlink files, munmap files, etc.

> +/**
> + * find_owned_futex - find futexes owned by the current task
> + * @tsk: task that is exiting
> + * @vma: vma containing robust futexes
> + * @head: list head for list of robust futexes
> + * @mutex: mutex that protects the list
> + *
> + * Walk the list of registered robust futexes for this @vma,
> + * setting the %FUTEX_OWNER_DIED flag on those futexes owned
> + * by the current, exiting task.
> + */
> +static void
> +find_owned_futex(struct task_struct *tsk, struct vm_area_struct *vma,
> +		 struct list_head *head, struct mutex *mutex)
> +{
> +	struct futex_robust *this, *next;
> + 	unsigned long uaddr;
> +	int value;
> +
> +	mutex_lock(mutex);
> +
> +	list_for_each_entry_safe(this, next, head, list) {
> +
> +		uaddr = get_futex_uaddr(&this->key, vma);
> +		if (uaddr == 0)
> +			continue;
> +
> +		mutex_unlock(mutex);
> +		up_read(&tsk->mm->mmap_sem);
> +		get_user(value, (int __user *)uaddr);
> +		if ((value & FUTEX_PID) == tsk->pid) {
> +			value |= FUTEX_OWNER_DIED;
> +			futex_wake(uaddr, 1);
> +			put_user(value, (int *__user)uaddr);

That's a bit unconventional - normally we'd perform the other-task-visible
write and _then_ wake up the other task.  What prevents the woken task from
waking then seeing the not-yet-written-to value?
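
For reference, a sketch of the conventional ordering, using the same
helpers this patch already uses (only the order of the store and the
wakeup changes):

	value |= FUTEX_OWNER_DIED;
	/* make the owner-died bit visible before any waiter can run */
	put_user(value, (int __user *)uaddr);
	futex_wake(uaddr, 1);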

> +void exit_futex(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm;
> +	struct list_head *robust;
> +	struct vm_area_struct *vma;
> +	struct mutex *mutex;
> +
> +	mm = current->mm;
> +	if (mm==NULL)
> +		return;
> +
> +	down_read(&mm->mmap_sem);
> +
> +	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
> +		if (vma->vm_file == NULL)
> +			continue;
> +
> +		if (vma->vm_file->f_mapping == NULL)
> +			continue;
> +
> +		if (vma->vm_file->f_mapping->robust_head == NULL)
> +			continue;
> +
> +		robust = &vma->vm_file->f_mapping->robust_head->robust_list;
> +
> +		if (list_empty(robust))
> +			continue;
> +
> +		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
> +
> +		find_owned_futex(tsk, vma, robust, mutex);

The name "find_owned_mutex" is a bit misleading - it implies that it is
some lookup function which has no side-effects.  But find_owned_futex()
actually makes significant state changes.

> +
> +	if (vma->vm_file && vma->vm_file->f_mapping) {
> +		if (vma->vm_file->f_mapping->robust_head == NULL)
> +			init_robust_list(vma->vm_file->f_mapping, file_futex);
> +		else
> +			kmem_cache_free(file_futex_cachep, file_futex);
> +		head = &vma->vm_file->f_mapping->robust_head->robust_list;
> +		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;

The patch in general does an awful lot of these lengthy pointer chases. 
It's more readable to create temporaries to avoid this.  Sometimes it
generates better code, too.

The kmem_cache_free above is a bit sad.  It means that in the common case
we'll allocate a file_futex and then free it again.  It's legal to do
GFP_KERNEL allocations within mmap_sem, so I suggest you switch this to
allocate-only-if-needed.

> +	} else {
> +		ret = -EADDRNOTAVAIL;
> +		kmem_cache_free(robust_futex_cachep, robust);
> +		kmem_cache_free(file_futex_cachep, file_futex);
> +		goto out_unlock;
> +	}

Again, we could have checked whether we needed to allocate these before
allocating them.

> +	if (vma->vm_file && vma->vm_file->f_mapping &&
> +	    vma->vm_file->f_mapping->robust_head) {
> +		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
> +		head = &vma->vm_file->f_mapping->robust_head->robust_list;

Pointer chasing, again...

> +
> +	list_for_each_entry_safe(this, next, head, list) {
> +		if (match_futex (&this->key, &key)) {
                               ^
                                A stray space got in there.

>  
> +#ifdef CONFIG_ROBUST_FUTEX
> +	robust_futex_cachep = kmem_cache_create("robust_futex", sizeof(struct futex_robust), 0, 0, NULL, NULL);
> +	file_futex_cachep = kmem_cache_create("file_futex", sizeof(struct futex_head), 0, 0, NULL, NULL);
> +#endif

A bit of 80-column wrapping needed there please.

Are futex_heads likely to be allocated in sufficient volume to justify
their own slab cache, rather than using kmalloc()?  The speed is the same -
if anything, kmalloc() will be faster because its text and data are more
likely to be in CPU cache.
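
For comparison, a kmalloc-based allocation would be roughly this (a
sketch, not a requirement):

	/* instead of kmem_cache_alloc(file_futex_cachep, GFP_KERNEL) ... */
	struct futex_head *file_futex = kmalloc(sizeof(*file_futex), GFP_KERNEL);

	/* ... and freed with kfree(file_futex) instead of kmem_cache_free() */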


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [robust-futex-5] futex: robust futex support
  2006-01-19  5:22             ` Andrew Morton
@ 2006-01-20  0:47               ` david singleton
  2006-01-20 17:41               ` [robust-futex-4] " Ingo Oeser
  2006-01-23 18:20               ` Todd Kneisel
  2 siblings, 0 replies; 15+ messages in thread
From: david singleton @ 2006-01-20  0:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel, drepper

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

         Andrew,
		I have incorporated the changes you suggested, with the exception of
	the check for a NULL mm in exit_futex.  Apparently there is a task that
	exits in the boot path that has a null mm and causes the kernel to
	crash on boot without this check.

  fs/inode.c            |    2
  include/linux/fs.h    |    4
  include/linux/futex.h |   33 ++++
  init/Kconfig          |    9 +
  kernel/exit.c         |    2
  kernel/futex.c        |  398 ++++++++++++++++++++++++++++++++++++++++++++++++++
  6 files changed, 448 insertions(+)

David


[-- Attachment #2: robust-futex-5 --]
[-- Type: application/octet-stream, Size: 16275 bytes --]

Signed-off-by: David Singleton <dsingleton@mvista.com>

	Incorporated changes suggested by Andrew Morton

 fs/inode.c            |    2 
 include/linux/fs.h    |    4 
 include/linux/futex.h |   33 ++++
 init/Kconfig          |    9 +
 kernel/exit.c         |    2 
 kernel/futex.c        |  398 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 448 insertions(+)

Index: linux-2.6.15/include/linux/fs.h
===================================================================
--- linux-2.6.15.orig/include/linux/fs.h
+++ linux-2.6.15/include/linux/fs.h
@@ -9,6 +9,7 @@
 #include <linux/config.h>
 #include <linux/limits.h>
 #include <linux/ioctl.h>
+#include <linux/futex.h>
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -383,6 +384,9 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+#ifdef CONFIG_ROBUST_FUTEX
+	struct futex_head	*robust_head;	/* list of robust futexes */
+#endif
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
Index: linux-2.6.15/include/linux/futex.h
===================================================================
--- linux-2.6.15.orig/include/linux/futex.h
+++ linux-2.6.15/include/linux/futex.h
@@ -10,6 +10,38 @@
 #define FUTEX_REQUEUE		3
 #define FUTEX_CMP_REQUEUE	4
 #define FUTEX_WAKE_OP		5
+#define FUTEX_REGISTER          6
+#define FUTEX_DEREGISTER        7
+#define FUTEX_RECOVER           8
+
+#define FUTEX_WAITERS				0x80000000
+#define FUTEX_OWNER_DIED			0x40000000
+#define FUTEX_NOT_RECOVERABLE			0x20000000
+#define FUTEX_FLAGS (FUTEX_WAITERS | FUTEX_OWNER_DIED | FUTEX_NOT_RECOVERABLE)
+#define FUTEX_PID                             ~(FUTEX_FLAGS)
+
+#define FUTEX_ATTR_SHARED                       0x10000000
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#ifdef CONFIG_ROBUST_FUTEX
+
+struct futex_head {
+	struct list_head robust_list;
+	struct mutex robust_mutex;
+};
+
+struct inode;
+struct task_struct;
+extern void futex_free_robust_list(struct inode *inode);
+extern void exit_futex(struct task_struct *tsk);
+#else
+# define futex_free_robust_list(a)	do { } while (0)
+# define exit_futex(b)			do { } while (0)
+#define futex_init_inode(a)		do { } while (0)
+#endif
 
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2,
@@ -41,3 +73,4 @@ long do_futex(unsigned long uaddr, int o
    | ((oparg & 0xfff) << 12) | (cmparg & 0xfff))
 
 #endif
+#endif
Index: linux-2.6.15/kernel/exit.c
===================================================================
--- linux-2.6.15.orig/kernel/exit.c
+++ linux-2.6.15/kernel/exit.c
@@ -31,6 +31,7 @@
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
+#include <linux/futex.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -847,6 +848,7 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
+	exit_futex(tsk);
 	exit_mm(tsk);
 
 	exit_sem(tsk);
Index: linux-2.6.15/kernel/futex.c
===================================================================
--- linux-2.6.15.orig/kernel/futex.c
+++ linux-2.6.15/kernel/futex.c
@@ -8,6 +8,9 @@
  *  Removed page pinning, fix privately mapped COW pages and other cleanups
  *  (C) Copyright 2003, 2004 Jamie Lokier
  *
+ *  Robust futexes added by Todd Kneisel
+ *  (C) Copyright 2005, Bull HN.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -40,7 +43,9 @@
 #include <linux/pagemap.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
+#include <linux/mutex.h>
 #include <asm/futex.h>
+#include <asm/uaccess.h>
 
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
@@ -829,6 +834,382 @@ error:
 	goto out;
 }
 
+#ifdef CONFIG_ROBUST_FUTEX
+/*
+ * Robust futexes provide a locking mechanism that can be shared between
+ * user mode processes. The major difference between robust futexes and
+ * regular futexes is that when the owner of a robust futex dies, the
+ * next task waiting on the futex will be awakened, will get ownership
+ * of the futex lock, and will receive the error status EOWNERDEAD.
+ *
+ * A robust futex is a 32 bit integer stored in user mode shared memory.
+ * Bit 31 indicates that there are tasks waiting on the futex.
+ * Bit 30 indicates that the task that owned the futex has died.
+ * Bit 29 indicates that the futex is not recoverable and cannot be used.
+ * Bits 0-28 are the pid of the task that owns the futex lock, or zero if
+ * the futex is not locked.
+ */
+
+static kmem_cache_t *robust_futex_cachep;
+static kmem_cache_t *file_futex_cachep;
+/*
+ * Used to track registered robust futexes. Attached to linked list in inodes.
+ */
+struct futex_robust {
+	struct list_head list;
+	union futex_key key;
+};
+
+/**
+ * futex_free_robust_list - release the list of registered futexes.
+ * @inode: inode that may be a memory mapped file
+ *
+ * Called from dput() when a dentry reference count reaches zero.
+ * If the dentry is associated with a memory mapped file, then
+ * release the list of registered robust futexes that are contained
+ * in that mapping.
+ */
+void futex_free_robust_list(struct inode *inode)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct list_head *head;
+ 	struct futex_robust *this, *next;
+	struct futex_head *futex_head = NULL;
+
+	futex_head = mapping->robust_head;
+
+	if (futex_head == NULL)
+		return;
+
+	if (list_empty(&futex_head->robust_list))
+		return;
+
+	mutex_lock(&futex_head->robust_mutex);
+
+	head = &futex_head->robust_list;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		kmem_cache_free(robust_futex_cachep, this);
+	}
+
+	mutex_unlock(&futex_head->robust_mutex);
+
+	kmem_cache_free(file_futex_cachep, futex_head);
+	mapping->robust_head = NULL;
+
+	return;
+}
+
+/**
+ * get_private_uaddr - convert a private futex_key to a user addr
+ * @key: the futex_key that identifies a futex.
+ *
+ * Private futex_keys identify a futex that is in non-shared memory.
+ * Robust futexes should never result in private futex_keys, but keep
+ * this code for completeness.
+ * Returns zero if futex is not contained in current task's mm
+ */
+static unsigned long get_private_uaddr(union futex_key *key)
+{
+	unsigned long uaddr = 0;
+
+	if (key->private.mm == current->mm)
+		uaddr = key->private.uaddr;
+	return uaddr;
+}
+
+/**
+ * get_shared_uaddr - convert a shared futex_key to a user addr.
+ * @key: a futex_key that identifies a futex.
+ * @vma: a vma that may contain the futex
+ *
+ * Shared futex_keys identify a futex that is contained in a vma,
+ * and so may be shared.
+ * Returns zero if futex is not contained in @vma
+ */
+static unsigned long get_shared_uaddr(union futex_key *key,
+				      struct vm_area_struct *vma)
+{
+	unsigned long uaddr = 0;
+	unsigned long tmpaddr;
+	struct address_space *mapping;
+
+	mapping = vma->vm_file->f_mapping;
+	if (key->shared.inode == mapping->host) {
+		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
+				+ (key->shared.offset & ~0x1)
+				+ vma->vm_start;
+		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
+			uaddr = tmpaddr;
+	}
+
+	return uaddr;
+}
+
+/**
+ * get_futex_uaddr - convert a futex_key to a user addr.
+ * @key: futex_key that identifies a futex
+ * @vma: vma that may contain the futex
+ *
+ * Converts both shared and private futex_keys.
+ * Returns zero if futex is not contained in @vma or in the current
+ * task's mm.
+ */
+static unsigned long get_futex_uaddr(union futex_key *key,
+				     struct vm_area_struct *vma)
+{
+	unsigned long uaddr;
+
+	if ((key->both.offset & 0x1) == 0)
+		uaddr = get_private_uaddr(key);
+	else
+		uaddr = get_shared_uaddr(key,vma);
+
+	return uaddr;
+}
+
+/**
+ * release_owned_futex - release futexes owned by the exiting task
+ * @tsk: task that is exiting
+ * @vma: vma containing robust futexes
+ * @robust: futex_head holding the list of robust futexes and the
+ *	mutex that protects it
+ *
+ * Walk the list of registered robust futexes for this @vma,
+ * setting the %FUTEX_OWNER_DIED flag on those futexes owned
+ * by the current, exiting task.
+ */
+static void
+release_owned_futex(struct task_struct *tsk, struct vm_area_struct *vma,
+		    struct futex_head *robust)
+{
+	struct futex_robust *this, *next;
+	struct mutex *mutex = &robust->robust_mutex;
+	struct list_head *head = &robust->robust_list;
+ 	unsigned long uaddr;
+	int value;
+
+	mutex_lock(mutex);
+
+	list_for_each_entry_safe(this, next, head, list) {
+
+		uaddr = get_futex_uaddr(&this->key, vma);
+		if (uaddr == 0)
+			continue;
+
+		mutex_unlock(mutex);
+		up_read(&tsk->mm->mmap_sem);
+		get_user(value, (int __user *)uaddr);
+		if ((value & FUTEX_PID) == tsk->pid) {
+			value |= FUTEX_OWNER_DIED;
+			put_user(value, (int *__user)uaddr);
+			futex_wake(uaddr, 1);
+		}
+		down_read(&tsk->mm->mmap_sem);
+		mutex_lock(mutex);
+	}
+
+	mutex_unlock(mutex);
+}
+
+/**
+ * exit_futex - futex processing when a task exits.
+ *
+ * Called from do_exit() when a task exits. Mark all robust futexes
+ * that are owned by the current terminating task as %FUTEX_OWNER_DIED.
+ */
+
+void exit_futex(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+	struct list_head *robust;
+	struct futex_head *head;
+	struct vm_area_struct *vma;
+	struct address_space *addr;
+
+	if (mm == NULL)
+		return;
+
+	down_read(&mm->mmap_sem);
+
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
+		if (vma->vm_file == NULL)
+			continue;
+
+		addr = vma->vm_file->f_mapping;
+		if (addr == NULL)
+			continue;
+
+		if (addr->robust_head == NULL)
+			continue;
+
+		head = addr->robust_head;
+		robust = &head->robust_list;
+
+		if (list_empty(robust))
+			continue;
+
+		release_owned_futex(tsk, vma, head);
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+static void init_robust_list(struct address_space *f, struct futex_head *head)
+{
+	f->robust_head = head;
+	INIT_LIST_HEAD(&f->robust_head->robust_list);
+	mutex_init(&f->robust_head->robust_mutex);
+}
+
+/**
+ * futex_register - Record the existence of a robust futex in a vma.
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is created. Looks up the vma that contains the futex and
+ * adds an entry to the list of all robust futexes in the vma.
+ */
+static int futex_register(unsigned long uaddr, unsigned int attr)
+{
+	int ret = 0;
+	struct futex_robust *robust;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	struct address_space *addr = NULL;
+	struct list_head *head = NULL;
+	struct mutex *mutex = NULL;
+	struct futex_head *file_futex = NULL;
+
+	if ((attr & FUTEX_ATTR_SHARED) == 0)
+		return -EADDRNOTAVAIL;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, uaddr);
+
+	if (vma->vm_file) {
+		addr = vma->vm_file->f_mapping;
+		if (addr == NULL) {
+			ret = -EADDRNOTAVAIL;
+			goto out;
+		}
+		robust = kmem_cache_alloc(robust_futex_cachep, GFP_KERNEL);
+		if (!robust) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (addr->robust_head == NULL) {
+			file_futex = kmem_cache_alloc(file_futex_cachep, GFP_KERNEL);
+			if (!file_futex) {
+				kmem_cache_free(robust_futex_cachep, robust);
+				ret = -ENOMEM;
+				goto out;
+			}
+			init_robust_list(addr, file_futex);
+		}
+		head = &addr->robust_head->robust_list;
+		mutex = &addr->robust_head->robust_mutex;
+	} else {
+		ret = -EADDRNOTAVAIL;
+		goto out;
+	}
+
+	mutex_lock(mutex);
+	list_add_tail(&robust->list, head);
+	mutex_unlock(mutex);
+
+out:
+	up_read(&mm->mmap_sem);
+
+	return ret;
+}
+
+/**
+ * futex_deregister - Delete robust futex registration from a vma
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is destroyed. Looks up the vma that contains the futex and
+ * removes the futex entry from the list of all robust futexes in
+ * the vma.
+ */
+static int futex_deregister(unsigned long uaddr)
+{
+	union futex_key key;
+	struct mm_struct *mm = current->mm;
+	struct list_head *head = NULL;
+	struct mutex *mutex = NULL;
+	struct vm_area_struct *vma;
+	struct address_space *addr;
+	struct futex_robust *this, *next;
+	int ret;
+
+	down_read(&mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &key);
+	if (unlikely(ret != 0))
+		goto out;
+	vma = find_extend_vma(mm, uaddr);
+	if (vma->vm_file) {
+		addr = vma->vm_file->f_mapping;
+		if (addr && addr->robust_head) {
+			mutex = &addr->robust_head->robust_mutex;
+			head = &addr->robust_head->robust_list;
+		}
+	}
+	if (head == NULL) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(mutex);
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (match_futex(&this->key, &key)) {
+			futex_wake(uaddr, 1);
+			list_del(&this->list);
+			kmem_cache_free(robust_futex_cachep, this);
+			break;
+		}
+	}
+
+	mutex_unlock(mutex);
+out:
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+
+/**
+ * futex_recover - Recover a futex after its owner died
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall).
+ * When a task dies while owning a robust futex, the futex is
+ * marked with %FUTEX_OWNER_DIED and ownership is transferred
+ * to the next waiting task. That task can choose to restore
+ * the futex to a useful state by calling this function.
+ */
+static int futex_recover(unsigned long uaddr)
+{
+	int ret = 0;
+	int value = 0;
+	union futex_key key;
+
+	down_read(&current->mm->mmap_sem);
+	ret = get_futex_key(uaddr, &key);
+	up_read(&current->mm->mmap_sem);
+	if (ret != 0)
+		return ret;
+
+	if (get_user(value, (int *__user)uaddr))
+		return ret;
+
+	value &= ~FUTEX_OWNER_DIED;
+	return put_user(value, (int *__user)uaddr);
+}
+#endif
+
 long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
 		unsigned long uaddr2, int val2, int val3)
 {
@@ -854,6 +1235,17 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, uaddr2, val, val2, val3);
 		break;
+#ifdef CONFIG_ROBUST_FUTEX
+	case FUTEX_REGISTER:
+		ret = futex_register(uaddr, val);
+		break;
+	case FUTEX_DEREGISTER:
+		ret = futex_deregister(uaddr);
+		break;
+	case FUTEX_RECOVER:
+		ret = futex_recover(uaddr);
+		break;
+#endif
 	default:
 		ret = -ENOSYS;
 	}
@@ -901,6 +1293,12 @@ static int __init init(void)
 {
 	unsigned int i;
 
+#ifdef CONFIG_ROBUST_FUTEX
+	robust_futex_cachep = kmem_cache_create("robust_futex",
+			       sizeof(struct futex_robust), 0, 0, NULL, NULL);
+	file_futex_cachep = kmem_cache_create("file_futex",
+			       sizeof(struct futex_head), 0, 0, NULL, NULL);
+#endif
 	register_filesystem(&futex_fs_type);
 	futex_mnt = kern_mount(&futex_fs_type);
 
Index: linux-2.6.15/init/Kconfig
===================================================================
--- linux-2.6.15.orig/init/Kconfig
+++ linux-2.6.15/init/Kconfig
@@ -348,6 +348,15 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config ROBUST_FUTEX
+	bool "Enable robust futex support"
+	depends on FUTEX
+	default y
+	help
+	  Enable this option if you want to use robust user space mutexes.
+	  Enabling this option slows down the exit path of the kernel for
+	  all processes.  Robust futexes will run glibc-based applications correctly.
+
 config EPOLL
 	bool "Enable eventpoll support" if EMBEDDED
 	default y
Index: linux-2.6.15/fs/inode.c
===================================================================
--- linux-2.6.15.orig/fs/inode.c
+++ linux-2.6.15/fs/inode.c
@@ -23,6 +23,7 @@
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
 #include <linux/mount.h>
+#include <linux/futex.h>
 
 /*
  * This is needed for the following functions:
@@ -175,6 +176,7 @@ void destroy_inode(struct inode *inode) 
 	if (inode_has_buffers(inode))
 		BUG();
 	security_inode_free(inode);
+	futex_free_robust_list(inode);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
 	else

[-- Attachment #3: Type: text/plain, Size: 9008 bytes --]




On Jan 18, 2006, at 9:22 PM, Andrew Morton wrote:

> david singleton <dsingleton@mvista.com> wrote:
>>
>> [-ENOCHANGELOG]
>>
>
>> @@ -383,6 +384,7 @@ struct address_space {
>>  	spinlock_t		private_lock;	/* for use by the address_space */
>>  	struct list_head	private_list;	/* ditto */
>>  	struct address_space	*assoc_mapping;	/* ditto */
>> +	struct futex_head	*robust_head;	/* list of robust futexes */
>>  } __attribute__((aligned(sizeof(long))));
>
> It's worth putting this field inside CONFIG_ROBUST_FUTEX
>
>> +#else
>> +# define futex_free_robust_list(a)      do { } while (0)
>> +# define exit_futex(b)                  do { } while (0)
>> +#define futex_init_inode(a) 		do { } while (0)
>> +#endif
>
> Indenting went wonky.
>
>> +/*
>> + * Robust futexes provide a locking mechanism that can be shared 
>> between
>> + * user mode processes. The major difference between robust futexes 
>> and
>> + * regular futexes is that when the owner of a robust futex dies, the
>> + * next task waiting on the futex will be awakened, will get 
>> ownership
>> + * of the futex lock, and will receive the error status EOWNERDEAD.
>> + *
>> + * A robust futex is a 32 bit integer stored in user mode shared 
>> memory.
>> + * Bit 31 indicates that there are tasks waiting on the futex.
>> + * Bit 30 indicates that the task that owned the futex has died.
>> + * Bit 29 indicates that the futex is not recoverable and cannot be 
>> used.
>> + * Bits 0-28 are the pid of the task that owns the futex lock, or 
>> zero if
>> + * the futex is not locked.
>> + */
>
> Nice comment!
>
>> +/**
>> + * futex_free_robust_list - release the list of registered futexes.
>> + * @inode: inode that may be a memory mapped file
>> + *
>> + * Called from dput() when a dentry reference count reaches zero.
>> + * If the dentry is associated with a memory mapped file, then
>> + * release the list of registered robust futexes that are contained
>> + * in that mapping.
>> + */
>> +void futex_free_robust_list(struct inode *inode)
>> +{
>> +	struct address_space *mapping;
>> +	struct list_head *head;
>> + 	struct futex_robust *this, *next;
>> +	struct futex_head *futex_head = NULL;
>> +
>> +	if (inode == NULL)
>> +		return;
>
> Is this test needed?
>
> This function is called when a dentry's refcount falls to zero.  But 
> there
> could be other refs to this inode which might get upset at having their
> robust futexes thrown away.  Shouldn't this be based on inode 
> destruction
> rather than dentry?
>
>> +	mapping = inode->i_mapping;
>> +	if (mapping == NULL)
>> +		return;
>
> inodes never have a null ->i_mapping
>
>> +	if (mapping->robust_head == NULL)
>> +		return;
>> +
>> +	if (list_empty(&mapping->robust_head->robust_list))
>> +		return;
>> +
>> +	mutex_lock(&mapping->robust_head->robust_mutex);
>> +
>> +	head = &mapping->robust_head->robust_list;
>> +	futex_head = mapping->robust_head;
>> +
>> +	list_for_each_entry_safe(this, next, head, list) {
>> +		list_del(&this->list);
>> +		kmem_cache_free(robust_futex_cachep, this);
>> +	}
>
> If we're throwing away the entire contents of the list, there's no 
> need to
> detach items as we go.
>
>> +/**
>> + * get_shared_uaddr - convert a shared futex_key to a user addr.
>> + * @key: a futex_key that identifies a futex.
>> + * @vma: a vma that may contain the futex
>> + *
>> + * Shared futex_keys identify a futex that is contained in a vma,
>> + * and so may be shared.
>> + * Returns zero if futex is not contained in @vma
>> + */
>> +static unsigned long get_shared_uaddr(union futex_key *key,
>> +				      struct vm_area_struct *vma)
>> +{
>> +	unsigned long uaddr = 0;
>> +	unsigned long tmpaddr;
>> +	struct address_space *mapping;
>> +
>> +	mapping = vma->vm_file->f_mapping;
>> +	if (key->shared.inode == mapping->host) {
>> +		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
>> +				+ (key->shared.offset & ~0x1)
>> +				+ vma->vm_start;
>> +		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
>> +			uaddr = tmpaddr;
>> +	}
>> +
>> +	return uaddr;
>> +}
>
> What locking guarantees that vma->vm_file->f_mapping continues to 
> exist?
> Bear in mind that another thread which shares this thread's files can 
> close
> fds, unlink files, munmap files, etc.

This is called under the mmap_sem.

>
>> +/**
>> + * find_owned_futex - find futexes owned by the current task
>> + * @tsk: task that is exiting
>> + * @vma: vma containing robust futexes
>> + * @head: list head for list of robust futexes
>> + * @mutex: mutex that protects the list
>> + *
>> + * Walk the list of registered robust futexes for this @vma,
>> + * setting the %FUTEX_OWNER_DIED flag on those futexes owned
>> + * by the current, exiting task.
>> + */
>> +static void
>> +find_owned_futex(struct task_struct *tsk, struct vm_area_struct *vma,
>> +		 struct list_head *head, struct mutex *mutex)
>> +{
>> +	struct futex_robust *this, *next;
>> + 	unsigned long uaddr;
>> +	int value;
>> +
>> +	mutex_lock(mutex);
>> +
>> +	list_for_each_entry_safe(this, next, head, list) {
>> +
>> +		uaddr = get_futex_uaddr(&this->key, vma);
>> +		if (uaddr == 0)
>> +			continue;
>> +
>> +		mutex_unlock(mutex);
>> +		up_read(&tsk->mm->mmap_sem);
>> +		get_user(value, (int __user *)uaddr);
>> +		if ((value & FUTEX_PID) == tsk->pid) {
>> +			value |= FUTEX_OWNER_DIED;
>> +			futex_wake(uaddr, 1);
>> +			put_user(value, (int *__user)uaddr);
>
> That's a bit unconventional - normally we'd perform the 
> other-task-visible
> write and _then_ wake up the other task.  What prevents the woken task 
> from
> waking then seeing the not-yet-written-to value?
>
>> +void exit_futex(struct task_struct *tsk)
>> +{
>> +	struct mm_struct *mm;
>> +	struct list_head *robust;
>> +	struct vm_area_struct *vma;
>> +	struct mutex *mutex;
>> +
>> +	mm = current->mm;
>> +	if (mm==NULL)
>> +		return;
>> +
>> +	down_read(&mm->mmap_sem);
>> +
>> +	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
>> +		if (vma->vm_file == NULL)
>> +			continue;
>> +
>> +		if (vma->vm_file->f_mapping == NULL)
>> +			continue;
>> +
>> +		if (vma->vm_file->f_mapping->robust_head == NULL)
>> +			continue;
>> +
>> +		robust = &vma->vm_file->f_mapping->robust_head->robust_list;
>> +
>> +		if (list_empty(robust))
>> +			continue;
>> +
>> +		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
>> +
>> +		find_owned_futex(tsk, vma, robust, mutex);
>
> The name "find_owned_mutex" is a bit misleading - it implies that it is
> some lookup function which has no side-effects.  But find_owned_futex()
> actually makes significant state changes.
>
>> +
>> +	if (vma->vm_file && vma->vm_file->f_mapping) {
>> +		if (vma->vm_file->f_mapping->robust_head == NULL)
>> +			init_robust_list(vma->vm_file->f_mapping, file_futex);
>> +		else
>> +			kmem_cache_free(file_futex_cachep, file_futex);
>> +		head = &vma->vm_file->f_mapping->robust_head->robust_list;
>> +		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
>
> The patch in general does an awful lot of these lengthy pointer chases.
> It's more readable to create temporaries to avoid this.  Sometimes it
> generates better code, too.
>
> The kmem_cache_free above is a bit sad.  It means that in the common 
> case
> we'll allocate a file_futex and then free it again.  It's legal to do
> GFP_KERNEL allocations within mmap_sem, so I suggest you switch this to
> allocate-only-if-needed.
>
>> +	} else {
>> +		ret = -EADDRNOTAVAIL;
>> +		kmem_cache_free(robust_futex_cachep, robust);
>> +		kmem_cache_free(file_futex_cachep, file_futex);
>> +		goto out_unlock;
>> +	}
>
> Again, we could have checked whether we needed to allocate these before
> allocating them.
>
>> +	if (vma->vm_file && vma->vm_file->f_mapping &&
>> +	    vma->vm_file->f_mapping->robust_head) {
>> +		mutex = &vma->vm_file->f_mapping->robust_head->robust_mutex;
>> +		head = &vma->vm_file->f_mapping->robust_head->robust_list;
>
> Pointer chasing, again...
>
>> +
>> +	list_for_each_entry_safe(this, next, head, list) {
>> +		if (match_futex (&this->key, &key)) {
>                                ^
>                                 A stray space got in there.
>
>>
>> +#ifdef CONFIG_ROBUST_FUTEX
>> +	robust_futex_cachep = kmem_cache_create("robust_futex", 
>> sizeof(struct futex_robust), 0, 0, NULL, NULL);
>> +	file_futex_cachep = kmem_cache_create("file_futex", sizeof(struct 
>> futex_head), 0, 0, NULL, NULL);
>> +#endif
>
> A bit of 80-column wrapping needed there please.
>
> Are futex_heads likely to be allocated in sufficient volume to justify
> their own slab cache, rather than using kmalloc()?  The speed is the 
> same -
> if anything, kmalloc() will be faster because its text and data are 
> more
> likely to be in CPU cache.

My tests typically use a slab to a slab and a half of futex_heads.  In
the real world I honestly don't know how many will be allocated.  Can we
leave it in its own cache for testing?  It sure helps debugging if the
entries in the futex_head cache match exactly what the test application
is using.


>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [robust-futex-4] futex: robust futex support
  2006-01-19  5:22             ` Andrew Morton
  2006-01-20  0:47               ` [robust-futex-5] " david singleton
@ 2006-01-20 17:41               ` Ingo Oeser
  2006-01-20 22:18                 ` Andrew Morton
  2006-01-23 18:20               ` Todd Kneisel
  2 siblings, 1 reply; 15+ messages in thread
From: Ingo Oeser @ 2006-01-20 17:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, david singleton, drepper, mingo

[-- Attachment #1: Type: text/plain, Size: 1604 bytes --]

On Thursday 19 January 2006 06:22, Andrew Morton wrote:
> david singleton <dsingleton@mvista.com> wrote:
> > +	if (mapping->robust_head == NULL)
> > +		return;
> > +
> > +	if (list_empty(&mapping->robust_head->robust_list))
> > +		return;
> > +
> > +	mutex_lock(&mapping->robust_head->robust_mutex);
> > +
> > +	head = &mapping->robust_head->robust_list;
> > +	futex_head = mapping->robust_head;
> > +
> > +	list_for_each_entry_safe(this, next, head, list) {
> > +		list_del(&this->list);
> > +		kmem_cache_free(robust_futex_cachep, this);
> > +	}
> 
> If we're throwing away the entire contents of the list, there's no need to
> detach items as we go.
 
Couldn't even detach the list elements first by

list_splice_init(&mapping->robust_head->robust_list, head);

and free the list from "head" after releasing the mutex? 
This would reduce lock contention, no?
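
A rough sketch of that approach inside futex_free_robust_list (names as
in the patch; the on-stack list is illustrative):

	LIST_HEAD(tmp);
	struct futex_robust *this, *next;

	mutex_lock(&futex_head->robust_mutex);
	list_splice_init(&futex_head->robust_list, &tmp);
	mutex_unlock(&futex_head->robust_mutex);

	/* the entries are now private to us; free them with the mutex dropped */
	list_for_each_entry_safe(this, next, &tmp, list)
		kmem_cache_free(robust_futex_cachep, this);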

> > +#ifdef CONFIG_ROBUST_FUTEX
> > +	robust_futex_cachep = kmem_cache_create("robust_futex", sizeof(struct futex_robust), 0, 0, NULL, NULL);
> > +	file_futex_cachep = kmem_cache_create("file_futex", sizeof(struct futex_head), 0, 0, NULL, NULL);
> > +#endif
> 
> A bit of 80-column wrapping needed there please.
> 
> Are futex_heads likely to be allocated in sufficient volume to justify
> their own slab cache, rather than using kmalloc()?  The speed is the same -
> if anything, kmalloc() will be faster because its text and data are more
> likely to be in CPU cache.
 
The goal here was to do cheap futex accounting, as described in the 
documentation to this patch.


Regards

Ingo Oeser


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [robust-futex-4] futex: robust futex support
  2006-01-20 17:41               ` [robust-futex-4] " Ingo Oeser
@ 2006-01-20 22:18                 ` Andrew Morton
  2006-01-21  2:30                   ` david singleton
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2006-01-20 22:18 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel, dsingleton, drepper, mingo

Ingo Oeser <ioe-lkml@rameria.de> wrote:
>
> > > +	list_for_each_entry_safe(this, next, head, list) {
> > > +		list_del(&this->list);
> > > +		kmem_cache_free(robust_futex_cachep, this);
> > > +	}
> > 
> > If we're throwing away the entire contents of the list, there's no need to
> > detach items as we go.
>  
> Couldn't even detach the list elements first by
> 
> list_splice_init(&mapping->robust_head->robust_list, head);
> 
> and free the list from "head" after releasing the mutex? 
> This would reduce lock contention, no?

Yes, it would reduce lock contention nicely.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [robust-futex-4] futex: robust futex support
  2006-01-20 22:18                 ` Andrew Morton
@ 2006-01-21  2:30                   ` david singleton
  0 siblings, 0 replies; 15+ messages in thread
From: david singleton @ 2006-01-21  2:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Oeser, mingo, linux-kernel, drepper

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]


On Jan 20, 2006, at 2:18 PM, Andrew Morton wrote:

> Ingo Oeser <ioe-lkml@rameria.de> wrote:
>>
>>>> +	list_for_each_entry_safe(this, next, head, list) {
>>>> +		list_del(&this->list);
>>>> +		kmem_cache_free(robust_futex_cachep, this);
>>>> +	}
>>>
>>> If we're throwing away the entire contents of the list, there's no 
>>> need to
>>> detach items as we go.
>>
>> Couldn't even detach the list elements first by
>>
>> list_splice_init(&mapping->robust_head->robust_list, head);
>>
>> and free the list from "head" after releasing the mutex?
>> This would reduce lock contention, no?
>
> Yes, it would reduce lock contention nicely.
>

         Incorporated changes suggested by Andrew Morton and Ingo Oeser.

  fs/inode.c            |    2
  include/linux/fs.h    |    4
  include/linux/futex.h |   33 ++++
  init/Kconfig          |    9 +
  kernel/exit.c         |    2
  kernel/futex.c        |  399 ++++++++++++++++++++++++++++++++++++++++++++++++++
  6 files changed, 449 insertions(+)



David




[-- Attachment #2: robust-futex-6 --]
[-- Type: application/octet-stream, Size: 16363 bytes --]

Signed-off-by: David Singleton <dsingleton@mvista.com>

	Incorporated changes suggested by Andrew Morton and Ingo Oeser.

 fs/inode.c            |    2 
 include/linux/fs.h    |    4 
 include/linux/futex.h |   33 ++++
 init/Kconfig          |    9 +
 kernel/exit.c         |    2 
 kernel/futex.c        |  401 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 451 insertions(+)

Index: linux-2.6.15/include/linux/fs.h
===================================================================
--- linux-2.6.15.orig/include/linux/fs.h
+++ linux-2.6.15/include/linux/fs.h
@@ -9,6 +9,7 @@
 #include <linux/config.h>
 #include <linux/limits.h>
 #include <linux/ioctl.h>
+#include <linux/futex.h>
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -383,6 +384,9 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+#ifdef CONFIG_ROBUST_FUTEX
+	struct futex_head	*robust_head;	/* list of robust futexes */
+#endif
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
Index: linux-2.6.15/include/linux/futex.h
===================================================================
--- linux-2.6.15.orig/include/linux/futex.h
+++ linux-2.6.15/include/linux/futex.h
@@ -10,6 +10,38 @@
 #define FUTEX_REQUEUE		3
 #define FUTEX_CMP_REQUEUE	4
 #define FUTEX_WAKE_OP		5
+#define FUTEX_REGISTER          6
+#define FUTEX_DEREGISTER        7
+#define FUTEX_RECOVER           8
+
+#define FUTEX_WAITERS				0x80000000
+#define FUTEX_OWNER_DIED			0x40000000
+#define FUTEX_NOT_RECOVERABLE			0x20000000
+#define FUTEX_FLAGS (FUTEX_WAITERS | FUTEX_OWNER_DIED | FUTEX_NOT_RECOVERABLE)
+#define FUTEX_PID                             ~(FUTEX_FLAGS)
+
+#define FUTEX_ATTR_SHARED                       0x10000000
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#ifdef CONFIG_ROBUST_FUTEX
+
+struct futex_head {
+	struct list_head robust_list;
+	struct mutex robust_mutex;
+};
+
+struct inode;
+struct task_struct;
+extern void futex_free_robust_list(struct inode *inode);
+extern void exit_futex(struct task_struct *tsk);
+#else
+# define futex_free_robust_list(a)	do { } while (0)
+# define exit_futex(b)			do { } while (0)
+#define futex_init_inode(a)		do { } while (0)
+#endif
 
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2,
@@ -41,3 +73,4 @@ long do_futex(unsigned long uaddr, int o
    | ((oparg & 0xfff) << 12) | (cmparg & 0xfff))
 
 #endif
+#endif
Index: linux-2.6.15/kernel/exit.c
===================================================================
--- linux-2.6.15.orig/kernel/exit.c
+++ linux-2.6.15/kernel/exit.c
@@ -31,6 +31,7 @@
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
+#include <linux/futex.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -847,6 +848,7 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
+	exit_futex(tsk);
 	exit_mm(tsk);
 
 	exit_sem(tsk);
Index: linux-2.6.15/kernel/futex.c
===================================================================
--- linux-2.6.15.orig/kernel/futex.c
+++ linux-2.6.15/kernel/futex.c
@@ -8,6 +8,9 @@
  *  Removed page pinning, fix privately mapped COW pages and other cleanups
  *  (C) Copyright 2003, 2004 Jamie Lokier
  *
+ *  Robust futexes added by Todd Kneisel
+ *  (C) Copyright 2005, Bull HN.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -40,7 +43,9 @@
 #include <linux/pagemap.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
+#include <linux/mutex.h>
 #include <asm/futex.h>
+#include <asm/uaccess.h>
 
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
@@ -829,6 +834,385 @@ error:
 	goto out;
 }
 
+#ifdef CONFIG_ROBUST_FUTEX
+/*
+ * Robust futexes provide a locking mechanism that can be shared between
+ * user mode processes. The major difference between robust futexes and
+ * regular futexes is that when the owner of a robust futex dies, the
+ * next task waiting on the futex will be awakened, will get ownership
+ * of the futex lock, and will receive the error status EOWNERDEAD.
+ *
+ * A robust futex is a 32 bit integer stored in user mode shared memory.
+ * Bit 31 indicates that there are tasks waiting on the futex.
+ * Bit 30 indicates that the task that owned the futex has died.
+ * Bit 29 indicates that the futex is not recoverable and cannot be used.
+ * Bits 0-28 are the pid of the task that owns the futex lock, or zero if
+ * the futex is not locked.
+ */
+
+static kmem_cache_t *robust_futex_cachep;
+static kmem_cache_t *file_futex_cachep;
+/*
+ * Used to track registered robust futexes. Attached to linked list in inodes.
+ */
+struct futex_robust {
+	struct list_head list;
+	union futex_key key;
+};
+
+/**
+ * futex_free_robust_list - release the list of registered futexes.
+ * @inode: inode that may be a memory mapped file
+ *
+ * Called from dput() when a dentry reference count reaches zero.
+ * If the dentry is associated with a memory mapped file, then
+ * release the list of registered robust futexes that are contained
+ * in that mapping.
+ */
+void futex_free_robust_list(struct inode *inode)
+{
+	struct address_space *mapping = inode->i_mapping;
+ 	struct futex_robust *this, *next;
+	struct futex_head *futex_head = NULL;
+	struct list_head head;
+
+	futex_head = mapping->robust_head;
+
+	if (futex_head == NULL)
+		return;
+
+	if (list_empty(&futex_head->robust_list))
+		return;
+
+	INIT_LIST_HEAD(&head);
+	mutex_lock(&futex_head->robust_mutex);
+	list_splice_init(&futex_head->robust_list, &head);
+	mutex_unlock(&futex_head->robust_mutex);
+
+	list_for_each_entry_safe(this, next, &head, list) {
+		kmem_cache_free(robust_futex_cachep, this);
+	}
+
+	kmem_cache_free(file_futex_cachep, futex_head);
+	mapping->robust_head = NULL;
+
+	return;
+}
+
+/**
+ * get_private_uaddr - convert a private futex_key to a user addr
+ * @key: the futex_key that identifies a futex.
+ *
+ * Private futex_keys identify a futex that is in non-shared memory.
+ * Robust futexes should never result in private futex_keys, but keep
+ * this code for completeness.
+ * Returns zero if futex is not contained in current task's mm
+ */
+static unsigned long get_private_uaddr(union futex_key *key)
+{
+	unsigned long uaddr = 0;
+
+	if (key->private.mm == current->mm)
+		uaddr = key->private.uaddr;
+	return uaddr;
+}
+
+/**
+ * get_shared_uaddr - convert a shared futex_key to a user addr.
+ * @key: a futex_key that identifies a futex.
+ * @vma: a vma that may contain the futex
+ *
+ * Shared futex_keys identify a futex that is contained in a vma,
+ * and so may be shared.
+ * Returns zero if futex is not contained in @vma
+ */
+static unsigned long get_shared_uaddr(union futex_key *key,
+				      struct vm_area_struct *vma)
+{
+	unsigned long uaddr = 0;
+	unsigned long tmpaddr;
+	struct address_space *mapping;
+
+	mapping = vma->vm_file->f_mapping;
+	if (key->shared.inode == mapping->host) {
+		tmpaddr = ((key->shared.pgoff - vma->vm_pgoff) << PAGE_SHIFT)
+				+ (key->shared.offset & ~0x1)
+				+ vma->vm_start;
+		if (tmpaddr >= vma->vm_start && tmpaddr < vma->vm_end)
+			uaddr = tmpaddr;
+	}
+
+	return uaddr;
+}
+
+/**
+ * get_futex_uaddr - convert a futex_key to a user addr.
+ * @key: futex_key that identifies a futex
+ * @vma: vma that may contain the futex
+ *
+ * Converts both shared and private futex_keys.
+ * Returns zero if futex is not contained in @vma or in the current
+ * task's mm.
+ */
+static unsigned long get_futex_uaddr(union futex_key *key,
+				     struct vm_area_struct *vma)
+{
+	unsigned long uaddr;
+
+	if ((key->both.offset & 0x1) == 0)
+		uaddr = get_private_uaddr(key);
+	else
+		uaddr = get_shared_uaddr(key, vma);
+
+	return uaddr;
+}
+
+/**
+ * release_owned_futex - find futexes owned by the current task
+ * @tsk: task that is exiting
+ * @vma: vma containing robust futexes
+ * @head: list head for list of robust futexes
+ * @mutex: mutex that protects the list
+ *
+ * Walk the list of registered robust futexes for this @vma,
+ * setting the %FUTEX_OWNER_DIED flag on those futexes owned
+ * by the current, exiting task.
+ */
+static void
+release_owned_futex(struct task_struct *tsk, struct vm_area_struct *vma,
+		    struct futex_head *robust)
+{
+	struct futex_robust *this, *next;
+	struct mutex *mutex = &robust->robust_mutex;
+	struct list_head *head = &robust->robust_list;
+ 	unsigned long uaddr;
+	int value;
+
+	mutex_lock(mutex);
+
+	list_for_each_entry_safe(this, next, head, list) {
+
+		uaddr = get_futex_uaddr(&this->key, vma);
+		if (uaddr == 0)
+			continue;
+
+		mutex_unlock(mutex);
+		up_read(&tsk->mm->mmap_sem);
+		get_user(value, (int __user *)uaddr);
+		if ((value & FUTEX_PID) == tsk->pid) {
+			value |= FUTEX_OWNER_DIED;
+			put_user(value, (int __user *)uaddr);
+			futex_wake(uaddr, 1);
+		}
+		down_read(&tsk->mm->mmap_sem);
+		mutex_lock(mutex);
+	}
+
+	mutex_unlock(mutex);
+}
+
+/**
+ * exit_futex - futex processing when a task exits.
+ *
+ * Called from do_exit() when a task exits. Mark all robust futexes
+ * that are owned by the current terminating task as %FUTEX_OWNER_DIED.
+ */
+
+void exit_futex(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+	struct list_head *robust;
+	struct futex_head *head;
+	struct vm_area_struct *vma;
+	struct address_space *addr;
+
+	if (mm == NULL)
+		return;
+
+	down_read(&mm->mmap_sem);
+
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
+		if (vma->vm_file == NULL)
+			continue;
+
+		addr = vma->vm_file->f_mapping;
+		if (addr == NULL)
+			continue;
+
+		if (addr->robust_head == NULL)
+			continue;
+
+		head = addr->robust_head;
+		robust = &head->robust_list;
+
+		if (list_empty(robust))
+			continue;
+
+		release_owned_futex(tsk, vma, head);
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+static void init_robust_list(struct address_space *f, struct futex_head *head)
+{
+	f->robust_head = head;
+	INIT_LIST_HEAD(&f->robust_head->robust_list);
+	mutex_init(&f->robust_head->robust_mutex);
+}
+
+/**
+ * futex_register - Record the existence of a robust futex in a vma.
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is created. Looks up the vma that contains the futex and
+ * adds an entry to the list of all robust futexes in the vma.
+ */
+static int futex_register(unsigned long uaddr, unsigned int attr)
+{
+	int ret = 0;
+	struct futex_robust *robust;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	struct address_space *addr = NULL;
+	struct list_head *head = NULL;
+	struct mutex *mutex = NULL;
+	struct futex_head *rb, *file_futex = NULL;
+
+	if ((attr & FUTEX_ATTR_SHARED) == 0)
+		return -EADDRNOTAVAIL;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, uaddr);
+
+	if (vma->vm_file) {
+		addr = vma->vm_file->f_mapping;
+		if (addr == NULL) {
+			ret = -EADDRNOTAVAIL;
+			goto out;
+		}
+		rb = addr->robust_head;
+		robust = kmem_cache_alloc(robust_futex_cachep, GFP_KERNEL);
+		if (!robust) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (rb == NULL) {
+			file_futex = kmem_cache_alloc(file_futex_cachep, GFP_KERNEL);
+			if (!file_futex) {
+				kmem_cache_free(robust_futex_cachep, robust);
+				ret = -ENOMEM;
+				goto out;
+			}
+			init_robust_list(addr, file_futex);
+			rb = addr->robust_head;
+		}
+		head = &rb->robust_list;
+		mutex = &rb->robust_mutex;
+	} else {
+		ret = -EADDRNOTAVAIL;
+		goto out;
+	}
+
+	mutex_lock(mutex);
+	list_add_tail(&robust->list, head);
+	mutex_unlock(mutex);
+
+out:
+	up_read(&mm->mmap_sem);
+
+	return ret;
+}
+
+/**
+ * futex_deregister - Delete robust futex registration from a vma
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall) when a robust
+ * futex is destroyed. Looks up the vma that contains the futex and
+ * removes the futex entry from the list of all robust futexes in
+ * the vma.
+ */
+static int futex_deregister(unsigned long uaddr)
+{
+	union futex_key key;
+	struct mm_struct *mm = current->mm;
+	struct list_head *head = NULL;
+	struct mutex *mutex = NULL;
+	struct vm_area_struct *vma;
+	struct address_space *addr;
+	struct futex_head *robust;
+	struct futex_robust *this, *next;
+	int ret;
+
+	down_read(&mm->mmap_sem);
+
+	ret = get_futex_key(uaddr, &key);
+	if (unlikely(ret != 0))
+		goto out;
+	vma = find_extend_vma(mm, uaddr);
+	if (vma->vm_file) {
+		addr = vma->vm_file->f_mapping;
+		if (addr) {
+			robust = addr->robust_head;
+			mutex = &robust->robust_mutex;
+			head = &robust->robust_list;
+		}
+	}
+	if (head == NULL) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(mutex);
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (match_futex(&this->key, &key)) {
+			futex_wake(uaddr, 1);
+			list_del(&this->list);
+			kmem_cache_free(robust_futex_cachep, this);
+			break;
+		}
+	}
+
+	mutex_unlock(mutex);
+out:
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+
+/**
+ * futex_recover - Recover a futex after its owner died
+ * @uaddr: user space address of the robust futex
+ *
+ * Called from user space (through sys_futex syscall).
+ * When a task dies while owning a robust futex, the futex is
+ * marked with %FUTEX_OWNER_DIED and ownership is transferred
+ * to the next waiting task. That task can choose to restore
+ * the futex to a useful state by calling this function.
+ */
+static int futex_recover(unsigned long uaddr)
+{
+	int ret = 0;
+	int value = 0;
+	union futex_key key;
+
+	down_read(&current->mm->mmap_sem);
+	ret = get_futex_key(uaddr, &key);
+	up_read(&current->mm->mmap_sem);
+	if (ret != 0)
+		return ret;
+
+	if (get_user(value, (int __user *)uaddr))
+		return -EFAULT;
+
+	value &= ~FUTEX_OWNER_DIED;
+	return put_user(value, (int __user *)uaddr);
+}
+#endif
+
 long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
 		unsigned long uaddr2, int val2, int val3)
 {
@@ -854,6 +1238,17 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, uaddr2, val, val2, val3);
 		break;
+#ifdef CONFIG_ROBUST_FUTEX
+	case FUTEX_REGISTER:
+		ret = futex_register(uaddr, val);
+		break;
+	case FUTEX_DEREGISTER:
+		ret = futex_deregister(uaddr);
+		break;
+	case FUTEX_RECOVER:
+		ret = futex_recover(uaddr);
+		break;
+#endif
 	default:
 		ret = -ENOSYS;
 	}
@@ -901,6 +1296,12 @@ static int __init init(void)
 {
 	unsigned int i;
 
+#ifdef CONFIG_ROBUST_FUTEX
+	robust_futex_cachep = kmem_cache_create("robust_futex",
+			       sizeof(struct futex_robust), 0, 0, NULL, NULL);
+	file_futex_cachep = kmem_cache_create("file_futex",
+			       sizeof(struct futex_head), 0, 0, NULL, NULL);
+#endif
 	register_filesystem(&futex_fs_type);
 	futex_mnt = kern_mount(&futex_fs_type);
 
Index: linux-2.6.15/init/Kconfig
===================================================================
--- linux-2.6.15.orig/init/Kconfig
+++ linux-2.6.15/init/Kconfig
@@ -348,6 +348,15 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config ROBUST_FUTEX
+	bool "Enable robust futex support"
+	depends on FUTEX
+	default y
+	help
+	  Enable this option if you want to use robust user space mutexes.
+	  Enabling this option slows down the exit path of the kernel for
+	  all processes.  Robust futexes allow glibc-based applications to run correctly.
+
 config EPOLL
 	bool "Enable eventpoll support" if EMBEDDED
 	default y
Index: linux-2.6.15/fs/inode.c
===================================================================
--- linux-2.6.15.orig/fs/inode.c
+++ linux-2.6.15/fs/inode.c
@@ -23,6 +23,7 @@
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
 #include <linux/mount.h>
+#include <linux/futex.h>
 
 /*
  * This is needed for the following functions:
@@ -175,6 +176,7 @@ void destroy_inode(struct inode *inode) 
 	if (inode_has_buffers(inode))
 		BUG();
 	security_inode_free(inode);
+	futex_free_robust_list(inode);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
 	else
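
For reference, a minimal user-space sketch of how the new operations might be
exercised through the raw futex syscall. The op and attribute values are taken
from the patch above; the wrapper name, the use of a shared anonymous mapping,
and the omission of the actual lock/unlock fast path are assumptions for
illustration only:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the patch above; not present in released kernel headers. */
#define FUTEX_REGISTER		6
#define FUTEX_DEREGISTER	7
#define FUTEX_RECOVER		8
#define FUTEX_ATTR_SHARED	0x10000000

/* Hypothetical helper: raw six-argument futex syscall. */
static long robust_futex(int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

int main(void)
{
	/* The futex word must live in a shared mapping (file or anonymous). */
	int *lock = mmap(NULL, sizeof(*lock), PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (lock == MAP_FAILED)
		return 1;
	*lock = 0;

	if (robust_futex(lock, FUTEX_REGISTER, FUTEX_ATTR_SHARED) != 0)
		perror("FUTEX_REGISTER");

	/*
	 * ... normal futex-based lock/unlock goes here; a waiter that sees
	 * FUTEX_OWNER_DIED set in *lock could call FUTEX_RECOVER to clear it.
	 */

	robust_futex(lock, FUTEX_DEREGISTER, 0);
	munmap(lock, sizeof(*lock));
	return 0;
}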

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [robust-futex-4] futex: robust futex support
  2006-01-19  5:22             ` Andrew Morton
  2006-01-20  0:47               ` [robust-futex-5] " david singleton
  2006-01-20 17:41               ` [robust-futex-4] " Ingo Oeser
@ 2006-01-23 18:20               ` Todd Kneisel
  2 siblings, 0 replies; 15+ messages in thread
From: Todd Kneisel @ 2006-01-23 18:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: david singleton, drepper, mingo, linux-kernel

On 1/18/06, Andrew Morton <akpm@osdl.org> wrote:
> > +/**
> > + * futex_free_robust_list - release the list of registered futexes.
> > + * @inode: inode that may be a memory mapped file
> > + *
> > + * Called from dput() when a dentry reference count reaches zero.
> > + * If the dentry is associated with a memory mapped file, then
> > + * release the list of registered robust futexes that are contained
> > + * in that mapping.
> > + */
> > +void futex_free_robust_list(struct inode *inode)
> > +{
> > +     struct address_space *mapping;
> > +     struct list_head *head;
> > +     struct futex_robust *this, *next;
> > +     struct futex_head *futex_head = NULL;
> > +
> > +     if (inode == NULL)
> > +             return;
>
> Is this test needed?
>
> This function is called when a dentry's refcount falls to zero.  But there
> could be other refs to this inode which might get upset at having their
> robust futexes thrown away.  Shouldn't this be based on inode destruction
> rather than dentry?
>

In an early version, it was based on inode destruction. But inodes are not
destroyed at process termination. If I understand the code, they're cached so
that another process that opens the same file will not have to build the inode
from scratch.

So I based it on the dentry's refcount falling to zero, which occurs at process
termination. I did consider the problem of other references to the inode. The
only scenario I could come up with was mapping the file using hard links: then
there would be multiple dentries referencing the same inode. This could be
fixed by adding a refcount to the futex_robust structure, but I never got
around to doing it.
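
One possible shape for that refcount, purely as a sketch of the fix described
above: the field and helper names are made up, and the count is hung off the
per-mapping futex_head rather than each futex_robust entry. A matching
atomic_inc() would go wherever a new dentry for the inode is instantiated.

struct futex_head {
	struct list_head robust_list;
	struct mutex robust_mutex;
	atomic_t refs;		/* hypothetical: one per dentry mapping the file */
};

/*
 * Hypothetical dput()-side helper: only the last dentry actually tears
 * the robust list down; earlier dputs just drop their reference.
 */
static void futex_put_robust_list(struct inode *inode)
{
	struct futex_head *head = inode->i_mapping->robust_head;

	if (head && atomic_dec_and_test(&head->refs))
		futex_free_robust_list(inode);
}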

Todd.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-01-23 18:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-14  1:00 [robust-futex-1] futex: robust futex support David Singleton
2006-01-15  0:02 ` Ulrich Drepper
2006-01-15  0:04   ` david singleton
2006-01-15  5:18     ` Ulrich Drepper
2006-01-15 20:00       ` David Singleton
2006-01-17  2:27       ` [robust-futex-3] " david singleton
2006-01-17 17:32         ` Ulrich Drepper
2006-01-17 17:50         ` Ulrich Drepper
2006-01-19  2:26           ` [robust-futex-4] " david singleton
2006-01-19  5:22             ` Andrew Morton
2006-01-20  0:47               ` [robust-futex-5] " david singleton
2006-01-20 17:41               ` [robust-futex-4] " Ingo Oeser
2006-01-20 22:18                 ` Andrew Morton
2006-01-21  2:30                   ` david singleton
2006-01-23 18:20               ` Todd Kneisel
