linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
@ 2008-09-26 17:32 Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 1/4] futex: rely on get_user_pages() " Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-26 17:32 UTC (permalink / raw)
  To: Nick Piggin, Eric Dumazet, Ingo Molnar, Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm

Since get_user_pages_fast() made it in, I thought to give this another try.
Lightly tested by disabling the private futexes and running some java proglets.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] futex: rely on get_user_pages() for shared futexes
  2008-09-26 17:32 [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Peter Zijlstra
@ 2008-09-26 17:32 ` Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 2/4] futex: reduce mmap_sem usage Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-26 17:32 UTC (permalink / raw)
  To: Nick Piggin, Eric Dumazet, Ingo Molnar, Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm

[-- Attachment #1: futex-gup.patch --]
[-- Type: text/plain, Size: 8882 bytes --]

On the way of getting rid of the mmap_sem requirement for shared futexes,
start by relying on get_user_pages().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/futex.h |    2 
 kernel/futex.c        |  162 ++++++++++++++++++++++++--------------------------
 2 files changed, 82 insertions(+), 82 deletions(-)

Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -161,6 +161,45 @@ static inline int match_futex(union fute
 		&& key1->both.offset == key2->both.offset);
 }
 
+/*
+ * Take a reference to the resource addressed by a key.
+ * Can be called while holding spinlocks.
+ *
+ */
+static void get_futex_key_refs(union futex_key *key)
+{
+	if (!key->both.ptr)
+		return;
+
+	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
+	case FUT_OFF_INODE:
+		atomic_inc(&key->shared.inode->i_count);
+		break;
+	case FUT_OFF_MMSHARED:
+		atomic_inc(&key->private.mm->mm_count);
+		break;
+	}
+}
+
+/*
+ * Drop a reference to the resource addressed by a key.
+ * The hash bucket spinlock must not be held.
+ */
+static void drop_futex_key_refs(union futex_key *key)
+{
+	if (!key->both.ptr)
+		return;
+
+	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
+	case FUT_OFF_INODE:
+		iput(key->shared.inode);
+		break;
+	case FUT_OFF_MMSHARED:
+		mmdrop(key->private.mm);
+		break;
+	}
+}
+
 /**
  * get_futex_key - Get parameters which are the keys for a futex.
  * @uaddr: virtual address of the futex
@@ -184,7 +223,6 @@ static int get_futex_key(u32 __user *uad
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma;
 	struct page *page;
 	int err;
 
@@ -210,98 +248,47 @@ static int get_futex_key(u32 __user *uad
 		key->private.address = address;
 		return 0;
 	}
-	/*
-	 * The futex is hashed differently depending on whether
-	 * it's in a shared or private mapping.  So check vma first.
-	 */
-	vma = find_extend_vma(mm, address);
-	if (unlikely(!vma))
-		return -EFAULT;
 
-	/*
-	 * Permissions.
-	 */
-	if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ))
-		return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES;
+again:
+	err = get_user_pages(current, mm, address, 1, 0, 0, &page, NULL);
+	if (err < 0)
+		return err;
+
+	lock_page(page);
+	if (!page->mapping) {
+		unlock_page(page);
+		put_page(page);
+		goto again;
+	}
 
 	/*
 	 * Private mappings are handled in a simple way.
 	 *
 	 * NOTE: When userspace waits on a MAP_SHARED mapping, even if
 	 * it's a read-only handle, it's expected that futexes attach to
-	 * the object not the particular process.  Therefore we use
-	 * VM_MAYSHARE here, not VM_SHARED which is restricted to shared
-	 * mappings of _writable_ handles.
+	 * the object not the particular process.
 	 */
-	if (likely(!(vma->vm_flags & VM_MAYSHARE))) {
-		key->both.offset |= FUT_OFF_MMSHARED; /* reference taken on mm */
+	if (PageAnon(page)) {
+		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
-		return 0;
-	}
-
-	/*
-	 * Linear file mappings are also simple.
-	 */
-	key->shared.inode = vma->vm_file->f_path.dentry->d_inode;
-	key->both.offset |= FUT_OFF_INODE; /* inode-based key. */
-	if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
-		key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
-				     + vma->vm_pgoff);
-		return 0;
+	} else {
+		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
+		key->shared.inode = page->mapping->host;
+		key->shared.pgoff = page->index;
 	}
 
-	/*
-	 * We could walk the page table to read the non-linear
-	 * pte, and get the page index without fetching the page
-	 * from swap.  But that's a lot of code to duplicate here
-	 * for a rare case, so we simply fetch the page.
-	 */
-	err = get_user_pages(current, mm, address, 1, 0, 0, &page, NULL);
-	if (err >= 0) {
-		key->shared.pgoff =
-			page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-		put_page(page);
-		return 0;
-	}
-	return err;
-}
+	get_futex_key_refs(key);
 
-/*
- * Take a reference to the resource addressed by a key.
- * Can be called while holding spinlocks.
- *
- */
-static void get_futex_key_refs(union futex_key *key)
-{
-	if (key->both.ptr == NULL)
-		return;
-	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
-		case FUT_OFF_INODE:
-			atomic_inc(&key->shared.inode->i_count);
-			break;
-		case FUT_OFF_MMSHARED:
-			atomic_inc(&key->private.mm->mm_count);
-			break;
-	}
+	unlock_page(page);
+	put_page(page);
+	return 0;
 }
 
-/*
- * Drop a reference to the resource addressed by a key.
- * The hash bucket spinlock must not be held.
- */
-static void drop_futex_key_refs(union futex_key *key)
+static inline
+void put_futex_key(struct rw_semaphore *fshared, union futex_key *key)
 {
-	if (!key->both.ptr)
-		return;
-	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
-		case FUT_OFF_INODE:
-			iput(key->shared.inode);
-			break;
-		case FUT_OFF_MMSHARED:
-			mmdrop(key->private.mm);
-			break;
-	}
+	drop_futex_key_refs(key);
 }
 
 static u32 cmpxchg_futex_value_locked(u32 __user *uaddr, u32 uval, u32 newval)
@@ -385,6 +372,7 @@ static int refill_pi_state_cache(void)
 	/* pi_mutex gets initialized later */
 	pi_state->owner = NULL;
 	atomic_set(&pi_state->refcount, 1);
+	pi_state->key = FUTEX_KEY_INIT;
 
 	current->pi_state_cache = pi_state;
 
@@ -462,7 +450,7 @@ void exit_pi_state_list(struct task_stru
 	struct list_head *next, *head = &curr->pi_state_list;
 	struct futex_pi_state *pi_state;
 	struct futex_hash_bucket *hb;
-	union futex_key key;
+	union futex_key key = FUTEX_KEY_INIT;
 
 	if (!futex_cmpxchg_enabled)
 		return;
@@ -725,7 +713,7 @@ static int futex_wake(u32 __user *uaddr,
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	struct plist_head *head;
-	union futex_key key;
+	union futex_key key = FUTEX_KEY_INIT;
 	int ret;
 
 	if (!bitset)
@@ -760,6 +748,7 @@ static int futex_wake(u32 __user *uaddr,
 
 	spin_unlock(&hb->lock);
 out:
+	put_futex_key(fshared, &key);
 	futex_unlock_mm(fshared);
 	return ret;
 }
@@ -773,7 +762,7 @@ futex_wake_op(u32 __user *uaddr1, struct
 	      u32 __user *uaddr2,
 	      int nr_wake, int nr_wake2, int op)
 {
-	union futex_key key1, key2;
+	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
 	struct futex_hash_bucket *hb1, *hb2;
 	struct plist_head *head;
 	struct futex_q *this, *next;
@@ -873,6 +862,8 @@ retry:
 	if (hb1 != hb2)
 		spin_unlock(&hb2->lock);
 out:
+	put_futex_key(fshared, &key2);
+	put_futex_key(fshared, &key1);
 	futex_unlock_mm(fshared);
 
 	return ret;
@@ -886,7 +877,7 @@ static int futex_requeue(u32 __user *uad
 			 u32 __user *uaddr2,
 			 int nr_wake, int nr_requeue, u32 *cmpval)
 {
-	union futex_key key1, key2;
+	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
 	struct futex_hash_bucket *hb1, *hb2;
 	struct plist_head *head1;
 	struct futex_q *this, *next;
@@ -974,6 +965,8 @@ out_unlock:
 		drop_futex_key_refs(&key1);
 
 out:
+	put_futex_key(fshared, &key2);
+	put_futex_key(fshared, &key1);
 	futex_unlock_mm(fshared);
 	return ret;
 }
@@ -1220,6 +1213,7 @@ static int futex_wait(u32 __user *uaddr,
  retry:
 	futex_lock_mm(fshared);
 
+	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
@@ -1359,6 +1353,7 @@ static int futex_wait(u32 __user *uaddr,
 	queue_unlock(&q, hb);
 
  out_release_sem:
+	put_futex_key(fshared, &q.key);
 	futex_unlock_mm(fshared);
 	return ret;
 }
@@ -1410,6 +1405,7 @@ static int futex_lock_pi(u32 __user *uad
  retry:
 	futex_lock_mm(fshared);
 
+	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
@@ -1624,6 +1620,7 @@ static int futex_lock_pi(u32 __user *uad
 	queue_unlock(&q, hb);
 
  out_release_sem:
+	put_futex_key(fshared, &q.key);
 	futex_unlock_mm(fshared);
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
@@ -1670,7 +1667,7 @@ static int futex_unlock_pi(u32 __user *u
 	struct futex_q *this, *next;
 	u32 uval;
 	struct plist_head *head;
-	union futex_key key;
+	union futex_key key = FUTEX_KEY_INIT;
 	int ret, attempt = 0;
 
 retry:
@@ -1743,6 +1740,7 @@ retry_unlocked:
 out_unlock:
 	spin_unlock(&hb->lock);
 out:
+	put_futex_key(fshared, &key);
 	futex_unlock_mm(fshared);
 
 	return ret;
Index: linux-2.6/include/linux/futex.h
===================================================================
--- linux-2.6.orig/include/linux/futex.h
+++ linux-2.6/include/linux/futex.h
@@ -164,6 +164,8 @@ union futex_key {
 	} both;
 };
 
+#define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = NULL } }
+
 #ifdef CONFIG_FUTEX
 extern void exit_robust_list(struct task_struct *curr);
 extern void exit_pi_state_list(struct task_struct *curr);



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/4] futex: reduce mmap_sem usage
  2008-09-26 17:32 [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 1/4] futex: rely on get_user_pages() " Peter Zijlstra
@ 2008-09-26 17:32 ` Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 3/4] futex: use fast_gup() Peter Zijlstra
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-26 17:32 UTC (permalink / raw)
  To: Nick Piggin, Eric Dumazet, Ingo Molnar, Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm

[-- Attachment #1: futex-reduce-mmap_sem.patch --]
[-- Type: text/plain, Size: 6652 bytes --]

now that we rely on get_user_pages() for the shared key handling
move all the mmap_sem stuff closely around the slow paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/futex.c |   83 ++-------------------------------------------------------
 1 file changed, 4 insertions(+), 79 deletions(-)

Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -123,24 +123,6 @@ struct futex_hash_bucket {
 static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
 
 /*
- * Take mm->mmap_sem, when futex is shared
- */
-static inline void futex_lock_mm(struct rw_semaphore *fshared)
-{
-	if (fshared)
-		down_read(fshared);
-}
-
-/*
- * Release mm->mmap_sem, when the futex is shared
- */
-static inline void futex_unlock_mm(struct rw_semaphore *fshared)
-{
-	if (fshared)
-		up_read(fshared);
-}
-
-/*
  * We hash on the keys returned from get_futex_key (see below).
  */
 static struct futex_hash_bucket *hash_futex(union futex_key *key)
@@ -250,7 +232,9 @@ static int get_futex_key(u32 __user *uad
 	}
 
 again:
+	down_read(&mm->mmap_sem);
 	err = get_user_pages(current, mm, address, 1, 0, 0, &page, NULL);
+	up_read(&mm->mmap_sem);
 	if (err < 0)
 		return err;
 
@@ -327,8 +311,7 @@ static int futex_handle_fault(unsigned l
 	if (attempt > 2)
 		return ret;
 
-	if (!fshared)
-		down_read(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
 	if (vma && address >= vma->vm_start &&
 	    (vma->vm_flags & VM_WRITE)) {
@@ -348,8 +331,7 @@ static int futex_handle_fault(unsigned l
 				current->min_flt++;
 		}
 	}
-	if (!fshared)
-		up_read(&mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return ret;
 }
 
@@ -719,8 +701,6 @@ static int futex_wake(u32 __user *uaddr,
 	if (!bitset)
 		return -EINVAL;
 
-	futex_lock_mm(fshared);
-
 	ret = get_futex_key(uaddr, fshared, &key);
 	if (unlikely(ret != 0))
 		goto out;
@@ -749,7 +729,6 @@ static int futex_wake(u32 __user *uaddr,
 	spin_unlock(&hb->lock);
 out:
 	put_futex_key(fshared, &key);
-	futex_unlock_mm(fshared);
 	return ret;
 }
 
@@ -769,8 +748,6 @@ futex_wake_op(u32 __user *uaddr1, struct
 	int ret, op_ret, attempt = 0;
 
 retryfull:
-	futex_lock_mm(fshared);
-
 	ret = get_futex_key(uaddr1, fshared, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -821,12 +798,6 @@ retry:
 			goto retry;
 		}
 
-		/*
-		 * If we would have faulted, release mmap_sem,
-		 * fault it in and start all over again.
-		 */
-		futex_unlock_mm(fshared);
-
 		ret = get_user(dummy, uaddr2);
 		if (ret)
 			return ret;
@@ -864,7 +835,6 @@ retry:
 out:
 	put_futex_key(fshared, &key2);
 	put_futex_key(fshared, &key1);
-	futex_unlock_mm(fshared);
 
 	return ret;
 }
@@ -884,8 +854,6 @@ static int futex_requeue(u32 __user *uad
 	int ret, drop_count = 0;
 
  retry:
-	futex_lock_mm(fshared);
-
 	ret = get_futex_key(uaddr1, fshared, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -908,12 +876,6 @@ static int futex_requeue(u32 __user *uad
 			if (hb1 != hb2)
 				spin_unlock(&hb2->lock);
 
-			/*
-			 * If we would have faulted, release mmap_sem, fault
-			 * it in and start all over again.
-			 */
-			futex_unlock_mm(fshared);
-
 			ret = get_user(curval, uaddr1);
 
 			if (!ret)
@@ -967,7 +929,6 @@ out_unlock:
 out:
 	put_futex_key(fshared, &key2);
 	put_futex_key(fshared, &key1);
-	futex_unlock_mm(fshared);
 	return ret;
 }
 
@@ -1211,8 +1172,6 @@ static int futex_wait(u32 __user *uaddr,
 	q.pi_state = NULL;
 	q.bitset = bitset;
  retry:
-	futex_lock_mm(fshared);
-
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
 	if (unlikely(ret != 0))
@@ -1245,12 +1204,6 @@ static int futex_wait(u32 __user *uaddr,
 	if (unlikely(ret)) {
 		queue_unlock(&q, hb);
 
-		/*
-		 * If we would have faulted, release mmap_sem, fault it in and
-		 * start all over again.
-		 */
-		futex_unlock_mm(fshared);
-
 		ret = get_user(uval, uaddr);
 
 		if (!ret)
@@ -1265,12 +1218,6 @@ static int futex_wait(u32 __user *uaddr,
 	queue_me(&q, hb);
 
 	/*
-	 * Now the futex is queued and we have checked the data, we
-	 * don't want to hold mmap_sem while we sleep.
-	 */
-	futex_unlock_mm(fshared);
-
-	/*
 	 * There might have been scheduling since the queue_me(), as we
 	 * cannot hold a spinlock across the get_user() in case it
 	 * faults, and we cannot just set TASK_INTERRUPTIBLE state when
@@ -1354,7 +1301,6 @@ static int futex_wait(u32 __user *uaddr,
 
  out_release_sem:
 	put_futex_key(fshared, &q.key);
-	futex_unlock_mm(fshared);
 	return ret;
 }
 
@@ -1403,8 +1349,6 @@ static int futex_lock_pi(u32 __user *uad
 
 	q.pi_state = NULL;
  retry:
-	futex_lock_mm(fshared);
-
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
 	if (unlikely(ret != 0))
@@ -1494,7 +1438,6 @@ static int futex_lock_pi(u32 __user *uad
 			 * exit to complete.
 			 */
 			queue_unlock(&q, hb);
-			futex_unlock_mm(fshared);
 			cond_resched();
 			goto retry;
 
@@ -1526,12 +1469,6 @@ static int futex_lock_pi(u32 __user *uad
 	 */
 	queue_me(&q, hb);
 
-	/*
-	 * Now the futex is queued and we have checked the data, we
-	 * don't want to hold mmap_sem while we sleep.
-	 */
-	futex_unlock_mm(fshared);
-
 	WARN_ON(!q.pi_state);
 	/*
 	 * Block on the PI mutex:
@@ -1544,7 +1481,6 @@ static int futex_lock_pi(u32 __user *uad
 		ret = ret ? 0 : -EWOULDBLOCK;
 	}
 
-	futex_lock_mm(fshared);
 	spin_lock(q.lock_ptr);
 
 	if (!ret) {
@@ -1610,7 +1546,6 @@ static int futex_lock_pi(u32 __user *uad
 
 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q);
-	futex_unlock_mm(fshared);
 
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
@@ -1621,7 +1556,6 @@ static int futex_lock_pi(u32 __user *uad
 
  out_release_sem:
 	put_futex_key(fshared, &q.key);
-	futex_unlock_mm(fshared);
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
 	return ret;
@@ -1645,8 +1579,6 @@ static int futex_lock_pi(u32 __user *uad
 		goto retry_unlocked;
 	}
 
-	futex_unlock_mm(fshared);
-
 	ret = get_user(uval, uaddr);
 	if (!ret && (uval != -EFAULT))
 		goto retry;
@@ -1678,10 +1610,6 @@ retry:
 	 */
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
 		return -EPERM;
-	/*
-	 * First take all the futex related locks:
-	 */
-	futex_lock_mm(fshared);
 
 	ret = get_futex_key(uaddr, fshared, &key);
 	if (unlikely(ret != 0))
@@ -1741,7 +1669,6 @@ out_unlock:
 	spin_unlock(&hb->lock);
 out:
 	put_futex_key(fshared, &key);
-	futex_unlock_mm(fshared);
 
 	return ret;
 
@@ -1765,8 +1692,6 @@ pi_faulted:
 		goto retry_unlocked;
 	}
 
-	futex_unlock_mm(fshared);
-
 	ret = get_user(uval, uaddr);
 	if (!ret && (uval != -EFAULT))
 		goto retry;



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/4] futex: use fast_gup()
  2008-09-26 17:32 [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 1/4] futex: rely on get_user_pages() " Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 2/4] futex: reduce mmap_sem usage Peter Zijlstra
@ 2008-09-26 17:32 ` Peter Zijlstra
  2008-09-26 17:32 ` [PATCH 4/4] futex: cleanup fshared Peter Zijlstra
  2008-09-27 16:17 ` [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Ingo Molnar
  4 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-26 17:32 UTC (permalink / raw)
  To: Nick Piggin, Eric Dumazet, Ingo Molnar, Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm

[-- Attachment #1: futex-fast_gup.patch --]
[-- Type: text/plain, Size: 725 bytes --]

Change the get_user_pages() call with fast_gup() which doesn't require holding
the mmap_sem thereby removing the mmap_sem from all fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/futex.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -232,9 +232,7 @@ static int get_futex_key(u32 __user *uad
 	}
 
 again:
-	down_read(&mm->mmap_sem);
-	err = get_user_pages(current, mm, address, 1, 0, 0, &page, NULL);
-	up_read(&mm->mmap_sem);
+	err = get_user_pages_fast(address, 1, 0, &page);
 	if (err < 0)
 		return err;
 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/4] futex: cleanup fshared
  2008-09-26 17:32 [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Peter Zijlstra
                   ` (2 preceding siblings ...)
  2008-09-26 17:32 ` [PATCH 3/4] futex: use fast_gup() Peter Zijlstra
@ 2008-09-26 17:32 ` Peter Zijlstra
  2008-09-27 16:17 ` [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Ingo Molnar
  4 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-26 17:32 UTC (permalink / raw)
  To: Nick Piggin, Eric Dumazet, Ingo Molnar, Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm

[-- Attachment #1: futex-cleanup.patch --]
[-- Type: text/plain, Size: 6262 bytes --]

fshared doesn't need to be a rw_sem pointer anymore, so clean that up.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/futex.c |   48 +++++++++++++++++++-----------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -200,8 +200,7 @@ static void drop_futex_key_refs(union fu
  * For other futexes, it points to &current->mm->mmap_sem and
  * caller must have taken the reader lock. but NOT any spinlocks.
  */
-static int get_futex_key(u32 __user *uaddr, struct rw_semaphore *fshared,
-			 union futex_key *key)
+static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
@@ -268,7 +267,7 @@ again:
 }
 
 static inline
-void put_futex_key(struct rw_semaphore *fshared, union futex_key *key)
+void put_futex_key(int fshared, union futex_key *key)
 {
 	drop_futex_key_refs(key);
 }
@@ -297,10 +296,8 @@ static int get_futex_value_locked(u32 *d
 
 /*
  * Fault handling.
- * if fshared is non NULL, current->mm->mmap_sem is already held
  */
-static int futex_handle_fault(unsigned long address,
-			      struct rw_semaphore *fshared, int attempt)
+static int futex_handle_fault(unsigned long address, int attempt)
 {
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
@@ -687,8 +684,7 @@ double_lock_hb(struct futex_hash_bucket 
  * Wake up all waiters hashed on the physical page that is mapped
  * to this virtual address:
  */
-static int futex_wake(u32 __user *uaddr, struct rw_semaphore *fshared,
-		      int nr_wake, u32 bitset)
+static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
@@ -735,8 +731,7 @@ out:
  * to this virtual address:
  */
 static int
-futex_wake_op(u32 __user *uaddr1, struct rw_semaphore *fshared,
-	      u32 __user *uaddr2,
+futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 	      int nr_wake, int nr_wake2, int op)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -790,7 +785,7 @@ retry:
 		 */
 		if (attempt++) {
 			ret = futex_handle_fault((unsigned long)uaddr2,
-						 fshared, attempt);
+						 attempt);
 			if (ret)
 				goto out;
 			goto retry;
@@ -841,8 +836,7 @@ out:
  * Requeue all waiters hashed on one physical page to another
  * physical page.
  */
-static int futex_requeue(u32 __user *uaddr1, struct rw_semaphore *fshared,
-			 u32 __user *uaddr2,
+static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 			 int nr_wake, int nr_requeue, u32 *cmpval)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -1048,8 +1042,7 @@ static void unqueue_me_pi(struct futex_q
  * private futexes.
  */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner,
-				struct rw_semaphore *fshared)
+				struct task_struct *newowner, int fshared)
 {
 	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
@@ -1128,7 +1121,7 @@ retry:
 handle_fault:
 	spin_unlock(q->lock_ptr);
 
-	ret = futex_handle_fault((unsigned long)uaddr, fshared, attempt++);
+	ret = futex_handle_fault((unsigned long)uaddr, attempt++);
 
 	spin_lock(q->lock_ptr);
 
@@ -1152,7 +1145,7 @@ handle_fault:
 
 static long futex_wait_restart(struct restart_block *restart);
 
-static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
+static int futex_wait(u32 __user *uaddr, int fshared,
 		      u32 val, ktime_t *abs_time, u32 bitset)
 {
 	struct task_struct *curr = current;
@@ -1306,13 +1299,13 @@ static int futex_wait(u32 __user *uaddr,
 static long futex_wait_restart(struct restart_block *restart)
 {
 	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
-	struct rw_semaphore *fshared = NULL;
+	int fshared = 0;
 	ktime_t t;
 
 	t.tv64 = restart->futex.time;
 	restart->fn = do_no_restart_syscall;
 	if (restart->futex.flags & FLAGS_SHARED)
-		fshared = &current->mm->mmap_sem;
+		fshared = 1;
 	return (long)futex_wait(uaddr, fshared, restart->futex.val, &t,
 				restart->futex.bitset);
 }
@@ -1324,7 +1317,7 @@ static long futex_wait_restart(struct re
  * if there are waiters then it will block, it does PI, etc. (Due to
  * races the kernel might see a 0 value of the futex too.)
  */
-static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
+static int futex_lock_pi(u32 __user *uaddr, int fshared,
 			 int detect, ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
@@ -1570,8 +1563,7 @@ static int futex_lock_pi(u32 __user *uad
 	queue_unlock(&q, hb);
 
 	if (attempt++) {
-		ret = futex_handle_fault((unsigned long)uaddr, fshared,
-					 attempt);
+		ret = futex_handle_fault((unsigned long)uaddr, attempt);
 		if (ret)
 			goto out_release_sem;
 		goto retry_unlocked;
@@ -1591,7 +1583,7 @@ static int futex_lock_pi(u32 __user *uad
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-static int futex_unlock_pi(u32 __user *uaddr, struct rw_semaphore *fshared)
+static int futex_unlock_pi(u32 __user *uaddr, int fshared)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
@@ -1682,8 +1674,7 @@ pi_faulted:
 	spin_unlock(&hb->lock);
 
 	if (attempt++) {
-		ret = futex_handle_fault((unsigned long)uaddr, fshared,
-					 attempt);
+		ret = futex_handle_fault((unsigned long)uaddr, attempt);
 		if (ret)
 			goto out;
 		uval = 0;
@@ -1815,8 +1806,7 @@ retry:
 		 * PI futexes happens in exit_pi_state():
 		 */
 		if (!pi && (uval & FUTEX_WAITERS))
-			futex_wake(uaddr, &curr->mm->mmap_sem, 1,
-				   FUTEX_BITSET_MATCH_ANY);
+			futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
 	}
 	return 0;
 }
@@ -1912,10 +1902,10 @@ long do_futex(u32 __user *uaddr, int op,
 {
 	int ret = -ENOSYS;
 	int cmd = op & FUTEX_CMD_MASK;
-	struct rw_semaphore *fshared = NULL;
+	int fshared = 0;
 
 	if (!(op & FUTEX_PRIVATE_FLAG))
-		fshared = &current->mm->mmap_sem;
+		fshared = 1;
 
 	switch (cmd) {
 	case FUTEX_WAIT:



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-26 17:32 [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Peter Zijlstra
                   ` (3 preceding siblings ...)
  2008-09-26 17:32 ` [PATCH 4/4] futex: cleanup fshared Peter Zijlstra
@ 2008-09-27 16:17 ` Ingo Molnar
  2008-09-30  7:21   ` Nick Piggin
  4 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2008-09-27 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, Eric Dumazet, Thomas Gleixner, linux-kernel, linux-mm


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> Since get_user_pages_fast() made it in, I thought to give this another 
> try. Lightly tested by disabling the private futexes and running some 
> java proglets.

hm, very interesting. Since this is an important futex usecase i started 
testing it in tip/core/futexes:

 cd33272: futex: cleanup fshared
 a135356: futex: use fast_gup()
 39ce77b: futex: reduce mmap_sem usage
 0d7a336: futex: rely on get_user_pages() for shared futexes

Nick, it would be nice to get an Acked-by/Reviewed-by from you, before 
we think about whether it should go upstream.

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-27 16:17 ` [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Ingo Molnar
@ 2008-09-30  7:21   ` Nick Piggin
  2008-09-30  8:51     ` Peter Zijlstra
  2008-09-30 10:39     ` Ingo Molnar
  0 siblings, 2 replies; 14+ messages in thread
From: Nick Piggin @ 2008-09-30  7:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Eric Dumazet, Thomas Gleixner, linux-kernel, linux-mm

On Sunday 28 September 2008 02:17, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > Since get_user_pages_fast() made it in, I thought to give this another
> > try. Lightly tested by disabling the private futexes and running some
> > java proglets.
>
> hm, very interesting. Since this is an important futex usecase i started
> testing it in tip/core/futexes:
>
>  cd33272: futex: cleanup fshared
>  a135356: futex: use fast_gup()
>  39ce77b: futex: reduce mmap_sem usage
>  0d7a336: futex: rely on get_user_pages() for shared futexes
>
> Nick, it would be nice to get an Acked-by/Reviewed-by from you, before
> we think about whether it should go upstream.

Yeah, these all look pretty good. It's nice to get rid of mmap sem here.

Which reminds me, we need to put a might_lock mmap_sem into
get_user_pages_fast...

But these patches look good to me (last time we discussed them I thought
there was a race with page truncate, but it looks like you've closed that
by holding page lock over the whole operation...)

Nice work, Peter.

BTW. what kinds of things use inter-process futexes as of now?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30  7:21   ` Nick Piggin
@ 2008-09-30  8:51     ` Peter Zijlstra
  2008-09-30 10:39       ` Ingo Molnar
                         ` (2 more replies)
  2008-09-30 10:39     ` Ingo Molnar
  1 sibling, 3 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-30  8:51 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Eric Dumazet, Thomas Gleixner, linux-kernel, linux-mm

On Tue, 2008-09-30 at 17:21 +1000, Nick Piggin wrote:
> On Sunday 28 September 2008 02:17, Ingo Molnar wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > Since get_user_pages_fast() made it in, I thought to give this another
> > > try. Lightly tested by disabling the private futexes and running some
> > > java proglets.
> >
> > hm, very interesting. Since this is an important futex usecase i started
> > testing it in tip/core/futexes:
> >
> >  cd33272: futex: cleanup fshared
> >  a135356: futex: use fast_gup()
> >  39ce77b: futex: reduce mmap_sem usage
> >  0d7a336: futex: rely on get_user_pages() for shared futexes
> >
> > Nick, it would be nice to get an Acked-by/Reviewed-by from you, before
> > we think about whether it should go upstream.
> 
> Yeah, these all look pretty good. It's nice to get rid of mmap sem here.
> 
> Which reminds me, we need to put a might_lock mmap_sem into
> get_user_pages_fast...

Yeah..

> But these patches look good to me (last time we discussed them I thought
> there was a race with page truncate, but it looks like you've closed that
> by holding page lock over the whole operation...)

Just to be sure, I only hold the page lock over the get_futex_key() op,
and drop it after getting a ref on the futex key.

I then drop the futex key ref after the futex op is complete.

This assumes the futex key ref is suffucient to guarantee whatever is
needed - which is the point I'm still not quite sure about myself.

The futex key ref was used between futex ops, with I assume the intent
to ensure the futex backing stays valid. However, the key ref only takes
a ref on either the inode or the mm, neither which avoid the specific
address of the futex to get unmapped between ops.

So in that respect we're not worse off than before, and any application
doing: futex_wait(), munmap(), futex_wake() is going to suffer. And as
far as I understand it get the waiting task stuck in D state for
ever-more or somesuch.

By now not holding the mmap_sem over the full futex op, but only over
the get_futex_key(), that munmap() race gets larger and the actual futex
could disappear while we're working on it, but in all cases I looked at
that will make the futex op return -EFAULT, so we should be good there.

Gah, now that I look at it, it looks like I made get_futex_key()
asymetric wrt private futexes, they don't take a ref on the key, but
then do drop one... ouch.. Patch below.

> Nice work, Peter.

Thanks!

> BTW. what kinds of things use inter-process futexes as of now?

On a regular modern Linux system, not much. But I've been told there are
applications out there that do indeed make heavy use of them - as
they're part of POSIX etc.. blah blah :-)

Also some legacy stuff that's stuck on an ancient glibc (but somehow did
manage to upgrade the kernel) might benefit.


---
Subject: futex: fixup get_futex_key() for private futexes
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

With the get_user_pages_fast() patches we made get_futex_key() obtain a
reference on the returned key, but failed to do so for private futexes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/futex.c b/kernel/futex.c
index 197fdab..beee9af 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -227,6 +227,7 @@ static int get_futex_key(u32 __user *uaddr, int
fshared, union futex_key *key)
 			return -EFAULT;
 		key->private.mm = mm;
 		key->private.address = address;
+		get_futex_key_refs(key);
 		return 0;
 	}
 



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30  7:21   ` Nick Piggin
  2008-09-30  8:51     ` Peter Zijlstra
@ 2008-09-30 10:39     ` Ingo Molnar
  1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2008-09-30 10:39 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, Eric Dumazet, Thomas Gleixner, linux-kernel, linux-mm


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Sunday 28 September 2008 02:17, Ingo Molnar wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > Since get_user_pages_fast() made it in, I thought to give this another
> > > try. Lightly tested by disabling the private futexes and running some
> > > java proglets.
> >
> > hm, very interesting. Since this is an important futex usecase i started
> > testing it in tip/core/futexes:
> >
> >  cd33272: futex: cleanup fshared
> >  a135356: futex: use fast_gup()
> >  39ce77b: futex: reduce mmap_sem usage
> >  0d7a336: futex: rely on get_user_pages() for shared futexes
> >
> > Nick, it would be nice to get an Acked-by/Reviewed-by from you, before
> > we think about whether it should go upstream.
> 
> Yeah, these all look pretty good. It's nice to get rid of mmap sem here.
> 
> Which reminds me, we need to put a might_lock mmap_sem into
> get_user_pages_fast...
> 
> But these patches look good to me (last time we discussed them I thought
> there was a race with page truncate, but it looks like you've closed that
> by holding page lock over the whole operation...)
> 
> Nice work, Peter.

great - i've added your Acked-by to the patches and have activated the 
tip/core/futexes branch for tip/auto-core-next linux-next integration.

here are the commits:

 42569c3: futex: fixup get_futex_key() for private futexes
 c2f9f20: futex: cleanup fshared
 734b05b: futex: use fast_gup()
 6127070: futex: reduce mmap_sem usage
 38d47c1: futex: rely on get_user_pages() for shared futexes

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30  8:51     ` Peter Zijlstra
@ 2008-09-30 10:39       ` Ingo Molnar
  2008-09-30 10:42       ` Nick Piggin
  2008-09-30 10:55       ` Eric Dumazet
  2 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2008-09-30 10:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, Eric Dumazet, Thomas Gleixner, linux-kernel, linux-mm


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2008-09-30 at 17:21 +1000, Nick Piggin wrote:
> > On Sunday 28 September 2008 02:17, Ingo Molnar wrote:
> > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > Since get_user_pages_fast() made it in, I thought to give this another
> > > > try. Lightly tested by disabling the private futexes and running some
> > > > java proglets.
> > >
> > > hm, very interesting. Since this is an important futex usecase i started
> > > testing it in tip/core/futexes:
> > >
> > >  cd33272: futex: cleanup fshared
> > >  a135356: futex: use fast_gup()
> > >  39ce77b: futex: reduce mmap_sem usage
> > >  0d7a336: futex: rely on get_user_pages() for shared futexes
> > >
> > > Nick, it would be nice to get an Acked-by/Reviewed-by from you, before
> > > we think about whether it should go upstream.
> > 
> > Yeah, these all look pretty good. It's nice to get rid of mmap sem here.
> > 
> > Which reminds me, we need to put a might_lock mmap_sem into
> > get_user_pages_fast...
> 
> Yeah..
> 
> > But these patches look good to me (last time we discussed them I thought
> > there was a race with page truncate, but it looks like you've closed that
> > by holding page lock over the whole operation...)
> 
> Just to be sure, I only hold the page lock over the get_futex_key() op,
> and drop it after getting a ref on the futex key.
> 
> I then drop the futex key ref after the futex op is complete.
> 
> This assumes the futex key ref is suffucient to guarantee whatever is
> needed - which is the point I'm still not quite sure about myself.
> 
> The futex key ref was used between futex ops, with I assume the intent
> to ensure the futex backing stays valid. However, the key ref only takes
> a ref on either the inode or the mm, neither which avoid the specific
> address of the futex to get unmapped between ops.
> 
> So in that respect we're not worse off than before, and any application
> doing: futex_wait(), munmap(), futex_wake() is going to suffer. And as
> far as I understand it get the waiting task stuck in D state for
> ever-more or somesuch.
> 
> By now not holding the mmap_sem over the full futex op, but only over
> the get_futex_key(), that munmap() race gets larger and the actual futex
> could disappear while we're working on it, but in all cases I looked at
> that will make the futex op return -EFAULT, so we should be good there.
> 
> Gah, now that I look at it, it looks like I made get_futex_key()
> asymetric wrt private futexes, they don't take a ref on the key, but
> then do drop one... ouch.. Patch below.
> 
> > Nice work, Peter.
> 
> Thanks!
> 
> > BTW. what kinds of things use inter-process futexes as of now?
> 
> On a regular modern Linux system, not much. But I've been told there are
> applications out there that do indeed make heavy use of them - as
> they're part of POSIX etc.. blah blah :-)
> 
> Also some legacy stuff that's stuck on an ancient glibc (but somehow did
> manage to upgrade the kernel) might benefit.
> 
> 
> ---
> Subject: futex: fixup get_futex_key() for private futexes
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> With the get_user_pages_fast() patches we made get_futex_key() obtain a
> reference on the returned key, but failed to do so for private futexes.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

applied to tip/core/futexes, thanks Peter!

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30  8:51     ` Peter Zijlstra
  2008-09-30 10:39       ` Ingo Molnar
@ 2008-09-30 10:42       ` Nick Piggin
  2008-09-30 10:55       ` Eric Dumazet
  2 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2008-09-30 10:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Eric Dumazet, Thomas Gleixner, linux-kernel, linux-mm

On Tuesday 30 September 2008 18:51, Peter Zijlstra wrote:
> On Tue, 2008-09-30 at 17:21 +1000, Nick Piggin wrote:

> > But these patches look good to me (last time we discussed them I thought
> > there was a race with page truncate, but it looks like you've closed that
> > by holding page lock over the whole operation...)
>
> Just to be sure, I only hold the page lock over the get_futex_key() op,
> and drop it after getting a ref on the futex key.
>
> I then drop the futex key ref after the futex op is complete.

I think that's fine.


> This assumes the futex key ref is suffucient to guarantee whatever is
> needed - which is the point I'm still not quite sure about myself.

It is enough to guarantee enough to function normally I guess. Actually
futex syscalls are interesting in that they logically perform 2 totally
different operations and actually want 2 keys but it manages to mash
them into one (the user address).

They need a futex identifier, on which to wait/wake/etc. For anonymous
futexes, this is basically an arbitrary number which is taken from the
user address (but note that if that address is subsequently mremap()ed
for example, then the futex identifier does not follow the address). For
shared futexes, the pagecache address is used for the futex, which is
derived from the address.

Then they also need a memory address which is the target of the
particular operation requested. This doesn't actually logically have to
be directly related to the futex identifier...

I guess for all practical purposes, this is fine. "if you mremap a live
mutex, you lose" and similar probably doesn't constrain too many people
(although I don't think any of that is actually documented). But anyway,
just to point out that we do already have constraints on how existing
futexes are going to work.


> The futex key ref was used between futex ops, with I assume the intent
> to ensure the futex backing stays valid. However, the key ref only takes
> a ref on either the inode or the mm, neither which avoid the specific
> address of the futex to get unmapped between ops.
>
> So in that respect we're not worse off than before, and any application
> doing: futex_wait(), munmap(), futex_wake() is going to suffer. And as
> far as I understand it get the waiting task stuck in D state for
> ever-more or somesuch.

Yeah, I think you're fine there (the sleep is interruptible, so it should
not be a DoS or anything). Just so long as you always pin things like the
inode correctly while taking a ref on it...


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30  8:51     ` Peter Zijlstra
  2008-09-30 10:39       ` Ingo Molnar
  2008-09-30 10:42       ` Nick Piggin
@ 2008-09-30 10:55       ` Eric Dumazet
  2008-09-30 11:16         ` Peter Zijlstra
  2008-10-01  3:13         ` Ulrich Drepper
  2 siblings, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2008-09-30 10:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, Ingo Molnar, Thomas Gleixner, linux-kernel, linux-mm

Peter Zijlstra a écrit :
> Just to be sure, I only hold the page lock over the get_futex_key() op,
> and drop it after getting a ref on the futex key.
> 
> I then drop the futex key ref after the futex op is complete.
> 
> This assumes the futex key ref is suffucient to guarantee whatever is
> needed - which is the point I'm still not quite sure about myself.
> 
> The futex key ref was used between futex ops, with I assume the intent
> to ensure the futex backing stays valid. However, the key ref only takes
> a ref on either the inode or the mm, neither which avoid the specific
> address of the futex to get unmapped between ops.
> 
> So in that respect we're not worse off than before, and any application
> doing: futex_wait(), munmap(), futex_wake() is going to suffer. And as
> far as I understand it get the waiting task stuck in D state for
> ever-more or somesuch.
> 
> By now not holding the mmap_sem over the full futex op, but only over
> the get_futex_key(), that munmap() race gets larger and the actual futex
> could disappear while we're working on it, but in all cases I looked at
> that will make the futex op return -EFAULT, so we should be good there.
> 
> Gah, now that I look at it, it looks like I made get_futex_key()
> asymetric wrt private futexes, they don't take a ref on the key, but
> then do drop one... ouch.. Patch below.
> 
>> Nice work, Peter.
> 
> Thanks!
> 
>> BTW. what kinds of things use inter-process futexes as of now?
> 
> On a regular modern Linux system, not much. But I've been told there are
> applications out there that do indeed make heavy use of them - as
> they're part of POSIX etc.. blah blah :-)

inter-process futexes are still used for pthread creation/join 
(aka clear_child_tid / CLONE_CHILD_CLEARTID)

kernel/fork.c, functions mm_release() & sys_set_tid_address()

I am not sure how it could be converted to private futexes, since
old binaries (static glibc) will use FUTEX_WAKE like calls.

> 
> Also some legacy stuff that's stuck on an ancient glibc (but somehow did
> manage to upgrade the kernel) might benefit.
> 
> 
> ---
> Subject: futex: fixup get_futex_key() for private futexes
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> With the get_user_pages_fast() patches we made get_futex_key() obtain a
> reference on the returned key, but failed to do so for private futexes.
> 

Sorry I am lost...
private futexes dont need to get references at all...

> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 197fdab..beee9af 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -227,6 +227,7 @@ static int get_futex_key(u32 __user *uaddr, int
> fshared, union futex_key *key)
>  			return -EFAULT;
>  		key->private.mm = mm;
>  		key->private.address = address;
> +		get_futex_key_refs(key);
>  		return 0;
>  	}
>  
> 
> 
> 





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30 10:55       ` Eric Dumazet
@ 2008-09-30 11:16         ` Peter Zijlstra
  2008-10-01  3:13         ` Ulrich Drepper
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2008-09-30 11:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nick Piggin, Ingo Molnar, Thomas Gleixner, linux-kernel, linux-mm

On Tue, 2008-09-30 at 12:55 +0200, Eric Dumazet wrote:
> Peter Zijlstra a écrit :

> > On a regular modern Linux system, not much. But I've been told there are
> > applications out there that do indeed make heavy use of them - as
> > they're part of POSIX etc.. blah blah :-)
> 
> inter-process futexes are still used for pthread creation/join 
> (aka clear_child_tid / CLONE_CHILD_CLEARTID)
> 
> kernel/fork.c, functions mm_release() & sys_set_tid_address()
> 
> I am not sure how it could be converted to private futexes, since
> old binaries (static glibc) will use FUTEX_WAKE like calls.

Ah, thanks, didn't know that.

> > ---
> > Subject: futex: fixup get_futex_key() for private futexes
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > 
> > With the get_user_pages_fast() patches we made get_futex_key() obtain a
> > reference on the returned key, but failed to do so for private futexes.
> > 
> 
> Sorry I am lost...
> private futexes dont need to get references at all...

Ah, right - its a NOP, that's why it didn't show up in testing.

The thing is, I changed the semantics of get_futex_key() to return a key
with reference taken. And then noticed I didn't take one in the private
futex path, and failed to notice the ref ops are nops for private
futexes.

So yeah, the below patch is basically a NOP, but we might consider
retaining it to maintain the symmetry... dunno

> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 197fdab..beee9af 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -227,6 +227,7 @@ static int get_futex_key(u32 __user *uaddr, int
> > fshared, union futex_key *key)
> >  			return -EFAULT;
> >  		key->private.mm = mm;
> >  		key->private.address = address;
> > +		get_futex_key_refs(key);
> >  		return 0;
> >  	}



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] futex: get_user_pages_fast() for shared futexes
  2008-09-30 10:55       ` Eric Dumazet
  2008-09-30 11:16         ` Peter Zijlstra
@ 2008-10-01  3:13         ` Ulrich Drepper
  1 sibling, 0 replies; 14+ messages in thread
From: Ulrich Drepper @ 2008-10-01  3:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Nick Piggin, Ingo Molnar, Thomas Gleixner,
	linux-kernel, linux-mm

On Tue, Sep 30, 2008 at 3:55 AM, Eric Dumazet <dada1@cosmosbay.com> wrote:
> I am not sure how it could be converted to private futexes, since
> old binaries (static glibc) will use FUTEX_WAKE like calls.

We considered this back when but any effort seems too much.  We'd
either need a clone flag (a scarce resource) or replace the set_tid
_address syscall.  Given that the futex is woken once per thread
lifetime it shouldn't be an issue.  If the semaphore shows up even
after this patch feel free to introduce a new set_tid_address syscall.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-10-01  3:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-26 17:32 [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Peter Zijlstra
2008-09-26 17:32 ` [PATCH 1/4] futex: rely on get_user_pages() " Peter Zijlstra
2008-09-26 17:32 ` [PATCH 2/4] futex: reduce mmap_sem usage Peter Zijlstra
2008-09-26 17:32 ` [PATCH 3/4] futex: use fast_gup() Peter Zijlstra
2008-09-26 17:32 ` [PATCH 4/4] futex: cleanup fshared Peter Zijlstra
2008-09-27 16:17 ` [PATCH 0/4] futex: get_user_pages_fast() for shared futexes Ingo Molnar
2008-09-30  7:21   ` Nick Piggin
2008-09-30  8:51     ` Peter Zijlstra
2008-09-30 10:39       ` Ingo Molnar
2008-09-30 10:42       ` Nick Piggin
2008-09-30 10:55       ` Eric Dumazet
2008-09-30 11:16         ` Peter Zijlstra
2008-10-01  3:13         ` Ulrich Drepper
2008-09-30 10:39     ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).