* [PATCH v3] futex: Remove requirement for lock_page in
@ 2016-01-20  0:30 Davidlohr Bueso
  2016-01-20 14:03 ` Mel Gorman
  2016-01-20 19:55 ` Davidlohr Bueso
  0 siblings, 2 replies; 5+ messages in thread
From: Davidlohr Bueso @ 2016-01-20  0:30 UTC (permalink / raw)
  To: Mel Gorman, Peter Zijlstra, Thomas Gleixner, Ingo Molnar
  Cc: Sebastian Andrzej Siewior, Chris Mason, Darren Hart, dave,
	linux-kernel, Davidlohr Bueso

From: Mel Gorman <mgorman@suse.de>

When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based keys, we can check under RCU read lock if the page mapping is still
valid and take a reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.

Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.

Task A
     get_futex_key()
     lock_page()
    ---> preempted

Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbounded time.

With this patch, we pretty much have a lockless get_futex_key().

Experiments with the perf futex benchmarks (which are well suited to
measuring this kind of change) show that this patch speeds up the hashing of
shared futexes by up to 45% at high (> 100) thread counts on a 60 core
Westmere. At low thread counts the gain is within the noise or below 10%,
while mid range counts see over 30% better overall throughput (hash ops/sec).
This brings anon-mem shared futexes much closer to their private counterparts.

Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Not-yet-signed-off-by: Mel Gorman <mgorman@suse.de>
[ported on top of thp refcount rework, changelog, comments, fixes]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---

Changes from v2:

- Minor adjustments by peterz.
- Applies on top of -next-20160118

Changes from v1:

- Remove unnecessary mb, as atomic_inc returning does what we need.
- Fix bogus mapping load.
- Minor code cleanups/comments.

Note that I've not tested the inode-based key changes as much as I would like,
and afaik they still need review from fs people.
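
For readers unfamiliar with the "take a reference unless it is about to be
freed" step that the inode path relies on, here is a minimal userspace sketch
of that pattern using C11 atomics. It is not the kernel's
atomic_inc_not_zero() implementation; struct obj and obj_get_unless_zero()
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a reference-counted object. */
struct obj {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true if a reference was taken, false if the count had
 * already dropped to zero (the object is on its way out).
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

A caller that gets false back must not touch the object and, as in the patch,
would typically drop what it holds and retry the lookup.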

 kernel/futex.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c6f5145..e241d21 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -520,7 +520,20 @@ again:
 	else
 		err = 0;
 
-	lock_page(page);
+	/*
+	 * The treatment of mapping from this point on is critical. The page
+	 * lock protects many things but in this context the page lock
+	 * stabilizes mapping, prevents inode freeing in the shared
+	 * file-backed region case and guards against movement to swap cache.
+	 *
+	 * Strictly speaking the page lock is not needed in all cases being
+	 * considered here and page lock forces unnecessary serialization.
+	 * From this point on, mapping will be re-verified if necessary and
+	 * page lock will be acquired only if it is unavoidable.
+	 */
+	page = compound_head(page);
+	mapping = READ_ONCE(page->mapping);
+
 	/*
 	 * If page->mapping is NULL, then it cannot be a PageAnon
 	 * page; but it might be the ZERO_PAGE or in the gate area or
@@ -536,19 +549,32 @@ again:
 	 * shmem_writepage move it from filecache to swapcache beneath us:
 	 * an unlikely race, but we do need to retry for page->mapping.
 	 */
-	mapping = compound_head(page)->mapping;
-	if (!mapping) {
-		int shmem_swizzled = PageSwapCache(page);
+	if (unlikely(!mapping)) {
+		int shmem_swizzled;
+
+		/*
+		 * Page lock is required to identify which special case above
+		 * applies. If this is really a shmem page then the page lock
+		 * will prevent unexpected transitions.
+		 */
+		lock_page(page);
+		shmem_swizzled = PageSwapCache(page);
 		unlock_page(page);
 		put_page(page);
+		WARN_ON_ONCE(READ_ONCE(page->mapping));
+
 		if (shmem_swizzled)
 			goto again;
+
 		return -EFAULT;
 	}
 
 	/*
 	 * Private mappings are handled in a simple way.
 	 *
+	 * If the futex key is stored on an anonymous page, then the associated
+	 * object is the mm which is implicitly pinned by the calling process.
+	 *
 	 * NOTE: When userspace waits on a MAP_SHARED mapping, even if
 	 * it's a read-only handle, it's expected that futexes attach to
 	 * the object not the particular process.
@@ -566,16 +592,61 @@ again:
 		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
+
+		get_futex_key_refs(key); /* implies MB (B) */
+
 	} else {
+		struct inode *inode;
+
+		/*
+		 * The associated futex object in this case is the inode and
+		 * the page->mapping must be traversed. Ordinarily this should
+		 * be stabilised under page lock but it's not strictly
+		 * necessary in this case as we just want to pin the inode, not
+		 * update radix tree or anything like that.
+		 *
+		 * The RCU read lock is taken as the inode is finally freed
+		 * under RCU. If the mapping still matches expectations then the
+		 * mapping->host can be safely accessed as being a valid inode.
+		 */
+		rcu_read_lock();
+		if (READ_ONCE(page->mapping) != mapping ||
+		    !mapping->host) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+		inode = READ_ONCE(mapping->host);
+
+		/*
+		 * Take a reference unless it is about to be freed. Previously
+		 * this reference was taken by ihold under the page lock
+		 * pinning the inode in place so i_lock was unnecessary. The
+		 * only way for this check to fail is if the inode was
+		 * truncated in parallel so warn for now if this happens.
+		 *
+		 * We are not calling into get_futex_key_refs() in file-backed
+		 * cases, therefore a successful atomic_inc return below will
+		 * guarantee that get_futex_key() will continue to imply MB (B).
+		 */
+		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		/* Should be impossible but lets be paranoid for now */
+		BUG_ON(inode->i_mapping != mapping);
+
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
-		key->shared.inode = mapping->host;
+		key->shared.inode = inode;
 		key->shared.pgoff = basepage_index(page);
+		rcu_read_unlock();
 	}
 
-	get_futex_key_refs(key); /* implies MB (B) */
-
 out:
-	unlock_page(page);
 	put_page(page);
 	return err;
 }
-- 
2.1.4


* Re: [PATCH v3] futex: Remove requirement for lock_page in
  2016-01-20  0:30 [PATCH v3] futex: Remove requirement for lock_page in Davidlohr Bueso
@ 2016-01-20 14:03 ` Mel Gorman
  2016-01-20 19:55 ` Davidlohr Bueso
  1 sibling, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2016-01-20 14:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	linux-kernel, Davidlohr Bueso

On Tue, Jan 19, 2016 at 04:30:53PM -0800, Davidlohr Bueso wrote:
> From: Mel Gorman <mgorman@suse.de>
> 

The subject is truncated but other than that I cannot see a problem with
the patch. It's ok to convert my Not-yet-signed-off-by to a signed-off.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] futex: Remove requirement for lock_page in
  2016-01-20  0:30 [PATCH v3] futex: Remove requirement for lock_page in Davidlohr Bueso
  2016-01-20 14:03 ` Mel Gorman
@ 2016-01-20 19:55 ` Davidlohr Bueso
  2016-01-20 20:01   ` Thomas Gleixner
  1 sibling, 1 reply; 5+ messages in thread
From: Davidlohr Bueso @ 2016-01-20 19:55 UTC (permalink / raw)
  To: Mel Gorman, Peter Zijlstra, Thomas Gleixner, Ingo Molnar
  Cc: Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	linux-kernel, Davidlohr Bueso

On Tue, 19 Jan 2016, Bueso wrote:
> 	/*
> 	 * Private mappings are handled in a simple way.
> 	 *
>+	 * If the futex key is stored on an anonymous page, then the associated
>+	 * object is the mm which is implicitly pinned by the calling process.
>+	 *
> 	 * NOTE: When userspace waits on a MAP_SHARED mapping, even if
> 	 * it's a read-only handle, it's expected that futexes attach to
> 	 * the object not the particular process.
>@@ -566,16 +592,61 @@ again:
> 		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
> 		key->private.mm = mm;
> 		key->private.address = address;
>+
>+		get_futex_key_refs(key); /* implies MB (B) */
>+
> 	} else {
>+		struct inode *inode;
>+
>+		/*
>+		 * The associated futex object in this case is the inode and
>+		 * the page->mapping must be traversed. Ordinarily this should
>+		 * be stabilised under page lock but it's not strictly
>+		 * necessary in this case as we just want to pin the inode, not
>+		 * update radix tree or anything like that.
>+		 *
>+		 * The RCU read lock is taken as the inode is finally freed
>+		 * under RCU. If the mapping still matches expectations then the
>+		 * mapping->host can be safely accessed as being a valid inode.
>+		 */
>+		rcu_read_lock();
>+		if (READ_ONCE(page->mapping) != mapping ||
>+		    !mapping->host) {
>+			rcu_read_unlock();
>+			put_page(page);
>+
>+			goto again;
>+		}
>+		inode = READ_ONCE(mapping->host);
>+
>+		/*
>+		 * Take a reference unless it is about to be freed. Previously
>+		 * this reference was taken by ihold under the page lock
>+		 * pinning the inode in place so i_lock was unnecessary. The
>+		 * only way for this check to fail is if the inode was
>+		 * truncated in parallel so warn for now if this happens.
>+		 *
>+		 * We are not calling into get_futex_key_refs() in file-backed
>+		 * cases, therefore a successful atomic_inc return below will
>+		 * guarantee that get_futex_key() will continue to imply MB (B).
>+		 */
>+		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
>+			rcu_read_unlock();
>+			put_page(page);
>+
>+			goto again;
>+		}
>+
>+		/* Should be impossible but lets be paranoid for now */
>+		BUG_ON(inode->i_mapping != mapping);

Hmm, do we want to transform this into an if and do rcu unlock and then just
call BUG()? It doesn't matter at this point _anyway_, but it would be the right
thing to do, no?

>+
> 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
>-		key->shared.inode = mapping->host;
>+		key->shared.inode = inode;
> 		key->shared.pgoff = basepage_index(page);
>+		rcu_read_unlock();
> 	}
>
>-	get_futex_key_refs(key); /* implies MB (B) */
>-
> out:
>-	unlock_page(page);
> 	put_page(page);
> 	return err;
> }


* Re: [PATCH v3] futex: Remove requirement for lock_page in
  2016-01-20 19:55 ` Davidlohr Bueso
@ 2016-01-20 20:01   ` Thomas Gleixner
  2016-01-20 20:20     ` Davidlohr Bueso
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2016-01-20 20:01 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	linux-kernel, Davidlohr Bueso

On Wed, 20 Jan 2016, Davidlohr Bueso wrote:
> On Tue, 19 Jan 2016, Bueso wrote:
> > +
> > +		/* Should be impossible but lets be paranoid for now */
> > +		BUG_ON(inode->i_mapping != mapping);
> 
> Hmm, do we want to transform this into an if and do rcu unlock and then just
> call BUG()? It doesn't matter at this point _anyway_, but it would be the
> right thing to do, no?

The better solution is to err out gracefully.

	if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
		err = -EFAULT;
		rcu_read_unlock();
		goto out;
	}

Hmm?

Thanks,

	tglx


* Re: [PATCH v3] futex: Remove requirement for lock_page in
  2016-01-20 20:01   ` Thomas Gleixner
@ 2016-01-20 20:20     ` Davidlohr Bueso
  0 siblings, 0 replies; 5+ messages in thread
From: Davidlohr Bueso @ 2016-01-20 20:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	linux-kernel, Davidlohr Bueso

On Wed, 20 Jan 2016, Thomas Gleixner wrote:

>On Wed, 20 Jan 2016, Davidlohr Bueso wrote:
>> On Tue, 19 Jan 2016, Bueso wrote:
>> > +
>> > +		/* Should be impossible but lets be paranoid for now */
>> > +		BUG_ON(inode->i_mapping != mapping);
>>
>> Hmm, do we want to transform this into an if and do rcu unlock and then just
>> call BUG()? It doesn't matter at this point _anyway_, but it would be the
>> right thing to do, no?
>
>The better solution is to err out gracefully.
>
>	if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
>		err = -EFAULT;
>		rcu_read_unlock();
>		goto out;
>	}
>
>Hmm?

Ok, Linus would probably like that as well. If we're going this way,
we also need to release inode reference, before the rcu unlock.

Thanks,
Davidlohr
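
To make the point about balancing the inode reference on the error path
concrete, here is a small userspace model. It is only a sketch under stated
assumptions: fake_inode, fake_mapping, fake_igrab(), fake_iput() and
pin_inode_for_key() are hypothetical stand-ins, RCU is not modelled at all,
and the sketch takes no position on how the put should be ordered against
rcu_read_unlock() in the real code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_mapping;

/* Hypothetical stand-ins for struct inode / struct address_space. */
struct fake_inode {
	atomic_int i_count;
	struct fake_mapping *i_mapping;
};

struct fake_mapping {
	struct fake_inode *host;
};

/* Take a reference unless the count already reached zero. */
static bool fake_igrab(struct fake_inode *inode)
{
	int old = atomic_load(&inode->i_count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&inode->i_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; freeing at zero is left out of this model. */
static void fake_iput(struct fake_inode *inode)
{
	atomic_fetch_sub(&inode->i_count, 1);
}

/*
 * Returns 0 with one extra reference held and *out set, -EAGAIN when the
 * caller should retry the lookup (the patch's "goto again" cases), or
 * -EFAULT when inode and mapping disagree. No reference is left behind
 * on either error return.
 */
static int pin_inode_for_key(struct fake_mapping *mapping,
			     struct fake_inode **out)
{
	struct fake_inode *inode = mapping->host;

	if (!inode || !fake_igrab(inode))
		return -EAGAIN;

	if (inode->i_mapping != mapping) {
		/* Err out gracefully, but undo the reference taken above. */
		fake_iput(inode);
		return -EFAULT;
	}

	*out = inode;
	return 0;
}
```

Without the fake_iput() call in the mismatch branch, each failed check would
leave the count one too high, which is the leak being pointed out above.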

