All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
@ 2016-02-04  6:28 Davidlohr Bueso
  2016-02-04  9:06 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2016-02-04  6:28 UTC (permalink / raw)
  To: Mel Gorman, Peter Zijlstra, Thomas Gleixner, Ingo Molnar
  Cc: Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, dave, linux-kernel, Davidlohr Bueso, Mel Gorman

From: Davidlohr Bueso <dbueso@suse.de>

From: Mel Gorman <mgorman@suse.de>

When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based, keys, we can check under RCU read lock if the page mapping is still
valid and take reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.

Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.

Task A
     get_futex_key()
     lock_page()
    ---> preempted

Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbound time.

With this patch, we pretty much have a lockless futex_get_key().

Experiments show that this patch can boost/speedup the hashing of shared
futexes with the perf futex benchmarks (which is good for measuring such
change) by up to 45% when there are high (> 100) thread counts on a 60 core
Westmere. Lower counts are pretty much in the noise range or less than 10%,
but mid range can be seen at over 30% overall throughput (hash ops/sec).
This makes anon-mem shared futexes much closer to its private counterpart.

Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
[ported on top of thp refcount rework, changelog, comments, fixes]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---

Changes from v4:
- Integrated feedback from Hugh, specifically:
   * check for page->mapping change with page lock held when
     detecting shm race.
   * move inode assign, if the mapping still matches what
     we expect should we have a valid inode.
   * move iput out of rcu lock.

Changes from v3:
- Redo mapping sanity check, now do not halt the kernel.

Changes from v2:

- Minor adjustments by peterz.
- Applies on top of -next-20160118

Changes from v1:
- Remove unnecesary mb, as atomic_inc returning does what we need.
- Fix bogus mapping load.
- Minor code cleanups/comments.

 kernel/futex.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 91 insertions(+), 8 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 5d6ce64..a00a15e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -520,7 +520,20 @@ again:
 	else
 		err = 0;
 
-	lock_page(page);
+	/*
+	 * The treatment of mapping from this point on is critical. The page
+	 * lock protects many things but in this context the page lock
+	 * stabilizes mapping, prevents inode freeing in the shared
+	 * file-backed region case and guards against movement to swap cache.
+	 *
+	 * Strictly speaking the page lock is not needed in all cases being
+	 * considered here and page lock forces unnecessarily serialization
+	 * From this point on, mapping will be re-verified if necessary and
+	 * page lock will be acquired only if it is unavoidable
+	 */
+	page = compound_head(page);
+	mapping = READ_ONCE(page->mapping);
+
 	/*
 	 * If page->mapping is NULL, then it cannot be a PageAnon
 	 * page; but it might be the ZERO_PAGE or in the gate area or
@@ -536,19 +549,31 @@ again:
 	 * shmem_writepage move it from filecache to swapcache beneath us:
 	 * an unlikely race, but we do need to retry for page->mapping.
 	 */
-	mapping = compound_head(page)->mapping;
-	if (!mapping) {
-		int shmem_swizzled = PageSwapCache(page);
+	if (unlikely(!mapping)) {
+		int shmem_swizzled;
+
+		/*
+		 * Page lock is required to identify which special case above
+		 * applies. If this is really a shmem page then the page lock
+		 * will prevent unexpected transitions.
+		 */
+		lock_page(page);
+		shmem_swizzled = PageSwapCache(page) || page->mapping;
 		unlock_page(page);
 		put_page(page);
+
 		if (shmem_swizzled)
 			goto again;
+
 		return -EFAULT;
 	}
 
 	/*
 	 * Private mappings are handled in a simple way.
 	 *
+	 * If the futex key is stored on an anonymous page, then the associated
+	 * object is the mm which is implicitly pinned by the calling process.
+	 *
 	 * NOTE: When userspace waits on a MAP_SHARED mapping, even if
 	 * it's a read-only handle, it's expected that futexes attach to
 	 * the object not the particular process.
@@ -566,16 +591,74 @@ again:
 		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
+
+		get_futex_key_refs(key); /* implies MB (B) */
+
 	} else {
+		struct inode *inode;
+
+		/*
+		 * The associtated futex object in this case is the inode and
+		 * the page->mapping must be traversed. Ordinarily this should
+		 * be stabilised under page lock but it's not strictly
+		 * necessary in this case as we just want to pin the inode, not
+		 * update the radix tree or anything like that.
+		 *
+		 * The RCU read lock is taken as the inode is finally freed
+		 * under RCU. If the mapping still matches expectations then the
+		 * mapping->host can be safely accessed as being a valid inode.
+		 */
+		rcu_read_lock();
+
+		if (READ_ONCE(page->mapping) != mapping) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		inode = READ_ONCE(mapping->host);
+		if (!inode) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		/*
+		 * Take a reference unless it is about to be freed. Previously
+		 * this reference was taken by ihold under the page lock
+		 * pinning the inode in place so i_lock was unnecessary. The
+		 * only way for this check to fail is if the inode was
+		 * truncated in parallel so warn for now if this happens.
+		 *
+		 * We are not calling into get_futex_key_refs() in file-backed
+		 * cases, therefore a successful atomic_inc return below will
+		 * guarantee that get_futex_key() will continue to imply MB (B).
+		 */
+		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		/* Should be impossible but lets be paranoid for now */
+		if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
+			err = -EFAULT;
+			rcu_read_unlock();
+			iput(inode);
+
+			goto out;
+		}
+
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
-		key->shared.inode = mapping->host;
+		key->shared.inode = inode;
 		key->shared.pgoff = basepage_index(page);
+		rcu_read_unlock();
 	}
 
-	get_futex_key_refs(key); /* implies MB (B) */
-
 out:
-	unlock_page(page);
 	put_page(page);
 	return err;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
  2016-02-04  6:28 [PATCH v5] futex: Remove requirement for lock_page in get_futex_key Davidlohr Bueso
@ 2016-02-04  9:06 ` Thomas Gleixner
  2016-02-04 17:43   ` Davidlohr Bueso
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2016-02-04  9:06 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, linux-kernel, Davidlohr Bueso, Mel Gorman

On Wed, 3 Feb 2016, Davidlohr Bueso wrote:
> +		 * We are not calling into get_futex_key_refs() in file-backed
> +		 * cases, therefore a successful atomic_inc return below will
> +		 * guarantee that get_futex_key() will continue to imply MB (B).

Can you please make that "MB (B)" part a bit more outstanding. I really had to
search for it.

> +		 */
> +		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
> +			rcu_read_unlock();
> +			put_page(page);
> +
> +			goto again;
> +		}

Don't we need 

		smp_mb__after_atomic();

here to make it a full barrier on all architectures?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
  2016-02-04  9:06 ` Thomas Gleixner
@ 2016-02-04 17:43   ` Davidlohr Bueso
  2016-02-04 17:50     ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2016-02-04 17:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, linux-kernel, Davidlohr Bueso, Mel Gorman

On Thu, 04 Feb 2016, Thomas Gleixner wrote:

>On Wed, 3 Feb 2016, Davidlohr Bueso wrote:
>> +		 * We are not calling into get_futex_key_refs() in file-backed
>> +		 * cases, therefore a successful atomic_inc return below will
>> +		 * guarantee that get_futex_key() will continue to imply MB (B).
>
>Can you please make that "MB (B)" part a bit more outstanding. I really had to
>search for it.

Hmm as you know this is mostly explained at the begining of the file, and we
sprinkle MB (B) around the code based on that description. So I'm a bit confused
as to why you don't like like that comment.

>> +		 */
>> +		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
>> +			rcu_read_unlock();
>> +			put_page(page);
>> +
>> +			goto again;
>> +		}
>
>Don't we need
>
>		smp_mb__after_atomic();
>
>here to make it a full barrier on all architectures?

I had this initially but, as Peter mentioned, we get that barrier with the
successful atomic_inc_not_zero call anyway. Or is it something else you had
in mind?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
  2016-02-04 17:43   ` Davidlohr Bueso
@ 2016-02-04 17:50     ` Thomas Gleixner
  2016-02-05  9:44       ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2016-02-04 17:50 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, linux-kernel, Davidlohr Bueso, Mel Gorman

On Thu, 4 Feb 2016, Davidlohr Bueso wrote:

> On Thu, 04 Feb 2016, Thomas Gleixner wrote:
> 
> > On Wed, 3 Feb 2016, Davidlohr Bueso wrote:
> > > +		 * We are not calling into get_futex_key_refs() in file-backed
> > > +		 * cases, therefore a successful atomic_inc return below will
> > > +		 * guarantee that get_futex_key() will continue to imply MB
> > > (B).
> > 
> > Can you please make that "MB (B)" part a bit more outstanding. I really had
> > to
> > search for it.
> 
> Hmm as you know this is mostly explained at the begining of the file, and we
> sprinkle MB (B) around the code based on that description. So I'm a bit
> confused
> as to why you don't like like that comment.

The other "MB (B)" places are more outstanding. It did not spring in my eye
immideately. So it's a pure cosmetic issue.

> > > +		 */
> > > +		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
> > > +			rcu_read_unlock();
> > > +			put_page(page);
> > > +
> > > +			goto again;
> > > +		}
> > 
> > Don't we need
> > 
> > 		smp_mb__after_atomic();
> > 
> > here to make it a full barrier on all architectures?
> 
> I had this initially but, as Peter mentioned, we get that barrier with the
> successful atomic_inc_not_zero call anyway. Or is it something else you had
> in mind?

Oh. I missed that comment from Peter. And yes, I missed that
atomic_inc_not_zero() already has the barrier in the success case. :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
  2016-02-04 17:50     ` Thomas Gleixner
@ 2016-02-05  9:44       ` Ingo Molnar
  2016-02-05 18:02         ` Davidlohr Bueso
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2016-02-05  9:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Davidlohr Bueso, Mel Gorman, Peter Zijlstra,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, linux-kernel, Davidlohr Bueso, Mel Gorman


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 4 Feb 2016, Davidlohr Bueso wrote:
> 
> > On Thu, 04 Feb 2016, Thomas Gleixner wrote:
> > 
> > > On Wed, 3 Feb 2016, Davidlohr Bueso wrote:
> > > > +		 * We are not calling into get_futex_key_refs() in file-backed
> > > > +		 * cases, therefore a successful atomic_inc return below will
> > > > +		 * guarantee that get_futex_key() will continue to imply MB
> > > > (B).
> > > 
> > > Can you please make that "MB (B)" part a bit more outstanding. I really had
> > > to
> > > search for it.
> > 
> > Hmm as you know this is mostly explained at the begining of the file, and we
> > sprinkle MB (B) around the code based on that description. So I'm a bit
> > confused
> > as to why you don't like like that comment.
> 
> The other "MB (B)" places are more outstanding. It did not spring in my eye 
> immideately. So it's a pure cosmetic issue.

So I too didn't understand that sentence at first, because the capitalization 
really throws off quick parsing of that comment, as 'MB' ususally denotes 
megabytes.

So please change it to "mb(); (A)" or so - and I think all of these comments 
should be changed to use a standard API name for the barrier they imply, as the 
head of futex.c does:

 *   waiters++; (a)
 *   mb(); (A) <-- paired with -.
 *                              |
 *   lock(hash_bucket(futex));  |
 *                              |
 *   uval = *futex;             |
 *                              |        *futex = newval;
 *                              |        sys_futex(WAKE, futex);
 *                              |          futex_wake(futex);
 *                              |
 *                              `------->  mb(); (B)

Btw., pedantic: shouldn't that be smp_mb()? Futexes don't operate on IO spaces, so 
on UP they only need compiler barriers.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
  2016-02-05  9:44       ` Ingo Molnar
@ 2016-02-05 18:02         ` Davidlohr Bueso
  2016-02-06  8:22           ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2016-02-05 18:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Mel Gorman, Peter Zijlstra,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, linux-kernel, Davidlohr Bueso, Mel Gorman

On Fri, 05 Feb 2016, Ingo Molnar wrote:

>So I too didn't understand that sentence at first, because the capitalization
>really throws off quick parsing of that comment, as 'MB' ususally denotes
>megabytes.

Sure, fair enough.

>
>So please change it to "mb(); (A)" or so - and I think all of these comments
>should be changed to use a standard API name for the barrier they imply, as the
>head of futex.c does:
>
> *   waiters++; (a)
> *   mb(); (A) <-- paired with -.
> *                              |
> *   lock(hash_bucket(futex));  |
> *                              |
> *   uval = *futex;             |
> *                              |        *futex = newval;
> *                              |        sys_futex(WAKE, futex);
> *                              |          futex_wake(futex);
> *                              |
> *                              `------->  mb(); (B)
>
>Btw., pedantic: shouldn't that be smp_mb()? Futexes don't operate on IO spaces, so
>on UP they only need compiler barriers.

Right, but we do in fact use smp barriers in this cases in the real code, that mb() is
just in the comments, I guess it would be desirable to change it to smp_mb nonetheless.

However, could these changes be in a followup? Mainly because the barrier B references
will be updated across all futex.c... unless there are still concerns about this particular
patch, of course.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5] futex: Remove requirement for lock_page in get_futex_key
  2016-02-05 18:02         ` Davidlohr Bueso
@ 2016-02-06  8:22           ` Ingo Molnar
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2016-02-06  8:22 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Thomas Gleixner, Mel Gorman, Peter Zijlstra,
	Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
	Hugh Dickins, linux-kernel, Davidlohr Bueso, Mel Gorman


* Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Fri, 05 Feb 2016, Ingo Molnar wrote:
> 
> >So I too didn't understand that sentence at first, because the capitalization
> >really throws off quick parsing of that comment, as 'MB' ususally denotes
> >megabytes.
> 
> Sure, fair enough.
> 
> >
> >So please change it to "mb(); (A)" or so - and I think all of these comments
> >should be changed to use a standard API name for the barrier they imply, as the
> >head of futex.c does:
> >
> >*   waiters++; (a)
> >*   mb(); (A) <-- paired with -.
> >*                              |
> >*   lock(hash_bucket(futex));  |
> >*                              |
> >*   uval = *futex;             |
> >*                              |        *futex = newval;
> >*                              |        sys_futex(WAKE, futex);
> >*                              |          futex_wake(futex);
> >*                              |
> >*                              `------->  mb(); (B)
> >
> >Btw., pedantic: shouldn't that be smp_mb()? Futexes don't operate on IO spaces, so
> >on UP they only need compiler barriers.
> 
> Right, but we do in fact use smp barriers in this cases in the real code, that 
> mb() is just in the comments, I guess it would be desirable to change it to 
> smp_mb nonetheless.
> 
> However, could these changes be in a followup? Mainly because the barrier B 
> references will be updated across all futex.c... unless there are still concerns 
> about this particular patch, of course.

How about doing it first in a preparatory patch? So that reviews of patches 
actually making substantial changes don't get derailed by hard to read comments 
and so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-02-06  8:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-04  6:28 [PATCH v5] futex: Remove requirement for lock_page in get_futex_key Davidlohr Bueso
2016-02-04  9:06 ` Thomas Gleixner
2016-02-04 17:43   ` Davidlohr Bueso
2016-02-04 17:50     ` Thomas Gleixner
2016-02-05  9:44       ` Ingo Molnar
2016-02-05 18:02         ` Davidlohr Bueso
2016-02-06  8:22           ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.