linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] futex: Fix v4.6+ ltp futex_wait04 regression
@ 2016-06-06  9:40 Mike Galbraith
  2016-06-06 13:07 ` Peter Zijlstra
  2016-06-07 10:50 ` Mel Gorman
  0 siblings, 2 replies; 6+ messages in thread
From: Mike Galbraith @ 2016-06-06  9:40 UTC (permalink / raw)
  To: Mel Gorman; +Cc: lkml, Thomas Gleixner, Sebastian Andrzej Siewior

Hi Mel (who is out of the office today),

I initially reported this on the rt-users list, thinking at the time
that it was only showing up in -rt kernels, but that turned out to not
be the case, it just requires an enterprise config for some reason mm
folks will likely recognize instantly.  I just happen to use same when
building devel -rt trees to at least get some build warning coverage.

Below is a stab at fixing the thing up while you're off doing whatever
an Irishman does on a national holiday (hm;).  If it's ok as is, fine,
I saved you a couple minutes.  If not, oh well, consider it diagnosis.

65d8fc77 futex: Remove requirement for lock_page() in get_futex_key()
introduced a regression of futex_wait04 when an enterprise size config is
used.  Per trace_printk(), when we assign page = compound_head(page), we're
subsequently using a different page than the one we locked into memory,
thus mucking up references.

Fixes: 65d8fc77 futex: Remove requirement for lock_page() in get_futex_key()
Signed-off-by: Mike Galbraith <umgwanakikbuit@gmail.com>
---
 kernel/futex.c |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -469,7 +469,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
-	struct page *page;
+	struct page *page, *pinned = NULL;
 	struct address_space *mapping;
 	int err, ro = 0;
 
@@ -530,8 +530,18 @@ get_futex_key(u32 __user *uaddr, int fsh
 	 * considered here and page lock forces unnecessarily serialization
 	 * From this point on, mapping will be re-verified if necessary and
 	 * page lock will be acquired only if it is unavoidable
-	 */
+	 *
+	 * If we're dealing with a compound page, save our reference to the
+	 * page we locked in memory above, and take a new reference on the
+	 * page head, dropping the previously pinned page reference on retry.
+	 */
+	if (unlikely(pinned && page != pinned))
+		put_page(pinned);
+	pinned = page;
 	page = compound_head(page);
+	if (unlikely(pinned != page))
+		get_page(page);
+
 	mapping = READ_ONCE(page->mapping);
 
 	/*
@@ -560,12 +570,14 @@ get_futex_key(u32 __user *uaddr, int fsh
 		lock_page(page);
 		shmem_swizzled = PageSwapCache(page) || page->mapping;
 		unlock_page(page);
-		put_page(page);
 
-		if (shmem_swizzled)
+		if (shmem_swizzled) {
+			put_page(page);
 			goto again;
+		}
 
-		return -EFAULT;
+		err = -EFAULT;
+		goto out;
 	}
 
 	/*
@@ -654,12 +666,14 @@ get_futex_key(u32 __user *uaddr, int fsh
 
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.inode = inode;
-		key->shared.pgoff = basepage_index(page);
+		key->shared.pgoff = basepage_index(pinned);
 		rcu_read_unlock();
 	}
 
 out:
 	put_page(page);
+	if (unlikely(pinned != page))
+		put_page(pinned);
 	return err;
 }
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] futex: Fix v4.6+ ltp futex_wait04 regression
  2016-06-06  9:40 [patch] futex: Fix v4.6+ ltp futex_wait04 regression Mike Galbraith
@ 2016-06-06 13:07 ` Peter Zijlstra
  2016-06-06 13:34   ` Mike Galbraith
  2016-06-07 10:50 ` Mel Gorman
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-06-06 13:07 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Mel Gorman, lkml, Thomas Gleixner, Sebastian Andrzej Siewior, kirill

On Mon, Jun 06, 2016 at 11:40:41AM +0200, Mike Galbraith wrote:
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -469,7 +469,7 @@ get_futex_key(u32 __user *uaddr, int fsh
>  {
>  	unsigned long address = (unsigned long)uaddr;
>  	struct mm_struct *mm = current->mm;
> -	struct page *page;
> +	struct page *page, *pinned = NULL;
>  	struct address_space *mapping;
>  	int err, ro = 0;
>  
> @@ -530,8 +530,18 @@ get_futex_key(u32 __user *uaddr, int fsh
>  	 * considered here and page lock forces unnecessarily serialization
>  	 * From this point on, mapping will be re-verified if necessary and
>  	 * page lock will be acquired only if it is unavoidable
> -	 */
> +	 *
> +	 * If we're dealing with a compound page, save our reference to the
> +	 * page we locked in memory above, and take a new reference on the
> +	 * page head, dropping the previously pinned page reference on retry.
> +	 */
> +	if (unlikely(pinned && page != pinned))
> +		put_page(pinned);
> +	pinned = page;
>  	page = compound_head(page);
> +	if (unlikely(pinned != page))
> +		get_page(page);

Not needed, since {get,put}_page() explicitly use compound_head() to
track the reference count on.

> +
>  	mapping = READ_ONCE(page->mapping);
>  
>  	/*
> @@ -560,12 +570,14 @@ get_futex_key(u32 __user *uaddr, int fsh
>  		lock_page(page);
>  		shmem_swizzled = PageSwapCache(page) || page->mapping;
>  		unlock_page(page);
> -		put_page(page);
>  
> -		if (shmem_swizzled)
> +		if (shmem_swizzled) {
> +			put_page(page);
>  			goto again;
> +		}
>  
> -		return -EFAULT;
> +		err = -EFAULT;
> +		goto out;

This also simplifies away

>  	}
>  
>  	/*
> @@ -654,12 +666,14 @@ get_futex_key(u32 __user *uaddr, int fsh
>  
>  		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
>  		key->shared.inode = inode;
> -		key->shared.pgoff = basepage_index(page);
> +		key->shared.pgoff = basepage_index(pinned);

But this seems to be the actual fix; because while I think the compound
page cannot change from under us, it can change between futex
invocations, and then having a key change is _bad_.

But someone other than me would have to concurr, its been too long since
I looked at all this compound muck.

>  		rcu_read_unlock();
>  	}
>  
>  out:
>  	put_page(page);
> +	if (unlikely(pinned != page))
> +		put_page(pinned);

and that too can go.

>  	return err;
>  }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] futex: Fix v4.6+ ltp futex_wait04 regression
  2016-06-06 13:07 ` Peter Zijlstra
@ 2016-06-06 13:34   ` Mike Galbraith
  2016-06-06 13:46     ` Mike Galbraith
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Galbraith @ 2016-06-06 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mel Gorman, lkml, Thomas Gleixner, Sebastian Andrzej Siewior, kirill

On Mon, 2016-06-06 at 15:07 +0200, Peter Zijlstra wrote:

> > @@ -654,12 +666,14 @@ get_futex_key(u32 __user *uaddr, int fsh
> >  
> >  > > 	> > 	> > key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> >  > > 	> > 	> > key->shared.inode = inode;
> > -> > 	> > 	> > key->shared.pgoff = basepage_index(page);
> > +> > 	> > 	> > key->shared.pgoff = basepage_index(pinned);
> 
> But this seems to be the actual fix;

Yup, the three 'pinned' lines alone work just fine.

	-Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] futex: Fix v4.6+ ltp futex_wait04 regression
  2016-06-06 13:34   ` Mike Galbraith
@ 2016-06-06 13:46     ` Mike Galbraith
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Galbraith @ 2016-06-06 13:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mel Gorman, lkml, Thomas Gleixner, Sebastian Andrzej Siewior, kirill

On Mon, 2016-06-06 at 15:34 +0200, Mike Galbraith wrote:
> On Mon, 2016-06-06 at 15:07 +0200, Peter Zijlstra wrote:
> 
> > > @@ -654,12 +666,14 @@ get_futex_key(u32 __user *uaddr, int fsh
> > >  
> > >  > > 	> > 	> > key->both.offset |= FUT_OFF_INODE;
> > > /* inode-based key */
> > >  > > 	> > 	> > key->shared.inode = inode;
> > > -> > 	> > 	> > key->shared.pgoff =
> > > basepage_index(page);
> > > +> > 	> > 	> > key->shared.pgoff =
> > > basepage_index(pinned);
> > 
> > But this seems to be the actual fix;
> 
> Yup, the three 'pinned' lines alone work just fine.

I was going by what the code used to look like.

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
...
#else
        page_head = compound_head(page);
        if (page != page_head) {
                get_page(page_head);
                put_page(page);
        }
#endif

That all went away, but the pinned page that we used in the end was
stable until Mel's optimization landed, leaving me thinking I needed to
do both.  Hohum, close but no banana.

	-Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] futex: Fix v4.6+ ltp futex_wait04 regression
  2016-06-06  9:40 [patch] futex: Fix v4.6+ ltp futex_wait04 regression Mike Galbraith
  2016-06-06 13:07 ` Peter Zijlstra
@ 2016-06-07 10:50 ` Mel Gorman
  2016-06-07 11:20   ` Mel Gorman
  1 sibling, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2016-06-07 10:50 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml, Thomas Gleixner, Sebastian Andrzej Siewior, dbueso

On Mon, Jun 06, 2016 at 11:40:41AM +0200, Mike Galbraith wrote:
> Hi Mel (who is out of the office today),
> 

Adding Davidlohr who did a lot of the work debugging my initial
prototype.

> I initially reported this on the rt-users list, thinking at the time
> that it was only showing up in -rt kernels, but that turned out to not
> be the case, it just requires an enterprise config for some reason mm
> folks will likely recognize instantly.  I just happen to use same when
> building devel -rt trees to at least get some build warning coverage.
> 

Can you post the exact config? I used an enterprise config and was unable
to reproduce the problem on 4.6. I'll be trying 4.7-rc1 after finishing
this mail even though the optimisation was introduced before 4.6.

> Below is a stab at fixing the thing up while you're off doing whatever
> an Irishman does on a national holiday (hm;).  If it's ok as is, fine,
> I saved you a couple minutes.  If not, oh well, consider it diagnosis.
> 
> 65d8fc77 futex: Remove requirement for lock_page() in get_futex_key()
> introduced a regression of futex_wait04 when an enterprise size config is
> used.  Per trace_printk(), when we assign page = compound_head(page), we're
> subsequently using a different page than the one we locked into memory,
> thus mucking up references.
> 
> Fixes: 65d8fc77 futex: Remove requirement for lock_page() in get_futex_key()
> Signed-off-by: Mike Galbraith <umgwanakikbuit@gmail.com>

As I cannot reproduce the problem yet, I cannot see the impact of the
fix but I'm scratching my head trying to figure out why this would work
and be explained by 65d8fc77. The primary path you alter covers the case
where a filesystem is backing the page containing the futex. If this was a
transparent hugepage anonymous mapping, it would be handled by the PageAnon
path so I'm ruling that out. The only case where a filesystem is using a
compound page is hugetlbfs and indeed the test case uses MAP_HUGETLB. The
compound state of such pages are stable between futex invocations as they
cannot split and the basepage_index should be returning the key for the
head page for hugetlbfs pages.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] futex: Fix v4.6+ ltp futex_wait04 regression
  2016-06-07 10:50 ` Mel Gorman
@ 2016-06-07 11:20   ` Mel Gorman
  0 siblings, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2016-06-07 11:20 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml, Thomas Gleixner, Sebastian Andrzej Siewior, dbueso

On Tue, Jun 07, 2016 at 11:50:56AM +0100, Mel Gorman wrote:
> On Mon, Jun 06, 2016 at 11:40:41AM +0200, Mike Galbraith wrote:
> > Hi Mel (who is out of the office today),
> > 
> 
> Adding Davidlohr who did a lot of the work debugging my initial
> prototype.
> 
> > I initially reported this on the rt-users list, thinking at the time
> > that it was only showing up in -rt kernels, but that turned out to not
> > be the case, it just requires an enterprise config for some reason mm
> > folks will likely recognize instantly.  I just happen to use same when
> > building devel -rt trees to at least get some build warning coverage.
> > 
> 
> Can you post the exact config? I used an enterprise config and was unable
> to reproduce the problem on 4.6. I'll be trying 4.7-rc1 after finishing
> this mail even though the optimisation was introduced before 4.6.
> 

Never mind, I reproduced the problem and reckon I know where the problem
is. I'll keep poking at it.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-06-07 11:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-06  9:40 [patch] futex: Fix v4.6+ ltp futex_wait04 regression Mike Galbraith
2016-06-06 13:07 ` Peter Zijlstra
2016-06-06 13:34   ` Mike Galbraith
2016-06-06 13:46     ` Mike Galbraith
2016-06-07 10:50 ` Mel Gorman
2016-06-07 11:20   ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).