linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
@ 2022-01-31 20:35 Will McVicker
  2022-01-31 20:49 ` John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: Will McVicker @ 2022-01-31 20:35 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard
  Cc: kernel-team, Will McVicker, Minchan Kim, linux-mm, linux-kernel

This fixes commit 54d516b1d62f ("mm/gup: small refactoring: simplify
try_grab_page()") which refactors try_grab_page() to call
try_grab_compound_head() with refs=1. The refactor commit is causing
pin_user_pages() to return -ENOMEM when we try to pin one user page that
is migratable and not in the movable zone. Previously, try_grab_page()
didn't check if the page was pinnable for FOLL_PIN. To match the same
functionality, this fix adds the check `refs > 1 &&` to skip the call to
is_pinnable_page().

This issue is reproducible with the Pixel 6 on the 5.15 LTS kernel. Here
is the call stack to reproduce the -ENOMEM error:

Call trace:
        : dump_backtrace.cfi_jt+0x0/0x8
        : show_stack+0x1c/0x2c
        : dump_stack_lvl+0x68/0x98
        : try_grab_compound_head+0x298/0x3c4
        : follow_page_pte+0x1dc/0x330
        : follow_page_mask+0x174/0x340
        : __get_user_pages+0x158/0x34c
        : __gup_longterm_locked+0xfc/0x194
        : __gup_longterm_unlocked+0x80/0xf4
        : internal_get_user_pages_fast+0xf0/0x15c
        : pin_user_pages_fast+0x24/0x40
        : edgetpu_device_group_map+0x130/0x584 [abrolhos]
        : edgetpu_ioctl_map_buffer+0x110/0x3b4 [abrolhos]
        : edgetpu_ioctl+0x238/0x408 [abrolhos]
        : edgetpu_fs_ioctl+0x14/0x24 [abrolhos]

Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Minchan Kim <minchan@google.com>
Signed-off-by: Will McVicker <willmcvicker@google.com>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index f0af462ac1e2..0509c49c46a3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -135,7 +135,7 @@ struct page *try_grab_compound_head(struct page *page,
 		 * right zone, so fail and let the caller fall back to the slow
 		 * path.
 		 */
-		if (unlikely((flags & FOLL_LONGTERM) &&
+		if (refs > 1 && unlikely((flags & FOLL_LONGTERM) &&
 			     !is_pinnable_page(page)))
 			return NULL;
 
-- 
2.35.0.rc2.247.g8bbb082509-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
  2022-01-31 20:35 [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1 Will McVicker
@ 2022-01-31 20:49 ` John Hubbard
  2022-01-31 21:37   ` Will McVicker
  2022-02-01  7:28   ` Minchan Kim
  0 siblings, 2 replies; 7+ messages in thread
From: John Hubbard @ 2022-01-31 20:49 UTC (permalink / raw)
  To: Will McVicker, Andrew Morton
  Cc: kernel-team, Minchan Kim, linux-mm, linux-kernel

On 1/31/22 12:35, Will McVicker wrote:
> This fixes commit 54d516b1d62f ("mm/gup: small refactoring: simplify
> try_grab_page()") which refactors try_grab_page() to call
> try_grab_compound_head() with refs=1. The refactor commit is causing
> pin_user_pages() to return -ENOMEM when we try to pin one user page that
> is migratable and not in the movable zone. Previously, try_grab_page()
> didn't check if the page was pinnable for FOLL_PIN. To match the same
> functionality, this fix adds the check `refs > 1 &&` to skip the call to
> is_pinnable_page().
>

That's a clear write-up of what you're seeing, what caused it, and how
you'd like to correct it. The previous code had a loophole, and you want
to keep that loophole. More below...

> This issue is reproducible with the Pixel 6 on the 5.15 LTS kernel. Here
> is the call stack to reproduce the -ENOMEM error:
...
> Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Minchan Kim <minchan@google.com>
> Signed-off-by: Will McVicker <willmcvicker@google.com>
> ---
>   mm/gup.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index f0af462ac1e2..0509c49c46a3 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -135,7 +135,7 @@ struct page *try_grab_compound_head(struct page *page,
>   		 * right zone, so fail and let the caller fall back to the slow
>   		 * path.
>   		 */
> -		if (unlikely((flags & FOLL_LONGTERM) &&
> +		if (refs > 1 && unlikely((flags & FOLL_LONGTERM) &&
>   			     !is_pinnable_page(page)))
>   			return NULL;
>   

...but are you really sure that this is the best way to "fix" the
problem? This trades correctness for "bug-for-bug compatibility" with
the previous code. It says, "it's OK to violate the pinnable and
longterm checks, as long as you do it one page at a time, rather than in
larger chunks.

Wouldn't it be better to try to fix up the calling code so that it's
not in violation of these zone rules?


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
  2022-01-31 20:49 ` John Hubbard
@ 2022-01-31 21:37   ` Will McVicker
  2022-02-01  7:28   ` Minchan Kim
  1 sibling, 0 replies; 7+ messages in thread
From: Will McVicker @ 2022-01-31 21:37 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Cc: Android Kernel, Minchan Kim, linux-mm,
	Linux Kernel Mailing List

On Mon, Jan 31, 2022 at 12:49 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 1/31/22 12:35, Will McVicker wrote:
> > This fixes commit 54d516b1d62f ("mm/gup: small refactoring: simplify
> > try_grab_page()") which refactors try_grab_page() to call
> > try_grab_compound_head() with refs=1. The refactor commit is causing
> > pin_user_pages() to return -ENOMEM when we try to pin one user page that
> > is migratable and not in the movable zone. Previously, try_grab_page()
> > didn't check if the page was pinnable for FOLL_PIN. To match the same
> > functionality, this fix adds the check `refs > 1 &&` to skip the call to
> > is_pinnable_page().
> >
>
> That's a clear write-up of what you're seeing, what caused it, and how
> you'd like to correct it. The previous code had a loophole, and you want
> to keep that loophole. More below...
>
> > This issue is reproducible with the Pixel 6 on the 5.15 LTS kernel. Here
> > is the call stack to reproduce the -ENOMEM error:
> ...
> > Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Minchan Kim <minchan@google.com>
> > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > ---
> >   mm/gup.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index f0af462ac1e2..0509c49c46a3 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -135,7 +135,7 @@ struct page *try_grab_compound_head(struct page *page,
> >                * right zone, so fail and let the caller fall back to the slow
> >                * path.
> >                */
> > -             if (unlikely((flags & FOLL_LONGTERM) &&
> > +             if (refs > 1 && unlikely((flags & FOLL_LONGTERM) &&
> >                            !is_pinnable_page(page)))
> >                       return NULL;
> >
>
> ...but are you really sure that this is the best way to "fix" the
> problem? This trades correctness for "bug-for-bug compatibility" with
> the previous code. It says, "it's OK to violate the pinnable and
> longterm checks, as long as you do it one page at a time, rather than in
> larger chunks.
>
> Wouldn't it be better to try to fix up the calling code so that it's
> not in violation of these zone rules?
>
>
> thanks,
> --
> John Hubbard
> NVIDIA

Hi John,

Thanks for the prompt response! I'm not super familiar with what
PIN+LONGTERM conditions require, but if this was previously a bug,
then I definitely don't want to re-introduce it. Since you're
confirming that, let me sync-up with the driver owner to see how I can
fix this on the side.

Thanks!
Will

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
  2022-01-31 20:49 ` John Hubbard
  2022-01-31 21:37   ` Will McVicker
@ 2022-02-01  7:28   ` Minchan Kim
  2022-02-01  7:35     ` John Hubbard
  1 sibling, 1 reply; 7+ messages in thread
From: Minchan Kim @ 2022-02-01  7:28 UTC (permalink / raw)
  To: John Hubbard
  Cc: Will McVicker, Andrew Morton, kernel-team, linux-mm, linux-kernel

Hi John,

On Mon, Jan 31, 2022 at 12:49:35PM -0800, John Hubbard wrote:
> On 1/31/22 12:35, Will McVicker wrote:
> > This fixes commit 54d516b1d62f ("mm/gup: small refactoring: simplify
> > try_grab_page()") which refactors try_grab_page() to call
> > try_grab_compound_head() with refs=1. The refactor commit is causing
> > pin_user_pages() to return -ENOMEM when we try to pin one user page that
> > is migratable and not in the movable zone. Previously, try_grab_page()
> > didn't check if the page was pinnable for FOLL_PIN. To match the same
> > functionality, this fix adds the check `refs > 1 &&` to skip the call to
> > is_pinnable_page().
> > 
> 
> That's a clear write-up of what you're seeing, what caused it, and how
> you'd like to correct it. The previous code had a loophole, and you want
> to keep that loophole. More below...
> 
> > This issue is reproducible with the Pixel 6 on the 5.15 LTS kernel. Here
> > is the call stack to reproduce the -ENOMEM error:
> ...
> > Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Minchan Kim <minchan@google.com>
> > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > ---
> >   mm/gup.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index f0af462ac1e2..0509c49c46a3 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -135,7 +135,7 @@ struct page *try_grab_compound_head(struct page *page,
> >   		 * right zone, so fail and let the caller fall back to the slow
> >   		 * path.
> >   		 */
> > -		if (unlikely((flags & FOLL_LONGTERM) &&
> > +		if (refs > 1 && unlikely((flags & FOLL_LONGTERM) &&
> >   			     !is_pinnable_page(page)))
> >   			return NULL;
> 
> ...but are you really sure that this is the best way to "fix" the
> problem? This trades correctness for "bug-for-bug compatibility" with
> the previous code. It says, "it's OK to violate the pinnable and
> longterm checks, as long as you do it one page at a time, rather than in
> larger chunks.
> 
> Wouldn't it be better to try to fix up the calling code so that it's
> not in violation of these zone rules?

I think the problem is before pin_user_pages can work with CMA pages
in the fallback path but now it doesn't work with CMA page. Driver
couldn't know whether it will work with CMA page or not in advance.

pin_user_pages
  __get_user_pages_locked
    follow_page_mask
      follow_page_pte    
        try_grab_page
          !is_pinnable_page(page) 
            return NULL;
        return ERR_PTR(-ENOMEM);
     return -ENOMEM without faultin_page

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
  2022-02-01  7:28   ` Minchan Kim
@ 2022-02-01  7:35     ` John Hubbard
  2022-02-01  7:43       ` John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2022-02-01  7:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Will McVicker, Andrew Morton, kernel-team, linux-mm, linux-kernel

On 1/31/22 23:28, Minchan Kim wrote:
...
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -135,7 +135,7 @@ struct page *try_grab_compound_head(struct page *page,
>>>    		 * right zone, so fail and let the caller fall back to the slow
>>>    		 * path.
>>>    		 */
>>> -		if (unlikely((flags & FOLL_LONGTERM) &&
>>> +		if (refs > 1 && unlikely((flags & FOLL_LONGTERM) &&
>>>    			     !is_pinnable_page(page)))
>>>    			return NULL;
>>
>> ...but are you really sure that this is the best way to "fix" the
>> problem? This trades correctness for "bug-for-bug compatibility" with
>> the previous code. It says, "it's OK to violate the pinnable and
>> longterm checks, as long as you do it one page at a time, rather than in
>> larger chunks.
>>
>> Wouldn't it be better to try to fix up the calling code so that it's
>> not in violation of these zone rules?
> 
> I think the problem is before pin_user_pages can work with CMA pages
> in the fallback path but now it doesn't work with CMA page. Driver

Actually, it "worked" only if the caller did it one page at a time.
(See how the above attempted fix restores the "make it work for
refs == 1.)

> couldn't know whether it will work with CMA page or not in advance.
> 
> pin_user_pages
>    __get_user_pages_locked
>      follow_page_mask
>        follow_page_pte
>          try_grab_page
>            !is_pinnable_page(page)
>              return NULL;
>          return ERR_PTR(-ENOMEM);
>       return -ENOMEM without faultin_page

Yes, that's all clear.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
  2022-02-01  7:35     ` John Hubbard
@ 2022-02-01  7:43       ` John Hubbard
  2022-02-01  8:33         ` John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2022-02-01  7:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Will McVicker, Andrew Morton, kernel-team, linux-mm, linux-kernel

On 1/31/22 23:35, John Hubbard wrote:
> On 1/31/22 23:28, Minchan Kim wrote:
> ...
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -135,7 +135,7 @@ struct page *try_grab_compound_head(struct page *page,
>>>>             * right zone, so fail and let the caller fall back to the slow
>>>>             * path.
>>>>             */
>>>> -        if (unlikely((flags & FOLL_LONGTERM) &&
>>>> +        if (refs > 1 && unlikely((flags & FOLL_LONGTERM) &&
>>>>                     !is_pinnable_page(page)))
>>>>                return NULL;
>>>
>>> ...but are you really sure that this is the best way to "fix" the
>>> problem? This trades correctness for "bug-for-bug compatibility" with
>>> the previous code. It says, "it's OK to violate the pinnable and
>>> longterm checks, as long as you do it one page at a time, rather than in
>>> larger chunks.
>>>
>>> Wouldn't it be better to try to fix up the calling code so that it's
>>> not in violation of these zone rules?
>>
>> I think the problem is before pin_user_pages can work with CMA pages
>> in the fallback path but now it doesn't work with CMA page. Driver
> 
> Actually, it "worked" only if the caller did it one page at a time.
> (See how the above attempted fix restores the "make it work for
> refs == 1.)
> 
>> couldn't know whether it will work with CMA page or not in advance.
>>
>> pin_user_pages
>>    __get_user_pages_locked
>>      follow_page_mask
>>        follow_page_pte
>>          try_grab_page
>>            !is_pinnable_page(page)
>>              return NULL;
>>          return ERR_PTR(-ENOMEM);
>>       return -ENOMEM without faultin_page
> 
> Yes, that's all clear.
> ...oh, but I guess you're pointing out that it's always going to be
page-at-a-time this deep in the pin_user_pages() call path. Which is true.

I hadn't worked through how to fix this yet, my initial reaction was
that allowing single refs to go through, while prohibiting multiple refs,
was clearly *not* the way to go.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1
  2022-02-01  7:43       ` John Hubbard
@ 2022-02-01  8:33         ` John Hubbard
  0 siblings, 0 replies; 7+ messages in thread
From: John Hubbard @ 2022-02-01  8:33 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Will McVicker, Andrew Morton, kernel-team, linux-mm, linux-kernel

On 1/31/22 23:43, John Hubbard wrote:
>>> pin_user_pages
>>>    __get_user_pages_locked
>>>      follow_page_mask
>>>        follow_page_pte
>>>          try_grab_page
>>>            !is_pinnable_page(page)
>>>              return NULL;
>>>          return ERR_PTR(-ENOMEM);
>>>       return -ENOMEM without faultin_page
>>
>> Yes, that's all clear.
>> ...oh, but I guess you're pointing out that it's always going to be
> page-at-a-time this deep in the pin_user_pages() call path. Which is true.
> 
> I hadn't worked through how to fix this yet, my initial reaction was
> that allowing single refs to go through, while prohibiting multiple refs,
> was clearly *not* the way to go.
> 

OK, so after looking at this some more, I think that the real problem
with commit 54d516b1d62f ("mm/gup: small refactoring: simplify
try_grab_page()") is that it funnels fast and slow gup through the same
routine (try_grab_compound_head()). And the problem with *that*, is that
try_grab_compound_head() is coded up with that assumption that it is
being called *only* by fast-gup:

		/*
		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
		 * right zone, so fail and let the caller fall back to the slow
		 * path.
		 */
		if (unlikely((flags & FOLL_LONGTERM) &&
			     !is_pinnable_page(page)))
			return NULL;

Now, to fix that, I'd really rather not conflate "refs == 1" with "on
the slow path", because that's a conceptual mismatch.

So, other ideas:

a) Remove the above check, and fail fast gup at a different point for
the (FOLL_LONGTERM && !is_pinnable) case. Haven't looked closely at this
yet.

b) Pass in FOLL_FAST_ONLY from all call sites *except* try_grag_page().
Skip the above check in slow-gup (!FOLL_FAST_ONLY) cases.

This makes the code ugly, though, and I'm just listing it here for
completeness.

c) Just call the entire refactoring idea a mistake, and roll it back
either entirely, or enough to keep fast and slow gup separate.

Unless a better idea shows up, (c) is probably the way to go, I think...


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-02-01  8:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-31 20:35 [PATCH v1 1/1] mm/gup: skip pinnable check for refs==1 Will McVicker
2022-01-31 20:49 ` John Hubbard
2022-01-31 21:37   ` Will McVicker
2022-02-01  7:28   ` Minchan Kim
2022-02-01  7:35     ` John Hubbard
2022-02-01  7:43       ` John Hubbard
2022-02-01  8:33         ` John Hubbard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).