* [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages
@ 2019-06-04 16:48 ira.weiny
  2019-06-04 19:48 ` John Hubbard
  0 siblings, 1 reply; 6+ messages in thread
From: ira.weiny @ 2019-06-04 16:48 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: linux-mm, linux-kernel, John Hubbard, Jérôme Glisse,
	Dan Williams, Ira Weiny

From: Ira Weiny <ira.weiny@intel.com>

release_pages() is an optimized version of a loop around put_page().
Unfortunately, its logic is not entirely correct for devmap pages,
because device pages are not limited to type MEMORY_DEVICE_PUBLIC.
There are in fact four types: private, public, FS DAX, and PCI
P2PDMA.  Some of these have specific needs when "putting" the page
while others do not.

The logic to handle any special needs is contained in
put_devmap_managed_page().  Therefore all devmap pages should be
processed by this function, which contains the correct logic for a
page put.

Handle all device type pages within release_pages() by calling
put_devmap_managed_page() on all devmap pages.  If
put_devmap_managed_page() returns true, the page has been put and we
continue with the next page.  A false return from
put_devmap_managed_page() means the page did not require special
processing and should fall through to "normal" processing.

This was found via code inspection while determining if release_pages()
and the new put_user_pages() could be interchangeable.[1]

[1] https://lore.kernel.org/lkml/20190523172852.GA27175@iweiny-DESK2.sc.intel.com/

Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V2:
	Update changelog for more clarity as requested by Michal
	Update comment WRT "failing" of put_devmap_managed_page()

Changes from V1:
	Add comment clarifying that put_devmap_managed_page() can still
	fail.
	Add Reviewed-by tags.

 mm/swap.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 7ede3eddc12a..6d153ce4cb8c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
 		if (is_huge_zero_page(page))
 			continue;
 
-		/* Device public page can not be huge page */
-		if (is_device_public_page(page)) {
+		if (is_zone_device_page(page)) {
 			if (locked_pgdat) {
 				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
 						       flags);
 				locked_pgdat = NULL;
 			}
-			put_devmap_managed_page(page);
-			continue;
+			/*
+			 * Not all zone-device-pages require special
+			 * processing.  Those pages return 'false' from
+			 * put_devmap_managed_page() expecting a call to
+			 * put_page_testzero()
+			 */
+			if (put_devmap_managed_page(page))
+				continue;
 		}
 
 		page = compound_head(page);
-- 
2.20.1


* Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages
  2019-06-04 16:48 [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages ira.weiny
@ 2019-06-04 19:48 ` John Hubbard
  2019-06-04 20:11   ` Dan Williams
  0 siblings, 1 reply; 6+ messages in thread
From: John Hubbard @ 2019-06-04 19:48 UTC (permalink / raw)
  To: ira.weiny, Andrew Morton, Michal Hocko
  Cc: linux-mm, linux-kernel, Jérôme Glisse, Dan Williams

On 6/4/19 9:48 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> ...
> diff --git a/mm/swap.c b/mm/swap.c
> index 7ede3eddc12a..6d153ce4cb8c 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
>  		if (is_huge_zero_page(page))
>  			continue;
>  
> -		/* Device public page can not be huge page */
> -		if (is_device_public_page(page)) {
> +		if (is_zone_device_page(page)) {
>  			if (locked_pgdat) {
>  				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
>  						       flags);
>  				locked_pgdat = NULL;
>  			}
> -			put_devmap_managed_page(page);
> -			continue;
> +			/*
> +			 * Not all zone-device-pages require special
> +			 * processing.  Those pages return 'false' from
> +			 * put_devmap_managed_page() expecting a call to
> +			 * put_page_testzero()
> +			 */

Just a documentation tweak: how about: 

			/*
			 * ZONE_DEVICE pages that return 'false' from 
			 * put_devmap_managed_page() do not require special 
			 * processing, and instead, expect a call to 
			 * put_page_testzero().
			 */


thanks,
-- 
John Hubbard
NVIDIA

> +			if (put_devmap_managed_page(page))
> +				continue;
>  		}
>  
>  		page = compound_head(page);
> 

* Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages
  2019-06-04 19:48 ` John Hubbard
@ 2019-06-04 20:11   ` Dan Williams
  2019-06-04 20:17     ` John Hubbard
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Williams @ 2019-06-04 20:11 UTC (permalink / raw)
  To: John Hubbard
  Cc: Weiny, Ira, Andrew Morton, Michal Hocko, Linux MM,
	Linux Kernel Mailing List, Jérôme Glisse

On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 6/4/19 9:48 AM, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > ...
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 7ede3eddc12a..6d153ce4cb8c 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >               if (is_huge_zero_page(page))
> >                       continue;
> >
> > -             /* Device public page can not be huge page */
> > -             if (is_device_public_page(page)) {
> > +             if (is_zone_device_page(page)) {
> >                       if (locked_pgdat) {
> >                               spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >                                                      flags);
> >                               locked_pgdat = NULL;
> >                       }
> > -                     put_devmap_managed_page(page);
> > -                     continue;
> > +                     /*
> > +                      * Not all zone-device-pages require special
> > +                      * processing.  Those pages return 'false' from
> > +                      * put_devmap_managed_page() expecting a call to
> > +                      * put_page_testzero()
> > +                      */
>
> Just a documentation tweak: how about:
>
>                         /*
>                          * ZONE_DEVICE pages that return 'false' from
>                          * put_devmap_managed_page() do not require special
>                          * processing, and instead, expect a call to
>                          * put_page_testzero().
>                          */

Looks better to me, but maybe just go ahead and list those
expectations explicitly. Something like:

                        /*
                         * put_devmap_managed_page() only handles
                         * ZONE_DEVICE (struct dev_pagemap managed)
                         * pages when the hosting dev_pagemap has the
                         * ->free() or ->fault() callback handlers
                         * implemented as indicated by
                         * dev_pagemap.type. Otherwise the expectation
                         * is to fall back to a plain decrement /
                         * put_page_testzero().
                         */

* Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages
  2019-06-04 20:11   ` Dan Williams
@ 2019-06-04 20:17     ` John Hubbard
  2019-06-04 20:32       ` Ira Weiny
  2019-06-04 21:42       ` Dan Williams
  0 siblings, 2 replies; 6+ messages in thread
From: John Hubbard @ 2019-06-04 20:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Weiny, Ira, Andrew Morton, Michal Hocko, Linux MM,
	Linux Kernel Mailing List, Jérôme Glisse

On 6/4/19 1:11 PM, Dan Williams wrote:
> On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 6/4/19 9:48 AM, ira.weiny@intel.com wrote:
>>> From: Ira Weiny <ira.weiny@intel.com>
>>>
...
>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index 7ede3eddc12a..6d153ce4cb8c 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
>>>               if (is_huge_zero_page(page))
>>>                       continue;
>>>
>>> -             /* Device public page can not be huge page */
>>> -             if (is_device_public_page(page)) {
>>> +             if (is_zone_device_page(page)) {
>>>                       if (locked_pgdat) {
>>>                               spin_unlock_irqrestore(&locked_pgdat->lru_lock,
>>>                                                      flags);
>>>                               locked_pgdat = NULL;
>>>                       }
>>> -                     put_devmap_managed_page(page);
>>> -                     continue;
>>> +                     /*
>>> +                      * Not all zone-device-pages require special
>>> +                      * processing.  Those pages return 'false' from
>>> +                      * put_devmap_managed_page() expecting a call to
>>> +                      * put_page_testzero()
>>> +                      */
>>
>> Just a documentation tweak: how about:
>>
>>                         /*
>>                          * ZONE_DEVICE pages that return 'false' from
>>                          * put_devmap_managed_page() do not require special
>>                          * processing, and instead, expect a call to
>>                          * put_page_testzero().
>>                          */
> 
> Looks better to me, but maybe just go ahead and list those
> expectations explicitly. Something like:
> 
>                         /*
>                          * put_devmap_managed_page() only handles
>                          * ZONE_DEVICE (struct dev_pagemap managed)
>                          * pages when the hosting dev_pagemap has the
>                          * ->free() or ->fault() callback handlers
>                          *  implemented as indicated by
>                          *  dev_pagemap.type. Otherwise the expectation
>                          *  is to fall back to a plain decrement /
>                          *  put_page_testzero().
>                          */

I like it--but not here, because it's too much internal detail in a
call site that doesn't use that level of detail. The call site looks
at the return value, only.

Let's instead put that blurb above (or in) the put_devmap_managed_page() 
routine itself. And leave the blurb that I wrote where it is. And then I
think everything will have an appropriate level of detail in the right places.


thanks,
-- 
John Hubbard
NVIDIA

* Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages
  2019-06-04 20:17     ` John Hubbard
@ 2019-06-04 20:32       ` Ira Weiny
  2019-06-04 21:42       ` Dan Williams
  1 sibling, 0 replies; 6+ messages in thread
From: Ira Weiny @ 2019-06-04 20:32 UTC (permalink / raw)
  To: John Hubbard
  Cc: Dan Williams, Andrew Morton, Michal Hocko, Linux MM,
	Linux Kernel Mailing List, Jérôme Glisse

On Tue, Jun 04, 2019 at 01:17:42PM -0700, John Hubbard wrote:
> On 6/4/19 1:11 PM, Dan Williams wrote:
> > On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubbard@nvidia.com> wrote:
> >>
> >> On 6/4/19 9:48 AM, ira.weiny@intel.com wrote:
> >>> From: Ira Weiny <ira.weiny@intel.com>
> >>>
> ...
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 7ede3eddc12a..6d153ce4cb8c 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >>>               if (is_huge_zero_page(page))
> >>>                       continue;
> >>>
> >>> -             /* Device public page can not be huge page */
> >>> -             if (is_device_public_page(page)) {
> >>> +             if (is_zone_device_page(page)) {
> >>>                       if (locked_pgdat) {
> >>>                               spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >>>                                                      flags);
> >>>                               locked_pgdat = NULL;
> >>>                       }
> >>> -                     put_devmap_managed_page(page);
> >>> -                     continue;
> >>> +                     /*
> >>> +                      * Not all zone-device-pages require special
> >>> +                      * processing.  Those pages return 'false' from
> >>> +                      * put_devmap_managed_page() expecting a call to
> >>> +                      * put_page_testzero()
> >>> +                      */
> >>
> >> Just a documentation tweak: how about:
> >>
> >>                         /*
> >>                          * ZONE_DEVICE pages that return 'false' from
> >>                          * put_devmap_managed_page() do not require special
> >>                          * processing, and instead, expect a call to
> >>                          * put_page_testzero().
> >>                          */
> > 
> > Looks better to me, but maybe just go ahead and list those
> > expectations explicitly. Something like:
> > 
> >                         /*
> >                          * put_devmap_managed_page() only handles
> >                          * ZONE_DEVICE (struct dev_pagemap managed)
> >                          * pages when the hosting dev_pagemap has the
> >                          * ->free() or ->fault() callback handlers
> >                          *  implemented as indicated by
> >                          *  dev_pagemap.type. Otherwise the expectation
> >                          *  is to fall back to a plain decrement /
> >                          *  put_page_testzero().
> >                          */
> 
> I like it--but not here, because it's too much internal detail in a
> call site that doesn't use that level of detail. The call site looks
> at the return value, only.
> 
> Let's instead put that blurb above (or in) the put_devmap_managed_page() 
> routine itself. And leave the blurb that I wrote where it is. And then I
> think everything will have an appropriate level of detail in the right places.

I agree.  This leaves it open-ended: put_devmap_managed_page() handles
whatever special processing is required.

FWIW the same call is made in put_page() and has no comment, so perhaps
we are getting wrapped around the axle for no reason?
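For comparison, the shared pattern Ira mentions can be modeled in
userspace.  This mock_put_page() is a simplified stand-in for the
kernel's put_page(), kept only to show the order of operations; the
field names and freed flag are illustrative, not kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace model of put_page(); not the kernel source. */
struct mock_page {
	bool devmap_managed;
	int refcount;
	int freed;
};

static bool mock_put_devmap_managed_page(struct mock_page *page)
{
	if (!page->devmap_managed)
		return false;
	page->refcount--;	/* devmap-specific teardown would run here */
	return true;
}

/*
 * Same shape as release_pages(): try the devmap path first, and only
 * fall back to the plain decrement / test-zero when it declines.
 */
static void mock_put_page(struct mock_page *page)
{
	if (mock_put_devmap_managed_page(page))
		return;
	if (--page->refcount == 0)	/* put_page_testzero() analogue */
		page->freed = 1;	/* __put_page() analogue */
}
```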

Frankly, I questioned myself when I mentioned put_page_testzero() as
well.  But I'm ok with John's suggestion.  My wording was a bit
"rushed".  Sorry about that.  I wanted to remove the word 'fail' from
the comment because I think that is what caught Michal's eye.

Ira

> 
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
> 

* Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages
  2019-06-04 20:17     ` John Hubbard
  2019-06-04 20:32       ` Ira Weiny
@ 2019-06-04 21:42       ` Dan Williams
  1 sibling, 0 replies; 6+ messages in thread
From: Dan Williams @ 2019-06-04 21:42 UTC (permalink / raw)
  To: John Hubbard
  Cc: Weiny, Ira, Andrew Morton, Michal Hocko, Linux MM,
	Linux Kernel Mailing List, Jérôme Glisse

On Tue, Jun 4, 2019 at 1:17 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 6/4/19 1:11 PM, Dan Williams wrote:
> > On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubbard@nvidia.com> wrote:
> >>
> >> On 6/4/19 9:48 AM, ira.weiny@intel.com wrote:
> >>> From: Ira Weiny <ira.weiny@intel.com>
> >>>
> ...
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 7ede3eddc12a..6d153ce4cb8c 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >>>               if (is_huge_zero_page(page))
> >>>                       continue;
> >>>
> >>> -             /* Device public page can not be huge page */
> >>> -             if (is_device_public_page(page)) {
> >>> +             if (is_zone_device_page(page)) {
> >>>                       if (locked_pgdat) {
> >>>                               spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >>>                                                      flags);
> >>>                               locked_pgdat = NULL;
> >>>                       }
> >>> -                     put_devmap_managed_page(page);
> >>> -                     continue;
> >>> +                     /*
> >>> +                      * Not all zone-device-pages require special
> >>> +                      * processing.  Those pages return 'false' from
> >>> +                      * put_devmap_managed_page() expecting a call to
> >>> +                      * put_page_testzero()
> >>> +                      */
> >>
> >> Just a documentation tweak: how about:
> >>
> >>                         /*
> >>                          * ZONE_DEVICE pages that return 'false' from
> >>                          * put_devmap_managed_page() do not require special
> >>                          * processing, and instead, expect a call to
> >>                          * put_page_testzero().
> >>                          */
> >
> > Looks better to me, but maybe just go ahead and list those
> > expectations explicitly. Something like:
> >
> >                         /*
> >                          * put_devmap_managed_page() only handles
> >                          * ZONE_DEVICE (struct dev_pagemap managed)
> >                          * pages when the hosting dev_pagemap has the
> >                          * ->free() or ->fault() callback handlers
> >                          *  implemented as indicated by
> >                          *  dev_pagemap.type. Otherwise the expectation
> >                          *  is to fall back to a plain decrement /
> >                          *  put_page_testzero().
> >                          */
>
> I like it--but not here, because it's too much internal detail in a
> call site that doesn't use that level of detail. The call site looks
> at the return value, only.
>
> Let's instead put that blurb above (or in) the put_devmap_managed_page()
> routine itself. And leave the blurb that I wrote where it is. And then I
> think everything will have an appropriate level of detail in the right places.

Ok.  Ideally there wouldn't be any commentary needed at the call site
and the put_page() could be handled internal to
put_devmap_managed_page(), but I did not see a way to do that without
breaking the compile out / static branch optimization when there are
no active ZONE_DEVICE users.
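The optimization Dan refers to can be sketched with a plain global
standing in for the kernel's runtime-patched static key
(static_branch_unlikely()).  This is an illustrative model of the
gating behavior, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the static-branch optimization: when no ZONE_DEVICE users
 * are active, put_devmap_managed_page() should cost (nearly) nothing.
 * The kernel patches a static key at runtime; a plain global stands
 * in for it here.
 */
static bool devmap_managed_key;	/* stand-in for the kernel static key */

struct mock_page {
	bool is_devmap;
	int refcount;
};

static bool mock_put_devmap_managed_page(struct mock_page *page)
{
	if (!devmap_managed_key)	/* fast path: branch patched out */
		return false;
	if (!page->is_devmap)
		return false;
	page->refcount--;		/* device-specific put */
	return true;
}
```

With the key disabled, every call immediately declines and the caller
falls through to the plain put, which is why the caller must keep the
put_page_testzero() fallback rather than having the devmap helper
absorb it.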
