linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 0/2] fix follow_page related issues
@ 2022-08-23 13:58 Haiyue Wang
  2022-08-23 13:58 ` [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page Haiyue Wang
  2022-08-23 13:58 ` [PATCH v7 2/2] mm: fix the handling Non-LRU pages returned by follow_page Haiyue Wang
  0 siblings, 2 replies; 7+ messages in thread
From: Haiyue Wang @ 2022-08-23 13:58 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: akpm, david, apopple, linmiaohe, ying.huang, songmuchun,
	naoya.horiguchi, alex.sierra, mike.kravetz, gerald.schaefer,
	Haiyue Wang

v7: Drop the zone device page check for transparent page.

v6: Simplify the multiple layers of conditionals for if {}

-               if (page) {
-                       err = !is_zone_device_page(page) ? page_to_nid(page)
-                                                        : -ENOENT;
-                       if (foll_flags & FOLL_GET)
-                               put_page(page);
-               } else {
-                       err = -ENOENT;
-               }
+               err = -ENOENT;
+               if (!page)
+                       goto set_status;
+
+               if (!is_zone_device_page(page))
+                       err = page_to_nid(page);
+
+               if (foll_flags & FOLL_GET)
+                       put_page(page);

v5: reword the commit message for FOLL_GET with more information.

v4: add '()' for the function for readability.
    add more words about the Non-LRU pages fix in commit message.

v3: Merge the fix for handling Non-LRU pages into one patch.
    Drop the break_ksm zone device page check.

v2: Add the Non-LRU pages fix with two patches, so that
    'mm: migration: fix the FOLL_GET' can be applied directly
    on linux-5.19 stable branch.

Haiyue Wang (2):
  mm: migration: fix the FOLL_GET failure on following huge page
  mm: fix the handling Non-LRU pages returned by follow_page

 mm/huge_memory.c |  2 +-
 mm/ksm.c         | 12 +++++++++---
 mm/migrate.c     | 23 +++++++++++++++++------
 3 files changed, 27 insertions(+), 10 deletions(-)

-- 
2.37.2



* [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page
  2022-08-23 13:58 [PATCH v7 0/2] fix follow_page related issues Haiyue Wang
@ 2022-08-23 13:58 ` Haiyue Wang
  2022-08-24 18:38   ` Andrew Morton
  2022-08-23 13:58 ` [PATCH v7 2/2] mm: fix the handling Non-LRU pages returned by follow_page Haiyue Wang
  1 sibling, 1 reply; 7+ messages in thread
From: Haiyue Wang @ 2022-08-23 13:58 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: akpm, david, apopple, linmiaohe, ying.huang, songmuchun,
	naoya.horiguchi, alex.sierra, mike.kravetz, gerald.schaefer,
	Haiyue Wang, Baolin Wang

Not all huge page APIs support the FOLL_GET option, so the move_pages()
syscall fails to get the page node information for some huge pages.

For example, on x86 with Linux 5.19, the 1GB huge page API
follow_huge_pud() returns a NULL page for FOLL_GET. When move_pages() is
called with a NULL 'nodes' parameter, the 'status' array then holds the
error '-2' (-ENOENT) for such pages.
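
For readers who want to observe the '-2' status from userspace, here is a
minimal sketch, not part of the patch. It assumes a Linux system whose libc
headers define SYS_move_pages; the helper name query_page_node() is
hypothetical. It issues the raw move_pages() syscall with a NULL 'nodes'
array, which only reports the node of each page in 'status'.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): ask the kernel which NUMA node
 * backs the page containing 'addr' in the calling process.
 *
 * Returns the node id (>= 0), a negative per-page status such as
 * -ENOENT (-2) or -EFAULT, or -errno if the syscall itself fails
 * (for example -ENOSYS on a kernel built without page migration).
 */
static int query_page_node(void *addr)
{
	void *pages[1] = { addr };
	int status[1] = { -EINVAL };
	long rc;

	/* pid 0 means the calling process; nodes == NULL means "report only" */
	rc = syscall(SYS_move_pages, 0, 1UL, pages, NULL, status, 0);
	if (rc < 0)
		return -errno;

	return status[0];
}
```

On an affected kernel, calling this helper on an address inside a 1GB
hugetlb mapping would be expected to return -2 rather than a node id; on an
ordinary touched page it should return the node that backs the page.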

Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
      Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev

But these huge page APIs don't support FOLL_GET:
  1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
  2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
     It will cause WARN_ON_ONCE for FOLL_GET.
  3. follow_huge_pgd() in mm/hugetlb.c

This is a temporary solution to mitigate a side effect of the race
condition fix, which made do_pages_stat_array() call follow_page() with
FOLL_GET set even for huge pages.

Once all huge page follow APIs support FOLL_GET, this fix can be safely
reverted.

Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6a1597c92261..581dfaad9257 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1848,6 +1848,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 
 	for (i = 0; i < nr_pages; i++) {
 		unsigned long addr = (unsigned long)(*pages);
+		unsigned int foll_flags = FOLL_DUMP;
 		struct vm_area_struct *vma;
 		struct page *page;
 		int err = -EFAULT;
@@ -1856,8 +1857,12 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		if (!vma)
 			goto set_status;
 
+		/* Not all huge page follow APIs support 'FOLL_GET' */
+		if (!is_vm_hugetlb_page(vma))
+			foll_flags |= FOLL_GET;
+
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+		page = follow_page(vma, addr, foll_flags);
 
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
@@ -1865,7 +1870,8 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 
 		if (page && !is_zone_device_page(page)) {
 			err = page_to_nid(page);
-			put_page(page);
+			if (foll_flags & FOLL_GET)
+				put_page(page);
 		} else {
 			err = -ENOENT;
 		}
-- 
2.37.2



* [PATCH v7 2/2] mm: fix the handling Non-LRU pages returned by follow_page
  2022-08-23 13:58 [PATCH v7 0/2] fix follow_page related issues Haiyue Wang
  2022-08-23 13:58 ` [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page Haiyue Wang
@ 2022-08-23 13:58 ` Haiyue Wang
  1 sibling, 0 replies; 7+ messages in thread
From: Haiyue Wang @ 2022-08-23 13:58 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: akpm, david, apopple, linmiaohe, ying.huang, songmuchun,
	naoya.horiguchi, alex.sierra, mike.kravetz, gerald.schaefer,
	Haiyue Wang, Felix Kuehling

The handling of Non-LRU pages returned by follow_page() jumps out directly
without calling put_page() to drop the reference, even though the
'FOLL_GET' flag makes follow_page() call get_page(). Fix the zone device
page check by handling the page reference count correctly before returning.
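
To make the leak concrete, here is a small userspace model, for
illustration only. Every name in it (mock_page, mock_follow_page(),
MOCK_FOLL_GET and so on) is a hypothetical stand-in, not a kernel API. It
mirrors the shape of the old and new checks in do_pages_stat_array(): the
old shape returns on a zone device page without dropping the reference
taken by the lookup, while the new shape always drops it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: just what the model needs. */
struct mock_page {
	int refcount;
	int nid;
	bool zone_device;
};

#define MOCK_FOLL_GET 0x1u

/* Models follow_page(): takes a reference only when MOCK_FOLL_GET is set. */
static struct mock_page *mock_follow_page(struct mock_page *page,
					  unsigned int flags)
{
	if (page && (flags & MOCK_FOLL_GET))
		page->refcount++;
	return page;
}

/* Models put_page(): drops one reference. */
static void mock_put_page(struct mock_page *page)
{
	page->refcount--;
}

/* Old shape: a zone device page bails out while still holding a reference. */
static int stat_page_leaky(struct mock_page *target)
{
	struct mock_page *page = mock_follow_page(target, MOCK_FOLL_GET);
	int nid;

	if (!page || page->zone_device)
		return -ENOENT;	/* reference leaked for zone device pages */

	nid = page->nid;
	mock_put_page(page);
	return nid;
}

/* New shape: the reference is dropped on every path that took one. */
static int stat_page_fixed(struct mock_page *target, unsigned int foll_flags)
{
	struct mock_page *page = mock_follow_page(target, foll_flags);
	int err = -ENOENT;

	if (!page)
		return err;

	if (!page->zone_device)
		err = page->nid;

	if (foll_flags & MOCK_FOLL_GET)
		mock_put_page(page);
	return err;
}
```

In this model, running stat_page_leaky() on a zone device page leaves its
reference count one higher than before, while stat_page_fixed() leaves it
unchanged and still reports -ENOENT.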

As David noted in review, "device pages are never PageKsm pages", so drop
the zone device page check from break_ksm().

Since a zone device page can't be a transparent huge page, also drop the
redundant zone device page check from split_huge_pages_pid(). (by Miaohe)

Fixes: 3218f8712d6b ("mm: handling Non-LRU pages returned by vm_normal_pages")
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c |  2 +-
 mm/ksm.c         | 12 +++++++++---
 mm/migrate.c     | 19 ++++++++++++-------
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8a7c1b344abe..2ee6d38a1426 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2963,7 +2963,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		/* FOLL_DUMP to ignore special (like zero) pages */
 		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
 
-		if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
+		if (IS_ERR_OR_NULL(page))
 			continue;
 
 		if (!is_transparent_hugepage(page))
diff --git a/mm/ksm.c b/mm/ksm.c
index 42ab153335a2..e26f57fc1f0e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -475,7 +475,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 		cond_resched();
 		page = follow_page(vma, addr,
 				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
-		if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
+		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
 			ret = handle_mm_fault(vma, addr,
@@ -560,12 +560,15 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
 		goto out;
 
 	page = follow_page(vma, addr, FOLL_GET);
-	if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
+	if (IS_ERR_OR_NULL(page))
 		goto out;
+	if (is_zone_device_page(page))
+		goto out_putpage;
 	if (PageAnon(page)) {
 		flush_anon_page(vma, page, addr);
 		flush_dcache_page(page);
 	} else {
+out_putpage:
 		put_page(page);
 out:
 		page = NULL;
@@ -2308,11 +2311,13 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 			if (ksm_test_exit(mm))
 				break;
 			*page = follow_page(vma, ksm_scan.address, FOLL_GET);
-			if (IS_ERR_OR_NULL(*page) || is_zone_device_page(*page)) {
+			if (IS_ERR_OR_NULL(*page)) {
 				ksm_scan.address += PAGE_SIZE;
 				cond_resched();
 				continue;
 			}
+			if (is_zone_device_page(*page))
+				goto next_page;
 			if (PageAnon(*page)) {
 				flush_anon_page(vma, *page, ksm_scan.address);
 				flush_dcache_page(*page);
@@ -2327,6 +2332,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 				mmap_read_unlock(mm);
 				return rmap_item;
 			}
+next_page:
 			put_page(*page);
 			ksm_scan.address += PAGE_SIZE;
 			cond_resched();
diff --git a/mm/migrate.c b/mm/migrate.c
index 581dfaad9257..44e05ce41d49 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1672,9 +1672,12 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		goto out;
 
 	err = -ENOENT;
-	if (!page || is_zone_device_page(page))
+	if (!page)
 		goto out;
 
+	if (is_zone_device_page(page))
+		goto out_putpage;
+
 	err = 0;
 	if (page_to_nid(page) == node)
 		goto out_putpage;
@@ -1868,13 +1871,15 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		if (IS_ERR(page))
 			goto set_status;
 
-		if (page && !is_zone_device_page(page)) {
+		err = -ENOENT;
+		if (!page)
+			goto set_status;
+
+		if (!is_zone_device_page(page))
 			err = page_to_nid(page);
-			if (foll_flags & FOLL_GET)
-				put_page(page);
-		} else {
-			err = -ENOENT;
-		}
+
+		if (foll_flags & FOLL_GET)
+			put_page(page);
 set_status:
 		*status = err;
 
-- 
2.37.2



* Re: [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page
  2022-08-23 13:58 ` [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page Haiyue Wang
@ 2022-08-24 18:38   ` Andrew Morton
  2022-08-25 12:39     ` Gerald Schaefer
  2022-08-26 16:51     ` Mike Kravetz
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2022-08-24 18:38 UTC (permalink / raw)
  To: Haiyue Wang
  Cc: linux-mm, linux-kernel, david, apopple, linmiaohe, ying.huang,
	songmuchun, naoya.horiguchi, alex.sierra, mike.kravetz,
	gerald.schaefer, Baolin Wang

On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:

> Not all huge page APIs support the FOLL_GET option, so the move_pages()
> syscall fails to get the page node information for some huge pages.
> 
> For example, on x86 with Linux 5.19, the 1GB huge page API
> follow_huge_pud() returns a NULL page for FOLL_GET. When move_pages() is
> called with a NULL 'nodes' parameter, the 'status' array then holds the
> error '-2' (-ENOENT) for such pages.
> 
> Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
>       Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> 
> But these huge page APIs don't support FOLL_GET:
>   1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
>   2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
>      It will cause WARN_ON_ONCE for FOLL_GET.
>   3. follow_huge_pgd() in mm/hugetlb.c

What happened to the proposal to fix these three sites so this patch is
not needed?

> This is a temporary solution to mitigate a side effect of the race
> condition fix, which made do_pages_stat_array() call follow_page() with
> FOLL_GET set even for huge pages.
> 
> Once all huge page follow APIs support FOLL_GET, this fix can be safely
> reverted.



* Re: [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page
  2022-08-24 18:38   ` Andrew Morton
@ 2022-08-25 12:39     ` Gerald Schaefer
  2022-08-26  2:53       ` Andrew Morton
  2022-08-26 16:51     ` Mike Kravetz
  1 sibling, 1 reply; 7+ messages in thread
From: Gerald Schaefer @ 2022-08-25 12:39 UTC (permalink / raw)
  To: Andrew Morton, Haiyue Wang
  Cc: linux-mm, linux-kernel, david, apopple, linmiaohe, ying.huang,
	songmuchun, naoya.horiguchi, alex.sierra, mike.kravetz,
	Baolin Wang, Heiko Carstens, Vasily Gorbik, Alexander Gordeev

On Wed, 24 Aug 2022 11:38:58 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:
> 
> > Not all huge page APIs support the FOLL_GET option, so the move_pages()
> > syscall fails to get the page node information for some huge pages.
> > 
> > For example, on x86 with Linux 5.19, the 1GB huge page API
> > follow_huge_pud() returns a NULL page for FOLL_GET. When move_pages() is
> > called with a NULL 'nodes' parameter, the 'status' array then holds the
> > error '-2' (-ENOENT) for such pages.
> > 
> > Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> >       Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> > 
> > But these huge page APIs don't support FOLL_GET:
> >   1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> >   2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
> >      It will cause WARN_ON_ONCE for FOLL_GET.
> >   3. follow_huge_pgd() in mm/hugetlb.c
> 
> What happened to the proposal to fix these three sites so this patch is
> not needed?

For s390, you can add my patch from
https://lore.kernel.org/linux-mm/20220818135717.609eef8a@thinkpad/
to this series.

Or we can bring it upstream via s390 tree, whatever suits best. It
certainly makes sense to have, also independent from this series.
Adding some s390 people on cc.


* Re: [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page
  2022-08-25 12:39     ` Gerald Schaefer
@ 2022-08-26  2:53       ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2022-08-26  2:53 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Haiyue Wang, linux-mm, linux-kernel, david, apopple, linmiaohe,
	ying.huang, songmuchun, naoya.horiguchi, alex.sierra,
	mike.kravetz, Baolin Wang, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev

On Thu, 25 Aug 2022 14:39:17 +0200 Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:

> On Wed, 24 Aug 2022 11:38:58 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:
> > 
> > > Not all huge page APIs support the FOLL_GET option, so the move_pages()
> > > syscall fails to get the page node information for some huge pages.
> > > 
> > > For example, on x86 with Linux 5.19, the 1GB huge page API
> > > follow_huge_pud() returns a NULL page for FOLL_GET. When move_pages() is
> > > called with a NULL 'nodes' parameter, the 'status' array then holds the
> > > error '-2' (-ENOENT) for such pages.
> > > 
> > > Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> > >       Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> > > 
> > > But these huge page APIs don't support FOLL_GET:
> > >   1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> > >   2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
> > >      It will cause WARN_ON_ONCE for FOLL_GET.
> > >   3. follow_huge_pgd() in mm/hugetlb.c
> > 
> > What happened to the proposal to fix these three sites so this patch is
> > not needed?
> 
> For s390, you can add my patch from
> https://lore.kernel.org/linux-mm/20220818135717.609eef8a@thinkpad/
> to this series.
> 

Thanks, I added that.



* Re: [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page
  2022-08-24 18:38   ` Andrew Morton
  2022-08-25 12:39     ` Gerald Schaefer
@ 2022-08-26 16:51     ` Mike Kravetz
  1 sibling, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-08-26 16:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Haiyue Wang, linux-mm, linux-kernel, david, apopple, linmiaohe,
	ying.huang, songmuchun, naoya.horiguchi, alex.sierra,
	gerald.schaefer, Baolin Wang

On 08/24/22 11:38, Andrew Morton wrote:
> On Tue, 23 Aug 2022 21:58:40 +0800 Haiyue Wang <haiyue.wang@intel.com> wrote:
> 
> > Not all huge page APIs support the FOLL_GET option, so the move_pages()
> > syscall fails to get the page node information for some huge pages.
> > 
> > For example, on x86 with Linux 5.19, the 1GB huge page API
> > follow_huge_pud() returns a NULL page for FOLL_GET. When move_pages() is
> > called with a NULL 'nodes' parameter, the 'status' array then holds the
> > error '-2' (-ENOENT) for such pages.
> > 
> > Note: follow_huge_pud() now supports FOLL_GET in linux 6.0.
> >       Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> > 
> > But these huge page APIs don't support FOLL_GET:
> >   1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
> >   2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
> >      It will cause WARN_ON_ONCE for FOLL_GET.
> >   3. follow_huge_pgd() in mm/hugetlb.c
> 
> What happened to the proposal to fix these three sites so this patch is
> not needed?
> 

Note mpe's comments here:
https://lore.kernel.org/linux-mm/87r113jgqn.fsf@mpe.ellerman.id.au/

It has been a while since powerpc supported hugetlb pages at the PGD
level.  And, this was the only architecture which had pages at this
level.  Unless I am missing something, this means there is no issue with
follow_huge_pgd as there is no way this code could be invoked when
commit 4cd614841c06 was made.
-- 
Mike Kravetz

