* [PATCH 00/16] A few cleanup and fixup patches for migration
@ 2022-03-04  9:33 Miaohe Lin
  2022-03-04  9:33 ` [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked Miaohe Lin
                   ` (15 more replies)
  0 siblings, 16 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

Hi everyone,
This series contains a few patches to remove unneeded variables and jump
labels and to use helpers to simplify the code. It also fixes some bugs,
such as a page refcount leak and pte_unmap on a not-mapped pte. More
details can be found in the respective changelogs. Thanks!

Miaohe Lin (16):
  mm/migration: remove unneeded local variable mapping_locked
  mm/migration: remove unneeded out label
  mm/migration: remove unneeded local variable page_lru
  mm/migration: reduce the rcu lock duration
  mm/migration: fix the confusing PageTransHuge check
  mm/migration: use helper function vma_lookup() in
    add_page_for_migration
  mm/migration: use helper macro min_t in do_pages_stat
  mm/migration: avoid unneeded nodemask_t initialization
  mm/migration: remove some duplicated codes in migrate_pages
  mm/migration: remove PG_writeback handle in folio_migrate_flags
  mm/migration: remove unneeded lock page and PageMovable check
  mm/migration: fix potential page refcounts leak in migrate_pages
  mm/migration: return errno when isolate_huge_page failed
  mm/migration: fix potential invalid node access for reclaim-based
    migration
  mm/migration: fix possible do_pages_stat_array racing with memory
    offline
  mm/migration: fix potential pte_unmap on an not mapped pte

 include/linux/migrate.h |   2 +-
 include/linux/swapops.h |   4 +-
 mm/filemap.c            |  10 +--
 mm/hugetlb.c            |   2 +-
 mm/migrate.c            | 138 ++++++++++++++++------------------------
 5 files changed, 65 insertions(+), 91 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
@ 2022-03-04  9:33 ` Miaohe Lin
  2022-03-04 13:48   ` Muchun Song
  2022-03-07  2:00   ` Huang, Ying
  2022-03-04  9:33 ` [PATCH 02/16] mm/migration: remove unneeded out label Miaohe Lin
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

When mapping_locked is true, TTU_RMAP_LOCKED is always set in ttu. We can
check ttu instead, so mapping_locked can be removed. Also, ttu is now
either 0 or TTU_RMAP_LOCKED; change '|=' to '=' to reflect this.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 42e3a944f35c..50bc62d85eaf 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1230,7 +1230,6 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 		goto put_anon;
 
 	if (page_mapped(hpage)) {
-		bool mapping_locked = false;
 		enum ttu_flags ttu = 0;
 
 		if (!PageAnon(hpage)) {
@@ -1244,14 +1243,13 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 			if (unlikely(!mapping))
 				goto unlock_put_anon;
 
-			mapping_locked = true;
-			ttu |= TTU_RMAP_LOCKED;
+			ttu = TTU_RMAP_LOCKED;
 		}
 
 		try_to_migrate(src, ttu);
 		page_was_mapped = 1;
 
-		if (mapping_locked)
+		if (ttu & TTU_RMAP_LOCKED)
 			i_mmap_unlock_write(mapping);
 	}
 
-- 
2.23.0



* [PATCH 02/16] mm/migration: remove unneeded out label
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
  2022-03-04  9:33 ` [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked Miaohe Lin
@ 2022-03-04  9:33 ` Miaohe Lin
  2022-03-04 12:12   ` Muchun Song
  2022-03-07  2:03   ` Huang, Ying
  2022-03-04  9:33 ` [PATCH 03/16] mm/migration: remove unneeded local variable page_lru Miaohe Lin
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

We can call prep_transhuge_page() only when newpage is not NULL. Thus the
out label can be removed to simplify the code.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 50bc62d85eaf..bc1867a5706c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2025,12 +2025,9 @@ static struct page *alloc_misplaced_dst_page_thp(struct page *page,
 
 	newpage = alloc_pages_node(nid, (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 				   HPAGE_PMD_ORDER);
-	if (!newpage)
-		goto out;
+	if (newpage)
+		prep_transhuge_page(newpage);
 
-	prep_transhuge_page(newpage);
-
-out:
 	return newpage;
 }
 
-- 
2.23.0



* [PATCH 03/16] mm/migration: remove unneeded local variable page_lru
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
  2022-03-04  9:33 ` [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked Miaohe Lin
  2022-03-04  9:33 ` [PATCH 02/16] mm/migration: remove unneeded out label Miaohe Lin
@ 2022-03-04  9:33 ` Miaohe Lin
  2022-03-07 10:58   ` Alistair Popple
  2022-03-04  9:33 ` [PATCH 04/16] mm/migration: reduce the rcu lock duration Miaohe Lin
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

We can use page_is_file_lru() directly when accounting the isolated
pages, which simplifies the code a bit. The same goes for the local
variable follflags.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index bc1867a5706c..da5a81052468 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1609,7 +1609,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 {
 	struct vm_area_struct *vma;
 	struct page *page;
-	unsigned int follflags;
 	int err;
 
 	mmap_read_lock(mm);
@@ -1619,8 +1618,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		goto out;
 
 	/* FOLL_DUMP to ignore special (like zero) pages */
-	follflags = FOLL_GET | FOLL_DUMP;
-	page = follow_page(vma, addr, follflags);
+	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
 
 	err = PTR_ERR(page);
 	if (IS_ERR(page))
@@ -2033,7 +2031,6 @@ static struct page *alloc_misplaced_dst_page_thp(struct page *page,
 
 static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 {
-	int page_lru;
 	int nr_pages = thp_nr_pages(page);
 	int order = compound_order(page);
 
@@ -2060,8 +2057,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 	if (isolate_lru_page(page))
 		return 0;
 
-	page_lru = page_is_file_lru(page);
-	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
+	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
 			    nr_pages);
 
 	/*
-- 
2.23.0



* [PATCH 04/16] mm/migration: reduce the rcu lock duration
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (2 preceding siblings ...)
  2022-03-04  9:33 ` [PATCH 03/16] mm/migration: remove unneeded local variable page_lru Miaohe Lin
@ 2022-03-04  9:33 ` Miaohe Lin
  2022-03-04 12:16   ` Muchun Song
  2022-03-07  2:32   ` Huang, Ying
  2022-03-04  9:33 ` [PATCH 05/16] mm/migration: fix the confusing PageTransHuge check Miaohe Lin
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

rcu_read_lock is required for grabbing the task refcount, but it's not
needed for ptrace_may_access. So we can release the rcu lock once the
task refcount is successfully grabbed, reducing the rcu holding time.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index da5a81052468..26943bd819e8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1907,17 +1907,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
 		return ERR_PTR(-ESRCH);
 	}
 	get_task_struct(task);
+	rcu_read_unlock();
 
 	/*
 	 * Check if this process has the right to modify the specified
 	 * process. Use the regular "ptrace_may_access()" checks.
 	 */
 	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
-		rcu_read_unlock();
 		mm = ERR_PTR(-EPERM);
 		goto out;
 	}
-	rcu_read_unlock();
 
 	mm = ERR_PTR(security_task_movememory(task));
 	if (IS_ERR(mm))
-- 
2.23.0



* [PATCH 05/16] mm/migration: fix the confusing PageTransHuge check
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (3 preceding siblings ...)
  2022-03-04  9:33 ` [PATCH 04/16] mm/migration: reduce the rcu lock duration Miaohe Lin
@ 2022-03-04  9:33 ` Miaohe Lin
  2022-03-04 12:20   ` Muchun Song
  2022-03-04  9:33 ` [PATCH 06/16] mm/migration: use helper function vma_lookup() in add_page_for_migration Miaohe Lin
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

prep_transhuge_page() should be called when PageTransHuge(page) is true.
The newly allocated new_page is not yet PageTransHuge, though it can
still pass the check because PageTransHuge() only checks PageHead() now.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 26943bd819e8..15cac2dabc93 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1560,7 +1560,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 
 	new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
 
-	if (new_page && PageTransHuge(new_page))
+	if (new_page && PageTransHuge(page))
 		prep_transhuge_page(new_page);
 
 	return new_page;
-- 
2.23.0



* [PATCH 06/16] mm/migration: use helper function vma_lookup() in add_page_for_migration
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (4 preceding siblings ...)
  2022-03-04  9:33 ` [PATCH 05/16] mm/migration: fix the confusing PageTransHuge check Miaohe Lin
@ 2022-03-04  9:33 ` Miaohe Lin
  2022-03-04  9:34 ` [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat Miaohe Lin
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:33 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

We can use the helper function vma_lookup() to look up the needed vma,
which simplifies the code.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 15cac2dabc93..bc79d7338780 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1613,8 +1613,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 
 	mmap_read_lock(mm);
 	err = -EFAULT;
-	vma = find_vma(mm, addr);
-	if (!vma || addr < vma->vm_start || !vma_migratable(vma))
+	vma = vma_lookup(mm, addr);
+	if (!vma || !vma_migratable(vma))
 		goto out;
 
 	/* FOLL_DUMP to ignore special (like zero) pages */
-- 
2.23.0



* [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (5 preceding siblings ...)
  2022-03-04  9:33 ` [PATCH 06/16] mm/migration: use helper function vma_lookup() in add_page_for_migration Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-04 13:51   ` Muchun Song
  2022-03-07  1:14   ` Andrew Morton
  2022-03-04  9:34 ` [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization Miaohe Lin
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

We can use the helper macro min_t() to set chunk_nr, which simplifies
the code.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index bc79d7338780..c84eec19072a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1858,9 +1858,7 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
 	while (nr_pages) {
 		unsigned long chunk_nr;
 
-		chunk_nr = nr_pages;
-		if (chunk_nr > DO_PAGES_STAT_CHUNK_NR)
-			chunk_nr = DO_PAGES_STAT_CHUNK_NR;
+		chunk_nr = min_t(unsigned long, nr_pages, DO_PAGES_STAT_CHUNK_NR);
 
 		if (in_compat_syscall()) {
 			if (get_compat_pages_array(chunk_pages, pages,
-- 
2.23.0



* [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (6 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-04 13:57   ` Muchun Song
  2022-03-07  2:31   ` Baolin Wang
  2022-03-04  9:34 ` [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages Miaohe Lin
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

Avoid the unneeded initialization of next_pass and this_pass, as they're
always set before use. This saves some cpu cycles when there are plenty
of nodes in the system.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c84eec19072a..abb0c6715e1f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2378,8 +2378,8 @@ static int establish_migrate_target(int node, nodemask_t *used,
  */
 static void __set_migration_target_nodes(void)
 {
-	nodemask_t next_pass	= NODE_MASK_NONE;
-	nodemask_t this_pass	= NODE_MASK_NONE;
+	nodemask_t next_pass;
+	nodemask_t this_pass;
 	nodemask_t used_targets = NODE_MASK_NONE;
 	int node, best_distance;
 
-- 
2.23.0



* [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (7 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-04 15:16   ` Zi Yan
  2022-03-07  1:44   ` Baolin Wang
  2022-03-04  9:34 ` [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags Miaohe Lin
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

Remove the duplicated code in migrate_pages() to simplify it. Minor
readability improvement. No functional change intended.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 36 ++++++++++++------------------------
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index abb0c6715e1f..cb970d201147 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1414,14 +1414,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 						nr_thp_split++;
 						goto retry;
 					}
-
-					nr_failed_pages += nr_subpages;
-					break;
-				}
-
 				/* Hugetlb migration is unsupported */
-				if (!no_subpage_counting)
+				} else if (!no_subpage_counting) {
 					nr_failed++;
+				}
+
 				nr_failed_pages += nr_subpages;
 				break;
 			case -ENOMEM:
@@ -1436,28 +1433,22 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 						nr_thp_split++;
 						goto retry;
 					}
-
-					nr_failed_pages += nr_subpages;
-					goto out;
+				} else if (!no_subpage_counting) {
+					nr_failed++;
 				}
 
-				if (!no_subpage_counting)
-					nr_failed++;
 				nr_failed_pages += nr_subpages;
 				goto out;
 			case -EAGAIN:
-				if (is_thp) {
+				if (is_thp)
 					thp_retry++;
-					break;
-				}
-				retry++;
+				else
+					retry++;
 				break;
 			case MIGRATEPAGE_SUCCESS:
 				nr_succeeded += nr_subpages;
-				if (is_thp) {
+				if (is_thp)
 					nr_thp_succeeded++;
-					break;
-				}
 				break;
 			default:
 				/*
@@ -1466,14 +1457,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				 * removed from migration page list and not
 				 * retried in the next outer loop.
 				 */
-				if (is_thp) {
+				if (is_thp)
 					nr_thp_failed++;
-					nr_failed_pages += nr_subpages;
-					break;
-				}
-
-				if (!no_subpage_counting)
+				else if (!no_subpage_counting)
 					nr_failed++;
+
 				nr_failed_pages += nr_subpages;
 				break;
 			}
-- 
2.23.0



* [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (8 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-07  1:21   ` Andrew Morton
  2022-03-04  9:34 ` [PATCH 11/16] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

When newfolio reaches here, PG_writeback is guaranteed not to be set
because the caller ensures writeback has completed. Remove this unneeded
check and clean up the relevant comment.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index cb970d201147..1de5289a4af0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -549,18 +549,6 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
 	if (!folio_test_hugetlb(folio))
 		folio->private = NULL;
 
-	/*
-	 * If any waiters have accumulated on the new page then
-	 * wake them up.
-	 */
-	if (folio_test_writeback(newfolio))
-		folio_end_writeback(newfolio);
-
-	/*
-	 * PG_readahead shares the same bit with PG_reclaim.  The above
-	 * end_page_writeback() may clear PG_readahead mistakenly, so set the
-	 * bit after that.
-	 */
 	if (folio_test_readahead(folio))
 		folio_set_readahead(newfolio);
 
-- 
2.23.0



* [PATCH 11/16] mm/migration: remove unneeded lock page and PageMovable check
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (9 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-04  9:34 ` [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages Miaohe Lin
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

When a non-lru movable page is freed from under us, __ClearPageMovable
must have been done already. Even if it hasn't, calling
__ClearPageIsolated() here won't hurt as the page will be freed soon. So
we can remove the unneeded page lock and PageMovable check here.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 1de5289a4af0..e0db06927f02 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1074,12 +1074,8 @@ static int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
-		if (unlikely(__PageMovable(page))) {
-			lock_page(page);
-			if (!PageMovable(page))
-				__ClearPageIsolated(page);
-			unlock_page(page);
-		}
+		if (unlikely(__PageMovable(page)))
+			__ClearPageIsolated(page);
 		goto out;
 	}
 
-- 
2.23.0



* [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (10 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 11/16] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-04 15:21   ` Zi Yan
                     ` (2 more replies)
  2022-03-04  9:34 ` [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
                   ` (3 subsequent siblings)
  15 siblings, 3 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

In the -ENOMEM case, there might be some subpages of fail-to-migrate THPs
left in the thp_split_pages list. We should move them back to the
migration list so that they can be put back on the right list by the
caller; otherwise the page refcnt will be leaked here. Also adjust
nr_failed and nr_thp_failed accordingly to make the vm events accounting
more accurate.

Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index e0db06927f02..6c2dfed2ddb8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				}
 
 				nr_failed_pages += nr_subpages;
+				/*
+				 * There might be some subpages of fail-to-migrate THPs
+				 * left in thp_split_pages list. Move them back to migration
+				 * list so that they could be put back to the right list by
+				 * the caller otherwise the page refcnt will be leaked.
+				 */
+				list_splice_init(&thp_split_pages, from);
+				nr_failed += retry;
+				nr_thp_failed += thp_retry;
 				goto out;
 			case -EAGAIN:
 				if (is_thp)
-- 
2.23.0



* [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (11 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-05  2:23   ` Muchun Song
                     ` (2 more replies)
  2022-03-04  9:34 ` [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration Miaohe Lin
                   ` (2 subsequent siblings)
  15 siblings, 3 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

We should return an errno (-EBUSY here) when we fail to isolate the huge
page, rather than always returning 1, which could confuse the user.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6c2dfed2ddb8..279940c0c064 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		goto out_putpage;
 
 	if (PageHuge(page)) {
-		if (PageHead(page)) {
-			isolate_huge_page(page, pagelist);
-			err = 1;
-		}
+		if (PageHead(page))
+			err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
 	} else {
 		struct page *head;
 
-- 
2.23.0



* [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (12 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-07  2:25   ` Baolin Wang
  2022-03-07  5:14   ` Huang, Ying
  2022-03-04  9:34 ` [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline Miaohe Lin
  2022-03-04  9:34 ` [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

If we fail to set up the hotplug state callbacks for mm/demotion:online
in some corner cases, node_demotion will be left uninitialized. An
invalid node might then be returned from next_demotion_node() when doing
reclaim-based migration. Use kcalloc() to allocate node_demotion to fix
the issue.

Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 279940c0c064..7b1c0b988234 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2516,9 +2516,9 @@ static int __init migrate_on_reclaim_init(void)
 {
 	int ret;
 
-	node_demotion = kmalloc_array(nr_node_ids,
-				      sizeof(struct demotion_nodes),
-				      GFP_KERNEL);
+	node_demotion = kcalloc(nr_node_ids,
+				sizeof(struct demotion_nodes),
+				GFP_KERNEL);
 	WARN_ON(!node_demotion);
 
 	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
-- 
2.23.0



* [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (13 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-07  5:21   ` Huang, Ying
  2022-03-04  9:34 ` [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
  15 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

When follow_page() peeks at a page, the page could be reclaimed under
heavy memory pressure and thus be offlined while it's still being used
by do_pages_stat_array(). Use FOLL_GET to hold the page refcount to fix
this potential issue.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 7b1c0b988234..98a968e6f465 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1788,13 +1788,18 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 			goto set_status;
 
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_DUMP);
+		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
 
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
 			goto set_status;
 
-		err = page ? page_to_nid(page) : -ENOENT;
+		if (page) {
+			err = page_to_nid(page);
+			put_page(page);
+		} else {
+			err = -ENOENT;
+		}
 set_status:
 		*status = err;
 
-- 
2.23.0



* [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (14 preceding siblings ...)
  2022-03-04  9:34 ` [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline Miaohe Lin
@ 2022-03-04  9:34 ` Miaohe Lin
  2022-03-07  5:37   ` Huang, Ying
  2022-03-07  7:35   ` Alistair Popple
  15 siblings, 2 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-04  9:34 UTC (permalink / raw)
  To: akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel, linmiaohe

__migration_entry_wait and migration_entry_wait_on_locked assume the pte is
always mapped by the caller. But this is not the case when they are called
from migration_entry_wait_huge and follow_huge_pmd. Add a parameter, unmap,
to indicate whether the pte needs to be unmapped, to fix this issue.

Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/migrate.h |  2 +-
 include/linux/swapops.h |  4 ++--
 mm/filemap.c            | 10 +++++-----
 mm/hugetlb.c            |  2 +-
 mm/migrate.c            | 14 ++++++++------
 5 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 66a34eae8cb6..3ef4ff699bef 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -41,7 +41,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 extern int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page, int extra_count);
 void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
-				spinlock_t *ptl);
+				spinlock_t *ptl, bool unmap);
 void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
 void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
 int folio_migrate_mapping(struct address_space *mapping,
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d356ab4047f7..d66556875d7d 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -213,7 +213,7 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
 }
 
 extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-					spinlock_t *ptl);
+					spinlock_t *ptl, bool unmap);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma,
@@ -235,7 +235,7 @@ static inline int is_migration_entry(swp_entry_t swp)
 }
 
 static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-					spinlock_t *ptl) { }
+					spinlock_t *ptl, bool unmap) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					 unsigned long address) { }
 static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8f7e6088ee2a..18c353d52aae 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
  * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
  *        for pte entries, pass NULL for pmd entries.
  * @ptl: already locked ptl. This function will drop the lock.
+ * @unmap: indicates whether the ptep needs to be unmapped.
  *
  * Wait for a migration entry referencing the given page to be removed. This is
  * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
@@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
  * there.
  */
 void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
-				spinlock_t *ptl)
+				spinlock_t *ptl, bool unmap)
 {
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
 	 * a valid reference to the page, and it must take the ptl to remove the
 	 * migration entry. So the page is valid until the ptl is dropped.
 	 */
-	if (ptep)
-		pte_unmap_unlock(ptep, ptl);
-	else
-		spin_unlock(ptl);
+	spin_unlock(ptl);
+	if (unmap && ptep)
+		pte_unmap(ptep);
 
 	for (;;) {
 		unsigned int flags;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07668781c246..8088128c25db 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 	} else {
 		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
-			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
+			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
 			goto retry;
 		}
 		/*
diff --git a/mm/migrate.c b/mm/migrate.c
index 98a968e6f465..5519261f54fe 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
  * When we return from this function the fault will be retried.
  */
 void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
-				spinlock_t *ptl)
+				spinlock_t *ptl, bool unmap)
 {
 	pte_t pte;
 	swp_entry_t entry;
@@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 	if (!is_migration_entry(entry))
 		goto out;
 
-	migration_entry_wait_on_locked(entry, ptep, ptl);
+	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
 	return;
 out:
-	pte_unmap_unlock(ptep, ptl);
+	spin_unlock(ptl);
+	if (unmap)
+		pte_unmap(ptep);
 }
 
 void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
@@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 {
 	spinlock_t *ptl = pte_lockptr(mm, pmd);
 	pte_t *ptep = pte_offset_map(pmd, address);
-	__migration_entry_wait(mm, ptep, ptl);
+	__migration_entry_wait(mm, ptep, ptl, true);
 }
 
 void migration_entry_wait_huge(struct vm_area_struct *vma,
 		struct mm_struct *mm, pte_t *pte)
 {
 	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
-	__migration_entry_wait(mm, pte, ptl);
+	__migration_entry_wait(mm, pte, ptl, false);
 }
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 	ptl = pmd_lock(mm, pmd);
 	if (!is_pmd_migration_entry(*pmd))
 		goto unlock;
-	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
+	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
 	return;
 unlock:
 	spin_unlock(ptl);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH 02/16] mm/migration: remove unneeded out label
  2022-03-04  9:33 ` [PATCH 02/16] mm/migration: remove unneeded out label Miaohe Lin
@ 2022-03-04 12:12   ` Muchun Song
  2022-03-07  2:03   ` Huang, Ying
  1 sibling, 0 replies; 73+ messages in thread
From: Muchun Song @ 2022-03-04 12:12 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:35 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> We can call prep_transhuge_page when newpage is not NULL. Thus we can
> remove the out label to simplify the code.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 04/16] mm/migration: reduce the rcu lock duration
  2022-03-04  9:33 ` [PATCH 04/16] mm/migration: reduce the rcu lock duration Miaohe Lin
@ 2022-03-04 12:16   ` Muchun Song
  2022-03-07  2:32   ` Huang, Ying
  1 sibling, 0 replies; 73+ messages in thread
From: Muchun Song @ 2022-03-04 12:16 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:35 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> rcu_read_lock is required for grabbing the task refcount but it's not
> needed for ptrace_may_access. So we can release the rcu lock once the
> task refcount is successfully grabbed, to reduce the rcu holding time.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Also simple.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 05/16] mm/migration: fix the confusing PageTransHuge check
  2022-03-04  9:33 ` [PATCH 05/16] mm/migration: fix the confusing PageTransHuge check Miaohe Lin
@ 2022-03-04 12:20   ` Muchun Song
  0 siblings, 0 replies; 73+ messages in thread
From: Muchun Song @ 2022-03-04 12:20 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:35 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> prep_transhuge_page should be called when PageTransHuge(page) is true.
> The newly allocated new_page is not yet PageTransHuge, though it could
> pass the check since PageTransHuge only checks PageHead now.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

At least I agree with you.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked
  2022-03-04  9:33 ` [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked Miaohe Lin
@ 2022-03-04 13:48   ` Muchun Song
  2022-03-07  2:00   ` Huang, Ying
  1 sibling, 0 replies; 73+ messages in thread
From: Muchun Song @ 2022-03-04 13:48 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:35 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> When mapping_locked is true, TTU_RMAP_LOCKED is always set in ttu. We can
> check ttu instead, so mapping_locked can be removed. And ttu is either 0
> or TTU_RMAP_LOCKED now; change '|=' to '=' to reflect this.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat
  2022-03-04  9:34 ` [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat Miaohe Lin
@ 2022-03-04 13:51   ` Muchun Song
  2022-03-07  1:14   ` Andrew Morton
  1 sibling, 0 replies; 73+ messages in thread
From: Muchun Song @ 2022-03-04 13:51 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:35 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> We can use the helper macro min_t to set chunk_nr and simplify
> the code.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization
  2022-03-04  9:34 ` [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization Miaohe Lin
@ 2022-03-04 13:57   ` Muchun Song
  2022-03-07  2:31   ` Baolin Wang
  1 sibling, 0 replies; 73+ messages in thread
From: Muchun Song @ 2022-03-04 13:57 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:35 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Avoid unneeded next_pass and this_pass initialization as they are always
> set before use, saving possible CPU cycles when there are plenty of
> nodes in the system.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages
  2022-03-04  9:34 ` [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages Miaohe Lin
@ 2022-03-04 15:16   ` Zi Yan
  2022-03-07  1:44   ` Baolin Wang
  1 sibling, 0 replies; 73+ messages in thread
From: Zi Yan @ 2022-03-04 15:16 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ying.huang, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel


On 4 Mar 2022, at 4:34, Miaohe Lin wrote:

> Remove the duplicated code in migrate_pages to simplify the function. Minor
> readability improvement. No functional change intended.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 36 ++++++++++++------------------------
>  1 file changed, 12 insertions(+), 24 deletions(-)
>
LGTM. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-04  9:34 ` [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages Miaohe Lin
@ 2022-03-04 15:21   ` Zi Yan
  2022-03-07  1:57   ` Baolin Wang
  2022-03-07  5:01   ` Huang, Ying
  2 siblings, 0 replies; 73+ messages in thread
From: Zi Yan @ 2022-03-04 15:21 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ying.huang, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel


On 4 Mar 2022, at 4:34, Miaohe Lin wrote:

> In the -ENOMEM case, there might be some subpages of fail-to-migrate THPs
> left on the thp_split_pages list. We should move them back to the migration
> list so that they can be put back on the right list by the caller;
> otherwise the page refcount will be leaked. Also adjust nr_failed
> and nr_thp_failed accordingly to make the vm event accounting more accurate.
>
> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e0db06927f02..6c2dfed2ddb8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				}
>
>  				nr_failed_pages += nr_subpages;
> +				/*
> +				 * There might be some subpages of fail-to-migrate THPs
> +				 * left in thp_split_pages list. Move them back to migration
> +				 * list so that they could be put back to the right list by
> +				 * the caller otherwise the page refcnt will be leaked.
> +				 */
> +				list_splice_init(&thp_split_pages, from);
> +				nr_failed += retry;
> +				nr_thp_failed += thp_retry;
>  				goto out;
>  			case -EAGAIN:
>  				if (is_thp)
> -- 
> 2.23.0

LGTM. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com>


--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-04  9:34 ` [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
@ 2022-03-05  2:23   ` Muchun Song
  2022-03-07 11:46     ` Miaohe Lin
  2022-03-07  2:14   ` Baolin Wang
  2022-03-07  5:07   ` Huang, Ying
  2 siblings, 1 reply; 73+ messages in thread
From: Muchun Song @ 2022-03-05  2:23 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Fri, Mar 4, 2022 at 5:36 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> We should return errno (-EBUSY here) when we fail to isolate the huge page
> rather than always returning 1, which could confuse the user.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

IIUC, this should be a bug fix since the status reported to the user is
wrong.  A Fixes tag may be needed.  But anyway:

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat
  2022-03-04  9:34 ` [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat Miaohe Lin
  2022-03-04 13:51   ` Muchun Song
@ 2022-03-07  1:14   ` Andrew Morton
  2022-03-07 11:51     ` Miaohe Lin
  1 sibling, 1 reply; 73+ messages in thread
From: Andrew Morton @ 2022-03-07  1:14 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On Fri, 4 Mar 2022 17:34:00 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> We can use the helper macro min_t to set chunk_nr and simplify
> the code.
> 
> ...
>
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1858,9 +1858,7 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
>  	while (nr_pages) {
>  		unsigned long chunk_nr;
>  
> -		chunk_nr = nr_pages;
> -		if (chunk_nr > DO_PAGES_STAT_CHUNK_NR)
> -			chunk_nr = DO_PAGES_STAT_CHUNK_NR;
> +		chunk_nr = min_t(unsigned long, nr_pages, DO_PAGES_STAT_CHUNK_NR);
>  
>  		if (in_compat_syscall()) {
>  			if (get_compat_pages_array(chunk_pages, pages,

Getting the types correct is better than using min_t().

--- a/mm/migrate.c~mm-migration-use-helper-macro-min_t-in-do_pages_stat-fix
+++ a/mm/migrate.c
@@ -1844,14 +1844,12 @@ static int do_pages_stat(struct mm_struc
 			 const void __user * __user *pages,
 			 int __user *status)
 {
-#define DO_PAGES_STAT_CHUNK_NR 16
+#define DO_PAGES_STAT_CHUNK_NR 16UL
 	const void __user *chunk_pages[DO_PAGES_STAT_CHUNK_NR];
 	int chunk_status[DO_PAGES_STAT_CHUNK_NR];
 
 	while (nr_pages) {
-		unsigned long chunk_nr;
-
-		chunk_nr = min_t(unsigned long, nr_pages, DO_PAGES_STAT_CHUNK_NR);
+		unsigned long chunk_nr = min(nr_pages, DO_PAGES_STAT_CHUNK_NR);
 
 		if (in_compat_syscall()) {
 			if (get_compat_pages_array(chunk_pages, pages,
_


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags
  2022-03-04  9:34 ` [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags Miaohe Lin
@ 2022-03-07  1:21   ` Andrew Morton
  2022-03-07 12:44     ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Andrew Morton @ 2022-03-07  1:21 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On Fri, 4 Mar 2022 17:34:03 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> When newfolio reaches here, it is guaranteed that PG_writeback is not set
> because the caller ensures writeback has completed. Remove this
> unneeded check and clean up the relevant comment.

What guarantees that writeback cannot start after the caller has checked?

I see no such check in iomap_migrate_page()?

> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -549,18 +549,6 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>  	if (!folio_test_hugetlb(folio))
>  		folio->private = NULL;
>  
> -	/*
> -	 * If any waiters have accumulated on the new page then
> -	 * wake them up.
> -	 */
> -	if (folio_test_writeback(newfolio))
> -		folio_end_writeback(newfolio);
> -
> -	/*
> -	 * PG_readahead shares the same bit with PG_reclaim.  The above
> -	 * end_page_writeback() may clear PG_readahead mistakenly, so set the
> -	 * bit after that.
> -	 */
>  	if (folio_test_readahead(folio))
>  		folio_set_readahead(newfolio);
>  

folio_migrate_flags() and folio_migrate_copy() are global,
export-to-modules functions but have no interface documentation.  That
was bad of us.  

I wonder why those two functions are exported to modules anyway.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages
  2022-03-04  9:34 ` [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages Miaohe Lin
  2022-03-04 15:16   ` Zi Yan
@ 2022-03-07  1:44   ` Baolin Wang
  1 sibling, 0 replies; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  1:44 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel



On 3/4/2022 5:34 PM, Miaohe Lin wrote:
> Remove the duplicated code in migrate_pages to simplify the function. Minor
> readability improvement. No functional change intended.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-04  9:34 ` [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages Miaohe Lin
  2022-03-04 15:21   ` Zi Yan
@ 2022-03-07  1:57   ` Baolin Wang
  2022-03-07  5:02     ` Huang, Ying
  2022-03-07 12:01     ` Miaohe Lin
  2022-03-07  5:01   ` Huang, Ying
  2 siblings, 2 replies; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  1:57 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

Hi Miaohe,

On 3/4/2022 5:34 PM, Miaohe Lin wrote:
> In the -ENOMEM case, there might be some subpages of fail-to-migrate THPs
> left on the thp_split_pages list. We should move them back to the migration
> list so that they can be put back on the right list by the caller;
> otherwise the page refcount will be leaked. Also adjust nr_failed
> and nr_thp_failed accordingly to make the vm event accounting more accurate.
> 
> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   mm/migrate.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e0db06927f02..6c2dfed2ddb8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>   				}
>   
>   				nr_failed_pages += nr_subpages;
> +				/*
> +				 * There might be some subpages of fail-to-migrate THPs
> +				 * left in thp_split_pages list. Move them back to migration
> +				 * list so that they could be put back to the right list by
> +				 * the caller otherwise the page refcnt will be leaked.
> +				 */
> +				list_splice_init(&thp_split_pages, from);
> +				nr_failed += retry;
> +				nr_thp_failed += thp_retry;

Yes, I think we missed this case before, and your patch looks right. But 
we should also update the 'rc' to return the correct number of pages 
that were not migrated, right?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked
  2022-03-04  9:33 ` [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked Miaohe Lin
  2022-03-04 13:48   ` Muchun Song
@ 2022-03-07  2:00   ` Huang, Ying
  2022-03-08 11:41     ` Miaohe Lin
  1 sibling, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  2:00 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> When mapping_locked is true, TTU_RMAP_LOCKED is always set in ttu. We can
> check ttu instead, so mapping_locked can be removed. And ttu is either 0
> or TTU_RMAP_LOCKED now; change '|=' to '=' to reflect this.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 42e3a944f35c..50bc62d85eaf 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1230,7 +1230,6 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		goto put_anon;
>  
>  	if (page_mapped(hpage)) {
> -		bool mapping_locked = false;
>  		enum ttu_flags ttu = 0;
>  
>  		if (!PageAnon(hpage)) {
> @@ -1244,14 +1243,13 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  			if (unlikely(!mapping))
>  				goto unlock_put_anon;
>  
> -			mapping_locked = true;
> -			ttu |= TTU_RMAP_LOCKED;
> +			ttu = TTU_RMAP_LOCKED;
>  		}
>  
>  		try_to_migrate(src, ttu);
>  		page_was_mapped = 1;
>  
> -		if (mapping_locked)
> +		if (ttu & TTU_RMAP_LOCKED)
>  			i_mmap_unlock_write(mapping);
>  	}

I think this is a code style issue.  The original code looks more
readable to me.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 02/16] mm/migration: remove unneeded out label
  2022-03-04  9:33 ` [PATCH 02/16] mm/migration: remove unneeded out label Miaohe Lin
  2022-03-04 12:12   ` Muchun Song
@ 2022-03-07  2:03   ` Huang, Ying
  2022-03-08 11:44     ` Miaohe Lin
  1 sibling, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  2:03 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> We can call prep_transhuge_page when newpage is not NULL. Thus we can
> remove the out label to simplify the code.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 50bc62d85eaf..bc1867a5706c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2025,12 +2025,9 @@ static struct page *alloc_misplaced_dst_page_thp(struct page *page,
>  
>  	newpage = alloc_pages_node(nid, (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>  				   HPAGE_PMD_ORDER);
> -	if (!newpage)
> -		goto out;
> +	if (newpage)
> +		prep_transhuge_page(newpage);
>  
> -	prep_transhuge_page(newpage);
> -
> -out:
>  	return newpage;
>  }

I don't think this change is necessary.  The original code is simple and
follows the common practice for error processing.  The new code is OK,
but the change is unnecessary.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-04  9:34 ` [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
  2022-03-05  2:23   ` Muchun Song
@ 2022-03-07  2:14   ` Baolin Wang
  2022-03-07 12:20     ` Miaohe Lin
  2022-03-07  5:07   ` Huang, Ying
  2 siblings, 1 reply; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  2:14 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

Hi Miaohe,

On 3/4/2022 5:34 PM, Miaohe Lin wrote:
> We should return errno (-EBUSY here) when we fail to isolate the huge page
> rather than always returning 1, which could confuse the user.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   mm/migrate.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6c2dfed2ddb8..279940c0c064 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>   		goto out_putpage;
>   
>   	if (PageHuge(page)) {
> -		if (PageHead(page)) {
> -			isolate_huge_page(page, pagelist);
> -			err = 1;
> -		}
> +		if (PageHead(page))
> +			err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;

Could you elaborate on the case in which the huge page isolation can fail
here? Or did you meet a real problem? Since we now hold the head huge
page's refcount, I cannot see why we would fail to isolate this huge page.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration
  2022-03-04  9:34 ` [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration Miaohe Lin
@ 2022-03-07  2:25   ` Baolin Wang
  2022-03-07  5:14     ` Huang, Ying
  2022-03-07  5:14   ` Huang, Ying
  1 sibling, 1 reply; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  2:25 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel



On 3/4/2022 5:34 PM, Miaohe Lin wrote:
> If we fail to set up hotplug state callbacks for mm/demotion:online in
> some corner cases, node_demotion will be left uninitialized. An invalid node
> might be returned from next_demotion_node() when doing reclaim-based
> migration. Use kcalloc to allocate node_demotion to fix the issue.
> 
> Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   mm/migrate.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 279940c0c064..7b1c0b988234 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2516,9 +2516,9 @@ static int __init migrate_on_reclaim_init(void)
>   {
>   	int ret;
>   
> -	node_demotion = kmalloc_array(nr_node_ids,
> -				      sizeof(struct demotion_nodes),
> -				      GFP_KERNEL);
> +	node_demotion = kcalloc(nr_node_ids,
> +				sizeof(struct demotion_nodes),
> +				GFP_KERNEL);

Nit: not sure if this is worth it for such a rare corner case, but I think
the target demotion nodes' default value should be NUMA_NO_NODE instead
of 0.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization
  2022-03-04  9:34 ` [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization Miaohe Lin
  2022-03-04 13:57   ` Muchun Song
@ 2022-03-07  2:31   ` Baolin Wang
  1 sibling, 0 replies; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  2:31 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, dave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel



On 3/4/2022 5:34 PM, Miaohe Lin wrote:
> Avoid unneeded next_pass and this_pass initialization as they're always
> set before use, to save possible cpu cycles when there are plenty of
> nodes in the system.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   mm/migrate.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c84eec19072a..abb0c6715e1f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2378,8 +2378,8 @@ static int establish_migrate_target(int node, nodemask_t *used,
>    */
>   static void __set_migration_target_nodes(void)
>   {
> -	nodemask_t next_pass	= NODE_MASK_NONE;
> -	nodemask_t this_pass	= NODE_MASK_NONE;
> +	nodemask_t next_pass;
> +	nodemask_t this_pass;
>   	nodemask_t used_targets = NODE_MASK_NONE;
>   	int node, best_distance;

LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 04/16] mm/migration: reduce the rcu lock duration
  2022-03-04  9:33 ` [PATCH 04/16] mm/migration: reduce the rcu lock duration Miaohe Lin
  2022-03-04 12:16   ` Muchun Song
@ 2022-03-07  2:32   ` Huang, Ying
  2022-03-08 12:09     ` Miaohe Lin
  1 sibling, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  2:32 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel,
	Christoph Lameter, David Howells, Linus Torvalds

Miaohe Lin <linmiaohe@huawei.com> writes:

> rcu_read_lock is required for grabbing the task refcount, but it's not
> needed for ptrace_may_access. So we can release the rcu lock once the
> task refcount is successfully grabbed to reduce the rcu holding time.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index da5a81052468..26943bd819e8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1907,17 +1907,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>  		return ERR_PTR(-ESRCH);
>  	}
>  	get_task_struct(task);
> +	rcu_read_unlock();
>  
>  	/*
>  	 * Check if this process has the right to modify the specified
>  	 * process. Use the regular "ptrace_may_access()" checks.
>  	 */
>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
> -		rcu_read_unlock();
>  		mm = ERR_PTR(-EPERM);
>  		goto out;
>  	}
> -	rcu_read_unlock();
>  
>  	mm = ERR_PTR(security_task_movememory(task));
>  	if (IS_ERR(mm))

Dug through some history via `git blame` and found that the RCU read
lock was extended in the following commit,

"
3268c63eded4612a3d07b56d1e02ce7731e6608e
Author:     Christoph Lameter <cl@linux.com>
AuthorDate: Wed Mar 21 16:34:06 2012 -0700
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed Mar 21 17:54:58 2012 -0700

mm: fix move/migrate_pages() race on task struct

Migration functions perform the rcu_read_unlock too early.  As a result
the task pointed to may change from under us.  This can result in an oops,
as reported by Dave Hansen in https://lkml.org/lkml/2012/2/23/302.

The following patch extend the period of the rcu_read_lock until after the
permissions checks are done.  We also take a refcount so that the task
reference is stable when calling security check functions and performing
cpuset node validation (which takes a mutex).

The refcount is dropped before actual page migration occurs so there is no
change to the refcounts held during page migration.

Also move the determination of the mm of the task struct to immediately
before the do_migrate*() calls so that it is clear that we switch from
handling the task during permission checks to the mm for the actual
migration.  Since the determination is only done once and we then no
longer use the task_struct we can be sure that we operate on a specific
address space that will not change from under us.
"

Since then, the permission checking has been changed from __task_cred()
to ptrace_may_access(), so the situation may have changed somewhat.  Cc'ed
some names found in the git history to verify.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-04  9:34 ` [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages Miaohe Lin
  2022-03-04 15:21   ` Zi Yan
  2022-03-07  1:57   ` Baolin Wang
@ 2022-03-07  5:01   ` Huang, Ying
  2022-03-07 12:11     ` Miaohe Lin
  2 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:01 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> In the -ENOMEM case, there might be some subpages of fail-to-migrate THPs
> left in the thp_split_pages list. We should move them back to the migration
> list so that they can be put back to the right list by the caller;
> otherwise the page refcnt will be leaked here. Also adjust nr_failed
> and nr_thp_failed accordingly to make the vm events accounting more accurate.
>
> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e0db06927f02..6c2dfed2ddb8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				}
>  
>  				nr_failed_pages += nr_subpages;
> +				/*
> +				 * There might be some subpages of fail-to-migrate THPs
> +				 * left in thp_split_pages list. Move them back to migration
> +				 * list so that they could be put back to the right list by
> +				 * the caller otherwise the page refcnt will be leaked.
> +				 */
> +				list_splice_init(&thp_split_pages, from);
> +				nr_failed += retry;

It appears that we don't need to change nr_failed, because we don't use
it for this situation.  Otherwise looks good to me.

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying

> +				nr_thp_failed += thp_retry;
>  				goto out;
>  			case -EAGAIN:
>  				if (is_thp)
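For reference, a minimal user-space model of the list_splice_init() semantics the fix relies on (the names mirror the kernel's list API, but this is a standalone sketch, not the kernel implementation):

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Add @new right after @head. */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Move every entry of @list onto the front of @head, leaving @list
 * empty.  This is the operation that hands the stranded subpages back
 * to the caller so its cleanup loop can drop each page's reference. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (list_empty(list))
		return;
	struct list_head *first = list->next, *last = list->prev;

	first->prev = head;
	last->next = head->next;
	head->next->prev = last;
	head->next = first;
	INIT_LIST_HEAD(list);
}

static size_t list_len(const struct list_head *h)
{
	size_t n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```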

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-07  1:57   ` Baolin Wang
@ 2022-03-07  5:02     ` Huang, Ying
  2022-03-07  6:00       ` Baolin Wang
  2022-03-07 12:01     ` Miaohe Lin
  1 sibling, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:02 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Miaohe Lin, akpm, mike.kravetz, shy828301, willy, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> Hi Miaohe,
>
> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>> In the -ENOMEM case, there might be some subpages of fail-to-migrate THPs
>> left in the thp_split_pages list. We should move them back to the migration
>> list so that they can be put back to the right list by the caller;
>> otherwise the page refcnt will be leaked here. Also adjust nr_failed
>> and nr_thp_failed accordingly to make the vm events accounting more accurate.
>> Fixes: b5bade978e9b ("mm: migrate: fix the return value of
>> migrate_pages()")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   mm/migrate.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e0db06927f02..6c2dfed2ddb8 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>   				}
>>     				nr_failed_pages += nr_subpages;
>> +				/*
>> +				 * There might be some subpages of fail-to-migrate THPs
>> +				 * left in thp_split_pages list. Move them back to migration
>> +				 * list so that they could be put back to the right list by
>> +				 * the caller otherwise the page refcnt will be leaked.
>> +				 */
>> +				list_splice_init(&thp_split_pages, from);
>> +				nr_failed += retry;
>> +				nr_thp_failed += thp_retry;
>
> Yes, I think we missed this case before, and your patch looks
> right. But we should also update the 'rc' to return the correct number
> of pages that were not migrated, right?

Per my understanding, -ENOMEM should be returned to indicate a fatal
error.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-04  9:34 ` [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
  2022-03-05  2:23   ` Muchun Song
  2022-03-07  2:14   ` Baolin Wang
@ 2022-03-07  5:07   ` Huang, Ying
  2022-03-08 12:12     ` Miaohe Lin
  2 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:07 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> We should return an errno (-EBUSY here) when we fail to isolate the huge
> page rather than always returning 1, which could confuse the user.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6c2dfed2ddb8..279940c0c064 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  		goto out_putpage;
>  
>  	if (PageHuge(page)) {
> -		if (PageHead(page)) {
> -			isolate_huge_page(page, pagelist);
> -			err = 1;
> -		}
> +		if (PageHead(page))
> +			err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;

IMHO, it's better to determine the proper errno inside
isolate_huge_page() instead of in the caller.  If you think it's
necessary to get an errno here, how about changing isolate_huge_page()
instead?

Best Regards,
Huang, Ying

>  	} else {
>  		struct page *head;
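A sketch of the interface change being suggested (a user-space model, not the actual mm code; the helper name isolate_hugetlb and the "1 means queued" convention are illustrative assumptions): let the isolate helper choose the errno so every caller just propagates it:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool page_busy;	/* stands in for the real isolation conditions */

/* Current style: the helper reports success as a bool and each caller
 * has to translate failure into its own errno. */
bool isolate_huge_page_bool(void)
{
	return !page_busy;
}

/* Suggested style: the helper picks the errno itself. */
int isolate_hugetlb(void)
{
	return page_busy ? -EBUSY : 0;
}

/* Caller, loosely modeled on add_page_for_migration(): returns 1 when
 * the page was queued for migration, or the helper's errno on failure. */
int add_page_for_migration_model(void)
{
	int err = isolate_hugetlb();

	return err ? err : 1;
}
```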

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration
  2022-03-07  2:25   ` Baolin Wang
@ 2022-03-07  5:14     ` Huang, Ying
  2022-03-07  7:04       ` Baolin Wang
  0 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:14 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Miaohe Lin, akpm, mike.kravetz, shy828301, willy, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>> If we fail to set up hotplug state callbacks for mm/demotion:online in
>> some corner cases, node_demotion will be left uninitialized. An invalid
>> node might be returned from next_demotion_node() when doing reclaim-based
>> migration. Use kcalloc to allocate node_demotion to fix the issue.
>> Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes
>> demotion")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   mm/migrate.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 279940c0c064..7b1c0b988234 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2516,9 +2516,9 @@ static int __init migrate_on_reclaim_init(void)
>>   {
>>   	int ret;
>>   -	node_demotion = kmalloc_array(nr_node_ids,
>> -				      sizeof(struct demotion_nodes),
>> -				      GFP_KERNEL);
>> +	node_demotion = kcalloc(nr_node_ids,
>> +				sizeof(struct demotion_nodes),
>> +				GFP_KERNEL);
>
> Nit: not sure if this is worth it for this rare corner case, but I think
> the target demotion nodes' default value should be NUMA_NO_NODE
> instead of 0.

The "nr" field of "struct demotion_nodes" should be initialized to 0.  I
think it is checked before the "nodes[]" field is used.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration
  2022-03-04  9:34 ` [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration Miaohe Lin
  2022-03-07  2:25   ` Baolin Wang
@ 2022-03-07  5:14   ` Huang, Ying
  1 sibling, 0 replies; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:14 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> If we fail to set up hotplug state callbacks for mm/demotion:online in
> some corner cases, node_demotion will be left uninitialized. An invalid
> node might be returned from next_demotion_node() when doing reclaim-based
> migration. Use kcalloc to allocate node_demotion to fix the issue.
>
> Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

LGTM!  Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying

> ---
>  mm/migrate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 279940c0c064..7b1c0b988234 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2516,9 +2516,9 @@ static int __init migrate_on_reclaim_init(void)
>  {
>  	int ret;
>  
> -	node_demotion = kmalloc_array(nr_node_ids,
> -				      sizeof(struct demotion_nodes),
> -				      GFP_KERNEL);
> +	node_demotion = kcalloc(nr_node_ids,
> +				sizeof(struct demotion_nodes),
> +				GFP_KERNEL);
>  	WARN_ON(!node_demotion);
>  
>  	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline
  2022-03-04  9:34 ` [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline Miaohe Lin
@ 2022-03-07  5:21   ` Huang, Ying
  2022-03-07  7:01     ` Muchun Song
  0 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:21 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> When follow_page peeks at a page, the page could be reclaimed under heavy
> memory pressure

I don't think that memory pressure and reclaiming will be an issue.

> and thus be offlined while it's still being used by the
> do_pages_stat_array().

"offline" seems a possible problem.

Best Regards,
Huang, Ying

> Use FOLL_GET to hold the page refcnt to fix this
> potential issue.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 7b1c0b988234..98a968e6f465 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1788,13 +1788,18 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
>  			goto set_status;
>  
>  		/* FOLL_DUMP to ignore special (like zero) pages */
> -		page = follow_page(vma, addr, FOLL_DUMP);
> +		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
>  
>  		err = PTR_ERR(page);
>  		if (IS_ERR(page))
>  			goto set_status;
>  
> -		err = page ? page_to_nid(page) : -ENOENT;
> +		if (page) {
> +			err = page_to_nid(page);
> +			put_page(page);
> +		} else {
> +			err = -ENOENT;
> +		}
>  set_status:
>  		*status = err;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-04  9:34 ` [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
@ 2022-03-07  5:37   ` Huang, Ying
  2022-03-08 12:19     ` Miaohe Lin
  2022-03-07  7:35   ` Alistair Popple
  1 sibling, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  5:37 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> __migration_entry_wait and migration_entry_wait_on_locked assume the pte
> is always mapped by the caller. But this is not the case when they're
> called from migration_entry_wait_huge and follow_huge_pmd. Add a
> parameter, unmap, to indicate whether the pte needs to be unmapped, to
> fix this issue.

This seems a possible issue.

Have you tested it?  It appears that it's possible to trigger the
issue.  If so, please paste the error log here.

BTW: have you tested the other functional fixes in your patchset?

Best Regards,
Huang, Ying

> Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/migrate.h |  2 +-
>  include/linux/swapops.h |  4 ++--
>  mm/filemap.c            | 10 +++++-----
>  mm/hugetlb.c            |  2 +-
>  mm/migrate.c            | 14 ++++++++------
>  5 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 66a34eae8cb6..3ef4ff699bef 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -41,7 +41,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>  extern int migrate_page_move_mapping(struct address_space *mapping,
>  		struct page *newpage, struct page *page, int extra_count);
>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
> -				spinlock_t *ptl);
> +				spinlock_t *ptl, bool unmap);
>  void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
>  void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
>  int folio_migrate_mapping(struct address_space *mapping,
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index d356ab4047f7..d66556875d7d 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -213,7 +213,7 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
>  }
>  
>  extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> -					spinlock_t *ptl);
> +					spinlock_t *ptl, bool unmap);
>  extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  					unsigned long address);
>  extern void migration_entry_wait_huge(struct vm_area_struct *vma,
> @@ -235,7 +235,7 @@ static inline int is_migration_entry(swp_entry_t swp)
>  }
>  
>  static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> -					spinlock_t *ptl) { }
> +					spinlock_t *ptl, bool unmap) { }
>  static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  					 unsigned long address) { }
>  static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8f7e6088ee2a..18c353d52aae 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>   * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
>   *        for pte entries, pass NULL for pmd entries.
>   * @ptl: already locked ptl. This function will drop the lock.
> + * @unmap: indicating whether ptep need to be unmapped.
>   *
>   * Wait for a migration entry referencing the given page to be removed. This is
>   * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
> @@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>   * there.
>   */
>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
> -				spinlock_t *ptl)
> +				spinlock_t *ptl, bool unmap)
>  {
>  	struct wait_page_queue wait_page;
>  	wait_queue_entry_t *wait = &wait_page.wait;
> @@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>  	 * a valid reference to the page, and it must take the ptl to remove the
>  	 * migration entry. So the page is valid until the ptl is dropped.
>  	 */
> -	if (ptep)
> -		pte_unmap_unlock(ptep, ptl);
> -	else
> -		spin_unlock(ptl);
> +	spin_unlock(ptl);
> +	if (unmap && ptep)
> +		pte_unmap(ptep);
>  
>  	for (;;) {
>  		unsigned int flags;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 07668781c246..8088128c25db 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  	} else {
>  		if (is_hugetlb_entry_migration(pte)) {
>  			spin_unlock(ptl);
> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
> +			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
>  			goto retry;
>  		}
>  		/*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 98a968e6f465..5519261f54fe 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
>   * When we return from this function the fault will be retried.
>   */
>  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> -				spinlock_t *ptl)
> +				spinlock_t *ptl, bool unmap)
>  {
>  	pte_t pte;
>  	swp_entry_t entry;
> @@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  	if (!is_migration_entry(entry))
>  		goto out;
>  
> -	migration_entry_wait_on_locked(entry, ptep, ptl);
> +	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
>  	return;
>  out:
> -	pte_unmap_unlock(ptep, ptl);
> +	spin_unlock(ptl);
> +	if (unmap)
> +		pte_unmap(ptep);
>  }
>  
>  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> @@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  {
>  	spinlock_t *ptl = pte_lockptr(mm, pmd);
>  	pte_t *ptep = pte_offset_map(pmd, address);
> -	__migration_entry_wait(mm, ptep, ptl);
> +	__migration_entry_wait(mm, ptep, ptl, true);
>  }
>  
>  void migration_entry_wait_huge(struct vm_area_struct *vma,
>  		struct mm_struct *mm, pte_t *pte)
>  {
>  	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
> -	__migration_entry_wait(mm, pte, ptl);
> +	__migration_entry_wait(mm, pte, ptl, false);
>  }
>  
>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> @@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>  	ptl = pmd_lock(mm, pmd);
>  	if (!is_pmd_migration_entry(*pmd))
>  		goto unlock;
> -	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
> +	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
>  	return;
>  unlock:
>  	spin_unlock(ptl);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-07  5:02     ` Huang, Ying
@ 2022-03-07  6:00       ` Baolin Wang
  2022-03-07 12:03         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  6:00 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Miaohe Lin, akpm, mike.kravetz, shy828301, willy, ziy, minchan,
	apopple, dave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel



On 3/7/2022 1:02 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
>> Hi Miaohe,
>>
>> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>>> In the -ENOMEM case, there might be some subpages of fail-to-migrate THPs
>>> left in the thp_split_pages list. We should move them back to the migration
>>> list so that they can be put back to the right list by the caller;
>>> otherwise the page refcnt will be leaked here. Also adjust nr_failed
>>> and nr_thp_failed accordingly to make the vm events accounting more accurate.
>>> Fixes: b5bade978e9b ("mm: migrate: fix the return value of
>>> migrate_pages()")
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>    mm/migrate.c | 9 +++++++++
>>>    1 file changed, 9 insertions(+)
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index e0db06927f02..6c2dfed2ddb8 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>    				}
>>>      				nr_failed_pages += nr_subpages;
>>> +				/*
>>> +				 * There might be some subpages of fail-to-migrate THPs
>>> +				 * left in thp_split_pages list. Move them back to migration
>>> +				 * list so that they could be put back to the right list by
>>> +				 * the caller otherwise the page refcnt will be leaked.
>>> +				 */
>>> +				list_splice_init(&thp_split_pages, from);
>>> +				nr_failed += retry;
>>> +				nr_thp_failed += thp_retry;
>>
>> Yes, I think we missed this case before, and your patch looks
>> right. But we should also update the 'rc' to return the correct number
>> of pages that were not migrated, right?
> 
> Per my understanding, -ENOMEM should be returned to indicate a fatal
> error.
> 

Ah, right. Sorry for noise.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline
  2022-03-07  5:21   ` Huang, Ying
@ 2022-03-07  7:01     ` Muchun Song
  2022-03-07  7:42       ` Huang, Ying
  0 siblings, 1 reply; 73+ messages in thread
From: Muchun Song @ 2022-03-07  7:01 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Miaohe Lin, Andrew Morton, Mike Kravetz, Yang Shi,
	Matthew Wilcox, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On Mon, Mar 7, 2022 at 1:22 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Miaohe Lin <linmiaohe@huawei.com> writes:
>
> > When follow_page peeks at a page, the page could be reclaimed under heavy
> > memory pressure
>
> I don't think that memory pressure and reclaiming will be an issue.

I think he means that a page which is first reclaimed and then offlined
could hit this issue, so reclaiming is a precondition.

Thanks.

>
> > and thus be offlined while it's still being used by the
> > do_pages_stat_array().
>
> "offline" seems a possible problem.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration
  2022-03-07  5:14     ` Huang, Ying
@ 2022-03-07  7:04       ` Baolin Wang
  2022-03-08 11:46         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Baolin Wang @ 2022-03-07  7:04 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Miaohe Lin, akpm, mike.kravetz, shy828301, willy, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel



On 3/7/2022 1:14 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
>> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>>> If we fail to set up hotplug state callbacks for mm/demotion:online in
>>> some corner cases, node_demotion will be left uninitialized. An invalid
>>> node might be returned from next_demotion_node() when doing reclaim-based
>>> migration. Use kcalloc to allocate node_demotion to fix the issue.
>>> Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes
>>> demotion")
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>    mm/migrate.c | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 279940c0c064..7b1c0b988234 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -2516,9 +2516,9 @@ static int __init migrate_on_reclaim_init(void)
>>>    {
>>>    	int ret;
>>>    -	node_demotion = kmalloc_array(nr_node_ids,
>>> -				      sizeof(struct demotion_nodes),
>>> -				      GFP_KERNEL);
>>> +	node_demotion = kcalloc(nr_node_ids,
>>> +				sizeof(struct demotion_nodes),
>>> +				GFP_KERNEL);
>>
>> Nit: not sure if this is worth it for this rare corner case, but I think
>> the target demotion nodes' default value should be NUMA_NO_NODE
>> instead of 0.
> 
> The "nr" field of "struct demotion_nodes" should be initialized as 0.  I
> think that is checked before "nodes[]" field.

Right, but it can be confusing if nr = 0 while nodes[] still contains 
the valid node id 0. While we are at it, why not initialize the 
node_demotion structure with a clear default value? Anyway, no strong 
opinion on this :)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-04  9:34 ` [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
  2022-03-07  5:37   ` Huang, Ying
@ 2022-03-07  7:35   ` Alistair Popple
  2022-03-08 11:55     ` Miaohe Lin
  1 sibling, 1 reply; 73+ messages in thread
From: Alistair Popple @ 2022-03-07  7:35 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4648 bytes --]

Miaohe Lin <linmiaohe@huawei.com> writes:

> __migration_entry_wait and migration_entry_wait_on_locked assume the pte
> is always mapped by the caller. But this is not the case when they're
> called from migration_entry_wait_huge and follow_huge_pmd. Add a
> parameter, unmap, to indicate whether the pte needs to be unmapped, to
> fix this issue.

[...]

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8f7e6088ee2a..18c353d52aae 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>   * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
>   *        for pte entries, pass NULL for pmd entries.
>   * @ptl: already locked ptl. This function will drop the lock.
> + * @unmap: indicating whether ptep need to be unmapped.
>   *
>   * Wait for a migration entry referencing the given page to be removed. This is
>   * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
> @@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>   * there.
>   */
>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
> -				spinlock_t *ptl)
> +				spinlock_t *ptl, bool unmap)

It's a pity we have to pass unmap all the way down to
migration_entry_wait_on_locked().

>  {
>  	struct wait_page_queue wait_page;
>  	wait_queue_entry_t *wait = &wait_page.wait;
> @@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>  	 * a valid reference to the page, and it must take the ptl to remove the
>  	 * migration entry. So the page is valid until the ptl is dropped.
>  	 */
> -	if (ptep)
> -		pte_unmap_unlock(ptep, ptl);
> -	else
> -		spin_unlock(ptl);
> +	spin_unlock(ptl);
> +	if (unmap && ptep)
> +		pte_unmap(ptep);

However we might not have to - afaict this is the only usage of ptep so callers
could do the pte_unmap() prior to calling migration_entry_wait_on_locked(). We
could then remove both the `ptep` and `unmap` parameters. Ie:

migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)

This does assume it's ok to change the order of pte unmap/ptl unlock operations.
I'm not terribly familiar with CONFIG_HIGHPTE systems, but it seems like that
should be ok.

- Alistair

>
>  	for (;;) {
>  		unsigned int flags;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 07668781c246..8088128c25db 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  	} else {
>  		if (is_hugetlb_entry_migration(pte)) {
>  			spin_unlock(ptl);
> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
> +			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
>  			goto retry;
>  		}
>  		/*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 98a968e6f465..5519261f54fe 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
>   * When we return from this function the fault will be retried.
>   */
>  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> -				spinlock_t *ptl)
> +				spinlock_t *ptl, bool unmap)
>  {
>  	pte_t pte;
>  	swp_entry_t entry;
> @@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  	if (!is_migration_entry(entry))
>  		goto out;
>
> -	migration_entry_wait_on_locked(entry, ptep, ptl);
> +	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
>  	return;
>  out:
> -	pte_unmap_unlock(ptep, ptl);
> +	spin_unlock(ptl);
> +	if (unmap)
> +		pte_unmap(ptep);
>  }
>
>  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> @@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  {
>  	spinlock_t *ptl = pte_lockptr(mm, pmd);
>  	pte_t *ptep = pte_offset_map(pmd, address);
> -	__migration_entry_wait(mm, ptep, ptl);
> +	__migration_entry_wait(mm, ptep, ptl, true);
>  }
>
>  void migration_entry_wait_huge(struct vm_area_struct *vma,
>  		struct mm_struct *mm, pte_t *pte)
>  {
>  	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
> -	__migration_entry_wait(mm, pte, ptl);
> +	__migration_entry_wait(mm, pte, ptl, false);
>  }
>
>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> @@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>  	ptl = pmd_lock(mm, pmd);
>  	if (!is_pmd_migration_entry(*pmd))
>  		goto unlock;
> -	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
> +	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
>  	return;
>  unlock:
>  	spin_unlock(ptl);

^ permalink raw reply	[flat|nested] 73+ messages in thread
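Alistair's suggested simplification in the message above — unmap in the callers and drop both the `ptep` and `unmap` parameters from migration_entry_wait_on_locked() — might look roughly like the sketch below. This is kernel-style pseudocode, not a compilable patch, and it assumes the pte_unmap()-before-spin_unlock() reordering he flags for CONFIG_HIGHPTE is actually safe:

```c
/* sketch of the suggested interface: no ptep/unmap parameters needed here */
void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl);

void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
			    spinlock_t *ptl, bool unmap)
{
	pte_t pte;
	swp_entry_t entry;

	spin_lock(ptl);
	pte = *ptep;
	if (!is_swap_pte(pte))
		goto out;

	entry = pte_to_swp_entry(pte);
	if (!is_migration_entry(entry))
		goto out;

	if (unmap)
		pte_unmap(ptep);	/* caller unmaps while still holding ptl ... */
	migration_entry_wait_on_locked(entry, ptl);	/* ... which drops ptl */
	return;
out:
	spin_unlock(ptl);
	if (unmap)
		pte_unmap(ptep);
}
```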

* Re: [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline
  2022-03-07  7:01     ` Muchun Song
@ 2022-03-07  7:42       ` Huang, Ying
  2022-03-08 11:33         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-07  7:42 UTC (permalink / raw)
  To: Muchun Song
  Cc: Miaohe Lin, Andrew Morton, Mike Kravetz, Yang Shi,
	Matthew Wilcox, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

Muchun Song <songmuchun@bytedance.com> writes:

> On Mon, Mar 7, 2022 at 1:22 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>
>> > When follow_page peeks a page, the page could be reclaimed under heavy
>> > memory pressure
>>
>> I don't think that memory pressure and reclaiming will be an issue.
>
> I think he means that a page could first be reclaimed and then be
> offlined, hitting this issue, so reclaiming is a precondition.

I don't think reclaiming is a precondition.  It seems possible that the
virtual page is migrated, then the physical page is offlined.

Best Regards,
Huang, Ying

> Thanks.
>
>>
>> > and thus be offlined while it's still being used by the
>> > do_pages_stat_array().
>>
>> "offline" seems a possible problem.


* Re: [PATCH 03/16] mm/migration: remove unneeded local variable page_lru
  2022-03-04  9:33 ` [PATCH 03/16] mm/migration: remove unneeded local variable page_lru Miaohe Lin
@ 2022-03-07 10:58   ` Alistair Popple
  2022-03-08 11:29     ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Alistair Popple @ 2022-03-07 10:58 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel


Miaohe Lin <linmiaohe@huawei.com> writes:

> We can use page_is_file_lru() directly to help account the isolated
> pages and simplify the code a bit, the same as for the local variable follflags.

Looks good, but there are two independent changes here. Even though they are
small they should probably be split into two patches.

> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index bc1867a5706c..da5a81052468 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1609,7 +1609,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  {
>  	struct vm_area_struct *vma;
>  	struct page *page;
> -	unsigned int follflags;
>  	int err;
>
>  	mmap_read_lock(mm);
> @@ -1619,8 +1618,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  		goto out;
>
>  	/* FOLL_DUMP to ignore special (like zero) pages */
> -	follflags = FOLL_GET | FOLL_DUMP;
> -	page = follow_page(vma, addr, follflags);
> +	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
>
>  	err = PTR_ERR(page);
>  	if (IS_ERR(page))
> @@ -2033,7 +2031,6 @@ static struct page *alloc_misplaced_dst_page_thp(struct page *page,
>
>  static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>  {
> -	int page_lru;
>  	int nr_pages = thp_nr_pages(page);
>  	int order = compound_order(page);
>
> @@ -2060,8 +2057,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>  	if (isolate_lru_page(page))
>  		return 0;
>
> -	page_lru = page_is_file_lru(page);
> -	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
> +	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
>  			    nr_pages);
>
>  	/*


* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-05  2:23   ` Muchun Song
@ 2022-03-07 11:46     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 11:46 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox,
	Huang Ying, Zi Yan, Minchan Kim, Alistair Popple, ave.hansen,
	o451686892, Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On 2022/3/5 10:23, Muchun Song wrote:
> On Fri, Mar 4, 2022 at 5:36 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> We should return an errno (-EBUSY here) when we fail to isolate the huge page
>> rather than always returning 1, which could confuse the user.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> IIUC, this should be a bug fix since the status reported to the user is
> wrong.  A Fixes tag may be needed.  But anyway:
> 

Agree. Thanks for pointing this out and many thanks for your review. :)

> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> Thanks.
> 
> .
> 



* Re: [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat
  2022-03-07  1:14   ` Andrew Morton
@ 2022-03-07 11:51     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 11:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On 2022/3/7 9:14, Andrew Morton wrote:
> On Fri, 4 Mar 2022 17:34:00 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> We could use helper macro min_t to help set the chunk_nr to simplify
>> the code.
>>
>> ...
>>
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1858,9 +1858,7 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
>>  	while (nr_pages) {
>>  		unsigned long chunk_nr;
>>  
>> -		chunk_nr = nr_pages;
>> -		if (chunk_nr > DO_PAGES_STAT_CHUNK_NR)
>> -			chunk_nr = DO_PAGES_STAT_CHUNK_NR;
>> +		chunk_nr = min_t(unsigned long, nr_pages, DO_PAGES_STAT_CHUNK_NR);
>>  
>>  		if (in_compat_syscall()) {
>>  			if (get_compat_pages_array(chunk_pages, pages,
> 
> Getting the types correct is better than using min_t().
> 

Looks good. Many thanks for your suggestion. Will do it in v2.

Thanks.

> --- a/mm/migrate.c~mm-migration-use-helper-macro-min_t-in-do_pages_stat-fix
> +++ a/mm/migrate.c
> @@ -1844,14 +1844,12 @@ static int do_pages_stat(struct mm_struc
>  			 const void __user * __user *pages,
>  			 int __user *status)
>  {
> -#define DO_PAGES_STAT_CHUNK_NR 16
> +#define DO_PAGES_STAT_CHUNK_NR 16UL
>  	const void __user *chunk_pages[DO_PAGES_STAT_CHUNK_NR];
>  	int chunk_status[DO_PAGES_STAT_CHUNK_NR];
>  
>  	while (nr_pages) {
> -		unsigned long chunk_nr;
> -
> -		chunk_nr = min_t(unsigned long, nr_pages, DO_PAGES_STAT_CHUNK_NR);
> +		unsigned long chunk_nr = min(nr_pages, DO_PAGES_STAT_CHUNK_NR);
>  
>  		if (in_compat_syscall()) {
>  			if (get_compat_pages_array(chunk_pages, pages,
> _
> 
> .
> 



* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-07  1:57   ` Baolin Wang
  2022-03-07  5:02     ` Huang, Ying
@ 2022-03-07 12:01     ` Miaohe Lin
  1 sibling, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 12:01 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On 2022/3/7 9:57, Baolin Wang wrote:
> Hi Miaohe,
> 
> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>> In the -ENOMEM case, there might be some subpages of failed-to-migrate THPs
>> left in the thp_split_pages list. We should move them back to the migration
>> list so that they can be put back on the right list by the caller;
>> otherwise the page refcnt will be leaked here. Also adjust nr_failed
>> and nr_thp_failed accordingly to make the vm events accounting more accurate.
>>
>> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   mm/migrate.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e0db06927f02..6c2dfed2ddb8 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>                   }
>>                     nr_failed_pages += nr_subpages;
>> +                /*
>> +                 * There might be some subpages of fail-to-migrate THPs
>> +                 * left in thp_split_pages list. Move them back to migration
>> +                 * list so that they could be put back to the right list by
>> +                 * the caller otherwise the page refcnt will be leaked.
>> +                 */
>> +                list_splice_init(&thp_split_pages, from);
>> +                nr_failed += retry;
>> +                nr_thp_failed += thp_retry;
> 
> Yes, I think we missed this case before, and your patch looks right. But we should also update the 'rc' to return the correct number of pages that were not migrated, right?

I'm not sure. The -ENOMEM case has always returned -ENOMEM since commit 95a402c3847c ("[PATCH] page
migration: use allocator function for migrate_pages()"), so I did not change rc. But I think you're right;
we should return the correct number of pages that were not migrated in this case.

Thanks.

> .



* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-07  6:00       ` Baolin Wang
@ 2022-03-07 12:03         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 12:03 UTC (permalink / raw)
  To: Baolin Wang, Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	dave.hansen, o451686892, almasrymina, jhubbard, rcampbell,
	peterx, naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 14:00, Baolin Wang wrote:
> 
> 
> On 3/7/2022 1:02 PM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>
>>> Hi Miaohe,
>>>
>>> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>>>> In -ENOMEM case, there might be some subpages of fail-to-migrate THPs
>>>> left in thp_split_pages list. We should move them back to migration
>>>> list so that they could be put back to the right list by the caller
>>>> otherwise the page refcnt will be leaked here. Also adjust nr_failed
>>>> and nr_thp_failed accordingly to make vm events account more accurate.
>>>> Fixes: b5bade978e9b ("mm: migrate: fix the return value of
>>>> migrate_pages()")
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>    mm/migrate.c | 9 +++++++++
>>>>    1 file changed, 9 insertions(+)
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e0db06927f02..6c2dfed2ddb8 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>                    }
>>>>                      nr_failed_pages += nr_subpages;
>>>> +                /*
>>>> +                 * There might be some subpages of fail-to-migrate THPs
>>>> +                 * left in thp_split_pages list. Move them back to migration
>>>> +                 * list so that they could be put back to the right list by
>>>> +                 * the caller otherwise the page refcnt will be leaked.
>>>> +                 */
>>>> +                list_splice_init(&thp_split_pages, from);
>>>> +                nr_failed += retry;
>>>> +                nr_thp_failed += thp_retry;
>>>
>>> Yes, I think we missed this case before, and your patch looks
>>> right. But we should also update the 'rc' to return the correct number
>>> of pages that were not migrated, right?
>>
>> Per my understanding, -ENOMEM should be returned to indicate an fatal
>> error.
>>
> 
> Ah, right. Sorry for noise.
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Oh, I missed this email. So we should return -ENOMEM in this case. Many thanks to both of you.

> .



* Re: [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages
  2022-03-07  5:01   ` Huang, Ying
@ 2022-03-07 12:11     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 12:11 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 13:01, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> In -ENOMEM case, there might be some subpages of fail-to-migrate THPs
>> left in thp_split_pages list. We should move them back to migration
>> list so that they could be put back to the right list by the caller
>> otherwise the page refcnt will be leaked here. Also adjust nr_failed
>> and nr_thp_failed accordingly to make vm events account more accurate.
>>
>> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e0db06927f02..6c2dfed2ddb8 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1422,6 +1422,15 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  				}
>>  
>>  				nr_failed_pages += nr_subpages;
>> +				/*
>> +				 * There might be some subpages of fail-to-migrate THPs
>> +				 * left in thp_split_pages list. Move them back to migration
>> +				 * list so that they could be put back to the right list by
>> +				 * the caller otherwise the page refcnt will be leaked.
>> +				 */
>> +				list_splice_init(&thp_split_pages, from);
>> +				nr_failed += retry;
> 
> It appears that we don't need to change nr_failed, because we don't use
> it for this situation.  Otherwise looks good to me.
> 

You're right. nr_failed is not used for this case.

> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Many thanks for your review.

> 
> Best Regards,
> Huang, Ying
> 
>> +				nr_thp_failed += thp_retry;
>>  				goto out;
>>  			case -EAGAIN:
>>  				if (is_thp)
> .
> 



* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-07  2:14   ` Baolin Wang
@ 2022-03-07 12:20     ` Miaohe Lin
  2022-03-08  1:32       ` Baolin Wang
  0 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 12:20 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On 2022/3/7 10:14, Baolin Wang wrote:
> Hi Miaohe,
> 
> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>> We should return errno (-EBUSY here) when failed to isolate the huge page
>> rather than always return 1 which could confuse the user.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   mm/migrate.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 6c2dfed2ddb8..279940c0c064 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>           goto out_putpage;
>>         if (PageHuge(page)) {
>> -        if (PageHead(page)) {
>> -            isolate_huge_page(page, pagelist);
>> -            err = 1;
>> -        }
>> +        if (PageHead(page))
>> +            err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
> 
> Could you elaborate on which case the huge page isolation can be failed in this case? Or you met a real problem? Cause now we've got this head huge page refcnt, I can not see why we'll fail to isolate this huge page.

IIUC, this could happen when the hugepage is under migration, which has cleared HPageMigratable. The page refcnt cannot
prevent a concurrent isolate_huge_page from happening. Or am I missing something?

Many thanks.

> .



* Re: [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags
  2022-03-07  1:21   ` Andrew Morton
@ 2022-03-07 12:44     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-07 12:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On 2022/3/7 9:21, Andrew Morton wrote:
> On Fri, 4 Mar 2022 17:34:03 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> When newfolio reaches here, it's guaranteed that PG_writeback is not set
>> because caller ensures writeback must have been completed. Remove this
>> unneeded check and cleanup the relevant comment.
> 
> What guarantees that writeback cannot start after the caller has checked?

Drivers should set the writeback flag under the page lock. But I'm not sure this is always obeyed.

> 
> I see no such check in iomap_migrate_page()?

iomap_migrate_page acts as a migratepage callback. The check is actually done in __unmap_and_move.

> 
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -549,18 +549,6 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>>  	if (!folio_test_hugetlb(folio))
>>  		folio->private = NULL;
>>  
>> -	/*
>> -	 * If any waiters have accumulated on the new page then
>> -	 * wake them up.
>> -	 */
>> -	if (folio_test_writeback(newfolio))
>> -		folio_end_writeback(newfolio);

IIUC, this would also be broken. If a page can be flagged writeback under the page lock,
we can't simply do folio_end_writeback here, as the page might still be in use by drivers.
More work is needed to handle that case correctly.

>> -
>> -	/*
>> -	 * PG_readahead shares the same bit with PG_reclaim.  The above
>> -	 * end_page_writeback() may clear PG_readahead mistakenly, so set the
>> -	 * bit after that.
>> -	 */
>>  	if (folio_test_readahead(folio))
>>  		folio_set_readahead(newfolio);
>>  
> 
> folio_migrate_flags() and folio_migrate_copy() are global,
> export-to-modules functions but have no interface documentation.  That
> was bad of us.  
> 
> I wonder why those two functions are exported to modules anyway.

That's really a pity.

> .
> 

Many thanks for your comment and reply!


* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-07 12:20     ` Miaohe Lin
@ 2022-03-08  1:32       ` Baolin Wang
  2022-03-08  6:34         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Baolin Wang @ 2022-03-08  1:32 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel



On 3/7/2022 8:20 PM, Miaohe Lin wrote:
> On 2022/3/7 10:14, Baolin Wang wrote:
>> Hi Miaohe,
>>
>> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>>> We should return errno (-EBUSY here) when failed to isolate the huge page
>>> rather than always return 1 which could confuse the user.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>    mm/migrate.c | 6 ++----
>>>    1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 6c2dfed2ddb8..279940c0c064 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>>            goto out_putpage;
>>>          if (PageHuge(page)) {
>>> -        if (PageHead(page)) {
>>> -            isolate_huge_page(page, pagelist);
>>> -            err = 1;
>>> -        }
>>> +        if (PageHead(page))
>>> +            err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
>>
>> Could you elaborate on which case the huge page isolation can be failed in this case? Or you met a real problem? Cause now we've got this head huge page refcnt, I can not see why we'll fail to isolate this huge page.
> 
> IIUC, this could happen when hugepage is under migration which cleared HPageMigratable. Page refcnt cannot
> prevent isolate_huge_page from happening. Or am I miss something?

Yes, that's possible. Thanks for your explanation. It would be better if 
you could copy the possible scenario description into the commit log to help 
others understand the issue.

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-08  1:32       ` Baolin Wang
@ 2022-03-08  6:34         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08  6:34 UTC (permalink / raw)
  To: Baolin Wang, akpm
  Cc: mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	apopple, ave.hansen, o451686892, almasrymina, jhubbard,
	rcampbell, peterx, naoya.horiguchi, mhocko, riel, linux-mm,
	linux-kernel

On 2022/3/8 9:32, Baolin Wang wrote:
> 
> 
> On 3/7/2022 8:20 PM, Miaohe Lin wrote:
>> On 2022/3/7 10:14, Baolin Wang wrote:
>>> Hi Miaohe,
>>>
>>> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>>>> We should return errno (-EBUSY here) when failed to isolate the huge page
>>>> rather than always return 1 which could confuse the user.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>    mm/migrate.c | 6 ++----
>>>>    1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 6c2dfed2ddb8..279940c0c064 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>>>            goto out_putpage;
>>>>          if (PageHuge(page)) {
>>>> -        if (PageHead(page)) {
>>>> -            isolate_huge_page(page, pagelist);
>>>> -            err = 1;
>>>> -        }
>>>> +        if (PageHead(page))
>>>> +            err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
>>>
>>> Could you elaborate on which case the huge page isolation can be failed in this case? Or you met a real problem? Cause now we've got this head huge page refcnt, I can not see why we'll fail to isolate this huge page.
>>
>> IIUC, this could happen when hugepage is under migration which cleared HPageMigratable. Page refcnt cannot
>> prevent isolate_huge_page from happening. Or am I miss something?
> 
> Yes, that's possible. Thanks for your explanation. It will be better if you can copy the possible scenario description to the commit log to help other understand the issue.
> 

Sounds reasonable. Will do. Many thanks for review.

> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> .



* Re: [PATCH 03/16] mm/migration: remove unneeded local variable page_lru
  2022-03-07 10:58   ` Alistair Popple
@ 2022-03-08 11:29     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 11:29 UTC (permalink / raw)
  To: Alistair Popple
  Cc: akpm, mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 18:58, Alistair Popple wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> We can use page_is_file_lru() directly to help account the isolated
>> pages to simplify the code a bit as same as local variable follflags.
> 
> Looks good, but there are two independent changes here. Even though they are
> small they should probably be split into two patches.

Sounds reasonable. Will try to do this in v2. Thanks.

> 
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index bc1867a5706c..da5a81052468 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1609,7 +1609,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>  {
>>  	struct vm_area_struct *vma;
>>  	struct page *page;
>> -	unsigned int follflags;
>>  	int err;
>>
>>  	mmap_read_lock(mm);
>> @@ -1619,8 +1618,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>  		goto out;
>>
>>  	/* FOLL_DUMP to ignore special (like zero) pages */
>> -	follflags = FOLL_GET | FOLL_DUMP;
>> -	page = follow_page(vma, addr, follflags);
>> +	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
>>
>>  	err = PTR_ERR(page);
>>  	if (IS_ERR(page))
>> @@ -2033,7 +2031,6 @@ static struct page *alloc_misplaced_dst_page_thp(struct page *page,
>>
>>  static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>>  {
>> -	int page_lru;
>>  	int nr_pages = thp_nr_pages(page);
>>  	int order = compound_order(page);
>>
>> @@ -2060,8 +2057,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>>  	if (isolate_lru_page(page))
>>  		return 0;
>>
>> -	page_lru = page_is_file_lru(page);
>> -	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
>> +	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
>>  			    nr_pages);
>>
>>  	/*



* Re: [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline
  2022-03-07  7:42       ` Huang, Ying
@ 2022-03-08 11:33         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 11:33 UTC (permalink / raw)
  To: Huang, Ying, Muchun Song
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Matthew Wilcox, Zi Yan,
	Minchan Kim, Alistair Popple, ave.hansen, o451686892,
	Mina Almasry, John Hubbard, Ralph Campbell, Peter Xu,
	HORIGUCHI NAOYA(堀口 直也),
	Michal Hocko, riel, Linux Memory Management List, LKML

On 2022/3/7 15:42, Huang, Ying wrote:
> Muchun Song <songmuchun@bytedance.com> writes:
> 
>> On Mon, Mar 7, 2022 at 1:22 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> When follow_page peeks a page, the page could be reclaimed under heavy
>>>> memory pressure
>>>
>>> I don't think that memory pressure and reclaiming will be an issue.
>>
>> I think he means a page first to be reclaimed then to be offline
>> could encounter this issue and reclaiming is a precondition.
> 
> I don't think reclaiming is a precondition.  It seems possible that the
> virtual page is migrated, then the physical page is offlined.
> 

What I actually meant is that a page could first be reclaimed, migrated and so on,
and then be offlined. Sorry for the confusion.

Thanks both of you.

> Best Regards,
> Huang, Ying
> 
>> Thanks.
>>
>>>
>>>> and thus be offlined while it's still being used by the
>>>> do_pages_stat_array().
>>>
>>> "offline" seems a possible problem.
> .
> 



* Re: [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked
  2022-03-07  2:00   ` Huang, Ying
@ 2022-03-08 11:41     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 11:41 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 10:00, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> When mapping_locked is true, TTU_RMAP_LOCKED is always set in ttu. We can
>> check ttu instead, so mapping_locked can be removed. And ttu is either 0
>> or TTU_RMAP_LOCKED now. Change '|=' to '=' to reflect this.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 42e3a944f35c..50bc62d85eaf 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1230,7 +1230,6 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>  		goto put_anon;
>>  
>>  	if (page_mapped(hpage)) {
>> -		bool mapping_locked = false;
>>  		enum ttu_flags ttu = 0;
>>  
>>  		if (!PageAnon(hpage)) {
>> @@ -1244,14 +1243,13 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>  			if (unlikely(!mapping))
>>  				goto unlock_put_anon;
>>  
>> -			mapping_locked = true;
>> -			ttu |= TTU_RMAP_LOCKED;
>> +			ttu = TTU_RMAP_LOCKED;
>>  		}
>>  
>>  		try_to_migrate(src, ttu);
>>  		page_was_mapped = 1;
>>  
>> -		if (mapping_locked)
>> +		if (ttu & TTU_RMAP_LOCKED)
>>  			i_mmap_unlock_write(mapping);
>>  	}
> 
> I think this is a code style issue.  The original code looks more
> readable for me.
> 

I think "ttu |= TTU_RMAP_LOCKED;" is a leftover from earlier code changes.
And (ttu & TTU_RMAP_LOCKED) can do the same job as mapping_locked
while letting that local variable be removed. But if you insist on this, I will
drop this patch.

Thanks.

> Best Regards,
> Huang, Ying
> 
> .
> 



* Re: [PATCH 02/16] mm/migration: remove unneeded out label
  2022-03-07  2:03   ` Huang, Ying
@ 2022-03-08 11:44     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 11:44 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 10:03, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> We can do prep_transhuge_page when newpage is not NULL. Thus we can remove
>> out label to simplify the code.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 50bc62d85eaf..bc1867a5706c 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2025,12 +2025,9 @@ static struct page *alloc_misplaced_dst_page_thp(struct page *page,
>>  
>>  	newpage = alloc_pages_node(nid, (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>>  				   HPAGE_PMD_ORDER);
>> -	if (!newpage)
>> -		goto out;
>> +	if (newpage)
>> +		prep_transhuge_page(newpage);
>>  
>> -	prep_transhuge_page(newpage);
>> -
>> -out:
>>  	return newpage;
>>  }
> 
> I don't think this change is necessary.  The original code is simple and
> follows the common practice for error processing.  The new code is OK,
> but it's unnecessary to change.
> 

IMO, this out label looks like overkill. We should remove it to make the code more succinct.
Does this make sense to you? Thanks.

> Best Regards,
> Huang, Ying
> .
> 



* Re: [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration
  2022-03-07  7:04       ` Baolin Wang
@ 2022-03-08 11:46         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 11:46 UTC (permalink / raw)
  To: Baolin Wang, Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 15:04, Baolin Wang wrote:
> 
> 
> On 3/7/2022 1:14 PM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>
>>> On 3/4/2022 5:34 PM, Miaohe Lin wrote:
>>>> If we failed to setup hotplug state callbacks for mm/demotion:online in
>>>> some corner cases, node_demotion will be left uninitialized. Invalid node
>>>> might be returned from the next_demotion_node() when doing reclaim-based
>>>> migration. Use kcalloc to allocate node_demotion to fix the issue.
>>>> Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes
>>>> demotion")
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>    mm/migrate.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 279940c0c064..7b1c0b988234 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -2516,9 +2516,9 @@ static int __init migrate_on_reclaim_init(void)
>>>>    {
>>>>        int ret;
>>>>    -    node_demotion = kmalloc_array(nr_node_ids,
>>>> -                      sizeof(struct demotion_nodes),
>>>> -                      GFP_KERNEL);
>>>> +    node_demotion = kcalloc(nr_node_ids,
>>>> +                sizeof(struct demotion_nodes),
>>>> +                GFP_KERNEL);
>>>
>>> Nit: not sure if this is worthy of this rare corner case, but I think
>>> the target demotion nodes' default value should be NUMA_NO_NODE
>>> instead of 0.
>>
>> The "nr" field of "struct demotion_nodes" should be initialized as 0.  I
>> think that is checked before "nodes[]" field.
> 
> Right, but it would be confusing if nr = 0 while nodes[] still contains the valid node id 0. While we are at it, why not initialize the node_demotion structure with a clear default value? Anyway, no strong opinion on this :)

IMO, initializing the node_demotion structure with a clear default value might not be worth it,
as cpuhp_setup_state() failing at init time should be a rare case.

Thanks both of you.

> .



* Re: [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-07  7:35   ` Alistair Popple
@ 2022-03-08 11:55     ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 11:55 UTC (permalink / raw)
  To: Alistair Popple
  Cc: akpm, mike.kravetz, shy828301, willy, ying.huang, ziy, minchan,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 15:35, Alistair Popple wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> __migration_entry_wait and migration_entry_wait_on_locked assume pte is
>> always mapped from caller. But this is not the case when it's called from
>> migration_entry_wait_huge and follow_huge_pmd. Add a parameter unmap to
>> indicate whether the pte needs to be unmapped to fix this issue.
> 
> [...]
> 
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 8f7e6088ee2a..18c353d52aae 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>   * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
>>   *        for pte entries, pass NULL for pmd entries.
>>   * @ptl: already locked ptl. This function will drop the lock.
>> + * @unmap: indicating whether ptep need to be unmapped.
>>   *
>>   * Wait for a migration entry referencing the given page to be removed. This is
>>   * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
>> @@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>   * there.
>>   */
>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>> -				spinlock_t *ptl)
>> +				spinlock_t *ptl, bool unmap)
> 
> It's a pity we have to pass unmap all the way down to
> migration_entry_wait_on_locked().
> 
>>  {
>>  	struct wait_page_queue wait_page;
>>  	wait_queue_entry_t *wait = &wait_page.wait;
>> @@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>  	 * a valid reference to the page, and it must take the ptl to remove the
>>  	 * migration entry. So the page is valid until the ptl is dropped.
>>  	 */
>> -	if (ptep)
>> -		pte_unmap_unlock(ptep, ptl);
>> -	else
>> -		spin_unlock(ptl);
>> +	spin_unlock(ptl);
>> +	if (unmap && ptep)
>> +		pte_unmap(ptep);
> 
> However we might not have to - afaict this is the only usage of ptep so callers
> could do the pte_unmap() prior to calling migration_entry_wait_on_locked(). We
> could then remove both the `ptep` and `unmap` parameters. Ie:
> 
> migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
> 
> This does assume it's ok to change the order of pte unmap/ptl unlock operations.
> I'm not terribly familiar with CONFIG_HIGHPTE systems, but it seems like that
> should be ok.
> 

Looks like a good idea. We can leave the pte_unmap to the caller, as only the
page table spinlock is needed to ensure the validity of the page.

Will try to do this in V2. Many thanks.

> - Alistair
> 
>>
>>  	for (;;) {
>>  		unsigned int flags;
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 07668781c246..8088128c25db 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>  	} else {
>>  		if (is_hugetlb_entry_migration(pte)) {
>>  			spin_unlock(ptl);
>> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
>> +			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
>>  			goto retry;
>>  		}
>>  		/*
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 98a968e6f465..5519261f54fe 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
>>   * When we return from this function the fault will be retried.
>>   */
>>  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>> -				spinlock_t *ptl)
>> +				spinlock_t *ptl, bool unmap)
>>  {
>>  	pte_t pte;
>>  	swp_entry_t entry;
>> @@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>  	if (!is_migration_entry(entry))
>>  		goto out;
>>
>> -	migration_entry_wait_on_locked(entry, ptep, ptl);
>> +	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
>>  	return;
>>  out:
>> -	pte_unmap_unlock(ptep, ptl);
>> +	spin_unlock(ptl);
>> +	if (unmap)
>> +		pte_unmap(ptep);
>>  }
>>
>>  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>> @@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>  {
>>  	spinlock_t *ptl = pte_lockptr(mm, pmd);
>>  	pte_t *ptep = pte_offset_map(pmd, address);
>> -	__migration_entry_wait(mm, ptep, ptl);
>> +	__migration_entry_wait(mm, ptep, ptl, true);
>>  }
>>
>>  void migration_entry_wait_huge(struct vm_area_struct *vma,
>>  		struct mm_struct *mm, pte_t *pte)
>>  {
>>  	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
>> -	__migration_entry_wait(mm, pte, ptl);
>> +	__migration_entry_wait(mm, pte, ptl, false);
>>  }
>>
>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>> @@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>>  	ptl = pmd_lock(mm, pmd);
>>  	if (!is_pmd_migration_entry(*pmd))
>>  		goto unlock;
>> -	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
>> +	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
>>  	return;
>>  unlock:
>>  	spin_unlock(ptl);



* Re: [PATCH 04/16] mm/migration: reduce the rcu lock duration
  2022-03-07  2:32   ` Huang, Ying
@ 2022-03-08 12:09     ` Miaohe Lin
  2022-03-09  1:02       ` Huang, Ying
  0 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 12:09 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel,
	Christoph Lameter, David Howells, Linus Torvalds

On 2022/3/7 10:32, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> rcu_read_lock is required by grabbing the task refcount but it's not
>> needed for ptrace_may_access. So we could release the rcu lock after
>> task refcount is successfully grabbed to reduce the rcu holding time.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index da5a81052468..26943bd819e8 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1907,17 +1907,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>>  		return ERR_PTR(-ESRCH);
>>  	}
>>  	get_task_struct(task);
>> +	rcu_read_unlock();
>>  
>>  	/*
>>  	 * Check if this process has the right to modify the specified
>>  	 * process. Use the regular "ptrace_may_access()" checks.
>>  	 */
>>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
>> -		rcu_read_unlock();
>>  		mm = ERR_PTR(-EPERM);
>>  		goto out;
>>  	}
>> -	rcu_read_unlock();
>>  
>>  	mm = ERR_PTR(security_task_movememory(task));
>>  	if (IS_ERR(mm))
> 
> Digged some history via `git blame`, found that the RCU read lock is
> extended in the following commit,
> 
> "
> 3268c63eded4612a3d07b56d1e02ce7731e6608e
> Author:     Christoph Lameter <cl@linux.com>
> AuthorDate: Wed Mar 21 16:34:06 2012 -0700
> Commit:     Linus Torvalds <torvalds@linux-foundation.org>
> CommitDate: Wed Mar 21 17:54:58 2012 -0700
> 
> mm: fix move/migrate_pages() race on task struct
> 
> Migration functions perform the rcu_read_unlock too early.  As a result
> the task pointed to may change from under us.  This can result in an oops,
> as reported by Dave Hansen in https://lkml.org/lkml/2012/2/23/302.
> 
> The following patch extend the period of the rcu_read_lock until after the
> permissions checks are done.  We also take a refcount so that the task
> reference is stable when calling security check functions and performing
> cpuset node validation (which takes a mutex).
> 
> The refcount is dropped before actual page migration occurs so there is no
> change to the refcounts held during page migration.
> 
> Also move the determination of the mm of the task struct to immediately
> before the do_migrate*() calls so that it is clear that we switch from
> handling the task during permission checks to the mm for the actual
> migration.  Since the determination is only done once and we then no
> longer use the task_struct we can be sure that we operate on a specific
> address space that will not change from under us.
> "
> 
> After that, the permission checking has been changed from __task_cred()
> to ptrace_may_access().  So the situation may change somewhat.  Cced

In ptrace_may_access(), __task_cred() is accessed while holding the RCU read lock.
It seems this is ensured by ptrace_may_access() itself.

> some names found in git history to verify.

Thanks for your carefulness.

> 
> Best Regards,
> Huang, Ying
> .
> 



* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-07  5:07   ` Huang, Ying
@ 2022-03-08 12:12     ` Miaohe Lin
  2022-03-09  1:00       ` Huang, Ying
  0 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 12:12 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 13:07, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> We should return errno (-EBUSY here) when failed to isolate the huge page
>> rather than always return 1 which could confuse the user.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 6c2dfed2ddb8..279940c0c064 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>  		goto out_putpage;
>>  
>>  	if (PageHuge(page)) {
>> -		if (PageHead(page)) {
>> -			isolate_huge_page(page, pagelist);
>> -			err = 1;
>> -		}
>> +		if (PageHead(page))
>> +			err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
> 
> IMHO, it's better to determine the proper errno inside
> isolate_huge_page() instead of in the caller.  If you think it's
> necessary to get errno here.  How about change isolate_huge_page()
> instead?

IMO, -EBUSY should be enough for the user (as they could not do much with a more
specific errno), and it keeps the errno consistent with the non-hugetlb page case. What do you think?

Thanks.

> 
> Best Regards,
> Huang, Ying
> 
>>  	} else {
>>  		struct page *head;
> .
> 



* Re: [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-07  5:37   ` Huang, Ying
@ 2022-03-08 12:19     ` Miaohe Lin
  2022-03-09  0:56       ` Huang, Ying
  0 siblings, 1 reply; 73+ messages in thread
From: Miaohe Lin @ 2022-03-08 12:19 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/7 13:37, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> __migration_entry_wait and migration_entry_wait_on_locked assume pte is
>> always mapped from caller. But this is not the case when it's called from
>> migration_entry_wait_huge and follow_huge_pmd. Add a parameter unmap to
>> indicate whether the pte needs to be unmapped to fix this issue.
> 
> This seems a possible issue.
> 
> Have you tested it?  It appears that it's possible to trigger the
> issue.  If so, you can paste the error log here.

This can only happen on an x86 machine with HIGHMEM enabled, which is uncommon
now (at least in my work environment), so I can't paste the error log. The
issues in this series mainly come from code inspection along with some tests.

Thanks. :)

> 
> BTW: have you tested the other functionality issues in your patchset?
> 
> Best Regards,
> Huang, Ying
> 
>> Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  include/linux/migrate.h |  2 +-
>>  include/linux/swapops.h |  4 ++--
>>  mm/filemap.c            | 10 +++++-----
>>  mm/hugetlb.c            |  2 +-
>>  mm/migrate.c            | 14 ++++++++------
>>  5 files changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index 66a34eae8cb6..3ef4ff699bef 100644
>> --- a/include/linux/migrate.h
>> +++ b/include/linux/migrate.h
>> @@ -41,7 +41,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>>  extern int migrate_page_move_mapping(struct address_space *mapping,
>>  		struct page *newpage, struct page *page, int extra_count);
>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>> -				spinlock_t *ptl);
>> +				spinlock_t *ptl, bool unmap);
>>  void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
>>  void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
>>  int folio_migrate_mapping(struct address_space *mapping,
>> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
>> index d356ab4047f7..d66556875d7d 100644
>> --- a/include/linux/swapops.h
>> +++ b/include/linux/swapops.h
>> @@ -213,7 +213,7 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
>>  }
>>  
>>  extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>> -					spinlock_t *ptl);
>> +					spinlock_t *ptl, bool unmap);
>>  extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>  					unsigned long address);
>>  extern void migration_entry_wait_huge(struct vm_area_struct *vma,
>> @@ -235,7 +235,7 @@ static inline int is_migration_entry(swp_entry_t swp)
>>  }
>>  
>>  static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>> -					spinlock_t *ptl) { }
>> +					spinlock_t *ptl, bool unmap) { }
>>  static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>  					 unsigned long address) { }
>>  static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 8f7e6088ee2a..18c353d52aae 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>   * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
>>   *        for pte entries, pass NULL for pmd entries.
>>   * @ptl: already locked ptl. This function will drop the lock.
>> + * @unmap: indicating whether ptep need to be unmapped.
>>   *
>>   * Wait for a migration entry referencing the given page to be removed. This is
>>   * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
>> @@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>   * there.
>>   */
>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>> -				spinlock_t *ptl)
>> +				spinlock_t *ptl, bool unmap)
>>  {
>>  	struct wait_page_queue wait_page;
>>  	wait_queue_entry_t *wait = &wait_page.wait;
>> @@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>  	 * a valid reference to the page, and it must take the ptl to remove the
>>  	 * migration entry. So the page is valid until the ptl is dropped.
>>  	 */
>> -	if (ptep)
>> -		pte_unmap_unlock(ptep, ptl);
>> -	else
>> -		spin_unlock(ptl);
>> +	spin_unlock(ptl);
>> +	if (unmap && ptep)
>> +		pte_unmap(ptep);
>>  
>>  	for (;;) {
>>  		unsigned int flags;
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 07668781c246..8088128c25db 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>  	} else {
>>  		if (is_hugetlb_entry_migration(pte)) {
>>  			spin_unlock(ptl);
>> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
>> +			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
>>  			goto retry;
>>  		}
>>  		/*
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 98a968e6f465..5519261f54fe 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
>>   * When we return from this function the fault will be retried.
>>   */
>>  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>> -				spinlock_t *ptl)
>> +				spinlock_t *ptl, bool unmap)
>>  {
>>  	pte_t pte;
>>  	swp_entry_t entry;
>> @@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>  	if (!is_migration_entry(entry))
>>  		goto out;
>>  
>> -	migration_entry_wait_on_locked(entry, ptep, ptl);
>> +	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
>>  	return;
>>  out:
>> -	pte_unmap_unlock(ptep, ptl);
>> +	spin_unlock(ptl);
>> +	if (unmap)
>> +		pte_unmap(ptep);
>>  }
>>  
>>  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>> @@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>  {
>>  	spinlock_t *ptl = pte_lockptr(mm, pmd);
>>  	pte_t *ptep = pte_offset_map(pmd, address);
>> -	__migration_entry_wait(mm, ptep, ptl);
>> +	__migration_entry_wait(mm, ptep, ptl, true);
>>  }
>>  
>>  void migration_entry_wait_huge(struct vm_area_struct *vma,
>>  		struct mm_struct *mm, pte_t *pte)
>>  {
>>  	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
>> -	__migration_entry_wait(mm, pte, ptl);
>> +	__migration_entry_wait(mm, pte, ptl, false);
>>  }
>>  
>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>> @@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>>  	ptl = pmd_lock(mm, pmd);
>>  	if (!is_pmd_migration_entry(*pmd))
>>  		goto unlock;
>> -	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
>> +	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
>>  	return;
>>  unlock:
>>  	spin_unlock(ptl);
> .
> 



* Re: [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-08 12:19     ` Miaohe Lin
@ 2022-03-09  0:56       ` Huang, Ying
  2022-03-09  8:48         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-09  0:56 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> On 2022/3/7 13:37, Huang, Ying wrote:
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>> 
>>> __migration_entry_wait and migration_entry_wait_on_locked assume pte is
>>> always mapped from caller. But this is not the case when it's called from
>>> migration_entry_wait_huge and follow_huge_pmd. Add a parameter unmap to
>>> indicate whether the pte needs to be unmapped to fix this issue.
>> 
>> This seems a possible issue.
>> 
>> Have you tested it?  It appears that it's possible to trigger the
>> issue.  If so, you can paste the error log here.
>
> This can only happen on an x86 machine with HIGHMEM enabled, which is uncommon
> now (at least in my work environment).

Yes.  32-bit isn't popular now.  But you can always test it via virtual
machine.

> So I can't paste the error log. The
> issues in this series mainly come from code inspection along with some tests.

If not too complex, I still think it's better to test your code and
verify the problem.

Best Regards,
Huang, Ying

> Thanks. :)
>
>> 
>> BTW: have you tested the other functionality issues in your patchset?
>> 
>> Best Regards,
>> Huang, Ying
>> 
>>> Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  include/linux/migrate.h |  2 +-
>>>  include/linux/swapops.h |  4 ++--
>>>  mm/filemap.c            | 10 +++++-----
>>>  mm/hugetlb.c            |  2 +-
>>>  mm/migrate.c            | 14 ++++++++------
>>>  5 files changed, 17 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>> index 66a34eae8cb6..3ef4ff699bef 100644
>>> --- a/include/linux/migrate.h
>>> +++ b/include/linux/migrate.h
>>> @@ -41,7 +41,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>>>  extern int migrate_page_move_mapping(struct address_space *mapping,
>>>  		struct page *newpage, struct page *page, int extra_count);
>>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>> -				spinlock_t *ptl);
>>> +				spinlock_t *ptl, bool unmap);
>>>  void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
>>>  void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
>>>  int folio_migrate_mapping(struct address_space *mapping,
>>> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
>>> index d356ab4047f7..d66556875d7d 100644
>>> --- a/include/linux/swapops.h
>>> +++ b/include/linux/swapops.h
>>> @@ -213,7 +213,7 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
>>>  }
>>>  
>>>  extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>> -					spinlock_t *ptl);
>>> +					spinlock_t *ptl, bool unmap);
>>>  extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>  					unsigned long address);
>>>  extern void migration_entry_wait_huge(struct vm_area_struct *vma,
>>> @@ -235,7 +235,7 @@ static inline int is_migration_entry(swp_entry_t swp)
>>>  }
>>>  
>>>  static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>> -					spinlock_t *ptl) { }
>>> +					spinlock_t *ptl, bool unmap) { }
>>>  static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>  					 unsigned long address) { }
>>>  static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 8f7e6088ee2a..18c353d52aae 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>>   * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
>>>   *        for pte entries, pass NULL for pmd entries.
>>>   * @ptl: already locked ptl. This function will drop the lock.
>>> + * @unmap: indicating whether ptep need to be unmapped.
>>>   *
>>>   * Wait for a migration entry referencing the given page to be removed. This is
>>>   * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
>>> @@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>>   * there.
>>>   */
>>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>> -				spinlock_t *ptl)
>>> +				spinlock_t *ptl, bool unmap)
>>>  {
>>>  	struct wait_page_queue wait_page;
>>>  	wait_queue_entry_t *wait = &wait_page.wait;
>>> @@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>>  	 * a valid reference to the page, and it must take the ptl to remove the
>>>  	 * migration entry. So the page is valid until the ptl is dropped.
>>>  	 */
>>> -	if (ptep)
>>> -		pte_unmap_unlock(ptep, ptl);
>>> -	else
>>> -		spin_unlock(ptl);
>>> +	spin_unlock(ptl);
>>> +	if (unmap && ptep)
>>> +		pte_unmap(ptep);
>>>  
>>>  	for (;;) {
>>>  		unsigned int flags;
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 07668781c246..8088128c25db 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>>  	} else {
>>>  		if (is_hugetlb_entry_migration(pte)) {
>>>  			spin_unlock(ptl);
>>> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
>>> +			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
>>>  			goto retry;
>>>  		}
>>>  		/*
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 98a968e6f465..5519261f54fe 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
>>>   * When we return from this function the fault will be retried.
>>>   */
>>>  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>> -				spinlock_t *ptl)
>>> +				spinlock_t *ptl, bool unmap)
>>>  {
>>>  	pte_t pte;
>>>  	swp_entry_t entry;
>>> @@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>>  	if (!is_migration_entry(entry))
>>>  		goto out;
>>>  
>>> -	migration_entry_wait_on_locked(entry, ptep, ptl);
>>> +	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
>>>  	return;
>>>  out:
>>> -	pte_unmap_unlock(ptep, ptl);
>>> +	spin_unlock(ptl);
>>> +	if (unmap)
>>> +		pte_unmap(ptep);
>>>  }
>>>  
>>>  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>> @@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>  {
>>>  	spinlock_t *ptl = pte_lockptr(mm, pmd);
>>>  	pte_t *ptep = pte_offset_map(pmd, address);
>>> -	__migration_entry_wait(mm, ptep, ptl);
>>> +	__migration_entry_wait(mm, ptep, ptl, true);
>>>  }
>>>  
>>>  void migration_entry_wait_huge(struct vm_area_struct *vma,
>>>  		struct mm_struct *mm, pte_t *pte)
>>>  {
>>>  	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
>>> -	__migration_entry_wait(mm, pte, ptl);
>>> +	__migration_entry_wait(mm, pte, ptl, false);
>>>  }
>>>  
>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>> @@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>>>  	ptl = pmd_lock(mm, pmd);
>>>  	if (!is_pmd_migration_entry(*pmd))
>>>  		goto unlock;
>>> -	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
>>> +	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
>>>  	return;
>>>  unlock:
>>>  	spin_unlock(ptl);
>> .
>> 


* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-08 12:12     ` Miaohe Lin
@ 2022-03-09  1:00       ` Huang, Ying
  2022-03-09  8:29         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-09  1:00 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

Miaohe Lin <linmiaohe@huawei.com> writes:

> On 2022/3/7 13:07, Huang, Ying wrote:
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>> 
>>> We should return errno (-EBUSY here) when failed to isolate the huge page
>>> rather than always return 1 which could confuse the user.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/migrate.c | 6 ++----
>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 6c2dfed2ddb8..279940c0c064 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>>  		goto out_putpage;
>>>  
>>>  	if (PageHuge(page)) {
>>> -		if (PageHead(page)) {
>>> -			isolate_huge_page(page, pagelist);
>>> -			err = 1;
>>> -		}
>>> +		if (PageHead(page))
>>> +			err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
>> 
>> IMHO, it's better to determine the proper errno inside
>> isolate_huge_page() instead of in the caller.  If you think it's
>> necessary to get an errno here, how about changing isolate_huge_page()
>> instead?
>
> IMO, -EBUSY should be enough for the user (as they could not do much) and this
> errno keeps consistent with the non-hugetlb page case. What do you think?

I found that the prototype of isolate_lru_page() is as follows,

  int isolate_lru_page(struct page *page)

and it returns an errno directly.  I think we should follow the same
convention here?

Best Regards,
Huang, Ying


* Re: [PATCH 04/16] mm/migration: reduce the rcu lock duration
  2022-03-08 12:09     ` Miaohe Lin
@ 2022-03-09  1:02       ` Huang, Ying
  2022-03-09  8:28         ` Miaohe Lin
  0 siblings, 1 reply; 73+ messages in thread
From: Huang, Ying @ 2022-03-09  1:02 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel,
	Christoph Lameter, David Howells, Linus Torvalds

Miaohe Lin <linmiaohe@huawei.com> writes:

> On 2022/3/7 10:32, Huang, Ying wrote:
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>> 
>>> rcu_read_lock is required for grabbing the task refcount, but it's not
>>> needed for ptrace_may_access. So we can release the rcu lock after the
>>> task refcount is successfully grabbed to reduce the rcu holding time.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/migrate.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index da5a81052468..26943bd819e8 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1907,17 +1907,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>>>  		return ERR_PTR(-ESRCH);
>>>  	}
>>>  	get_task_struct(task);
>>> +	rcu_read_unlock();
>>>  
>>>  	/*
>>>  	 * Check if this process has the right to modify the specified
>>>  	 * process. Use the regular "ptrace_may_access()" checks.
>>>  	 */
>>>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
>>> -		rcu_read_unlock();
>>>  		mm = ERR_PTR(-EPERM);
>>>  		goto out;
>>>  	}
>>> -	rcu_read_unlock();
>>>  
>>>  	mm = ERR_PTR(security_task_movememory(task));
>>>  	if (IS_ERR(mm))
>> 
>> Dug through some history via `git blame` and found that the RCU read lock
>> was extended in the following commit,
>> 
>> "
>> 3268c63eded4612a3d07b56d1e02ce7731e6608e
>> Author:     Christoph Lameter <cl@linux.com>
>> AuthorDate: Wed Mar 21 16:34:06 2012 -0700
>> Commit:     Linus Torvalds <torvalds@linux-foundation.org>
>> CommitDate: Wed Mar 21 17:54:58 2012 -0700
>> 
>> mm: fix move/migrate_pages() race on task struct
>> 
>> Migration functions perform the rcu_read_unlock too early.  As a result
>> the task pointed to may change from under us.  This can result in an oops,
>> as reported by Dave Hansen in https://lkml.org/lkml/2012/2/23/302.
>> 
>> The following patch extend the period of the rcu_read_lock until after the
>> permissions checks are done.  We also take a refcount so that the task
>> reference is stable when calling security check functions and performing
>> cpuset node validation (which takes a mutex).
>> 
>> The refcount is dropped before actual page migration occurs so there is no
>> change to the refcounts held during page migration.
>> 
>> Also move the determination of the mm of the task struct to immediately
>> before the do_migrate*() calls so that it is clear that we switch from
>> handling the task during permission checks to the mm for the actual
>> migration.  Since the determination is only done once and we then no
>> longer use the task_struct we can be sure that we operate on a specific
>> address space that will not change from under us.
>> "
>> 
>> After that, the permission checking has been changed from __task_cred()
>> to ptrace_may_access().  So the situation may change somewhat.  Cced
>
> In ptrace_may_access, __task_cred is accessed while holding the rcu read
> lock. It seems this is ensured by ptrace_may_access itself.

Please read the patch above.  Before the rcu_read_lock protected region
was extended, __task_cred() was already protected by rcu_read_lock.  The
patch above combines the two regions into one.

Best Regards,
Huang, Ying

>> some names found in git history to verify.
>
> Thanks for your carefulness.
>
>> 
>> Best Regards,
>> Huang, Ying
>> .
>> 


* Re: [PATCH 04/16] mm/migration: reduce the rcu lock duration
  2022-03-09  1:02       ` Huang, Ying
@ 2022-03-09  8:28         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-09  8:28 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel,
	Christoph Lameter, David Howells, Linus Torvalds

On 2022/3/9 9:02, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> On 2022/3/7 10:32, Huang, Ying wrote:
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> rcu_read_lock is required for grabbing the task refcount, but it's not
>>>> needed for ptrace_may_access. So we can release the rcu lock after the
>>>> task refcount is successfully grabbed to reduce the rcu holding time.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/migrate.c | 3 +--
>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index da5a81052468..26943bd819e8 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1907,17 +1907,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>>>>  		return ERR_PTR(-ESRCH);
>>>>  	}
>>>>  	get_task_struct(task);
>>>> +	rcu_read_unlock();
>>>>  
>>>>  	/*
>>>>  	 * Check if this process has the right to modify the specified
>>>>  	 * process. Use the regular "ptrace_may_access()" checks.
>>>>  	 */
>>>>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
>>>> -		rcu_read_unlock();
>>>>  		mm = ERR_PTR(-EPERM);
>>>>  		goto out;
>>>>  	}
>>>> -	rcu_read_unlock();
>>>>  
>>>>  	mm = ERR_PTR(security_task_movememory(task));
>>>>  	if (IS_ERR(mm))
>>>
>>> Dug through some history via `git blame` and found that the RCU read lock
>>> was extended in the following commit,
>>>
>>> "
>>> 3268c63eded4612a3d07b56d1e02ce7731e6608e
>>> Author:     Christoph Lameter <cl@linux.com>
>>> AuthorDate: Wed Mar 21 16:34:06 2012 -0700
>>> Commit:     Linus Torvalds <torvalds@linux-foundation.org>
>>> CommitDate: Wed Mar 21 17:54:58 2012 -0700
>>>
>>> mm: fix move/migrate_pages() race on task struct
>>>
>>> Migration functions perform the rcu_read_unlock too early.  As a result
>>> the task pointed to may change from under us.  This can result in an oops,
>>> as reported by Dave Hansen in https://lkml.org/lkml/2012/2/23/302.
>>>
>>> The following patch extend the period of the rcu_read_lock until after the
>>> permissions checks are done.  We also take a refcount so that the task
>>> reference is stable when calling security check functions and performing
>>> cpuset node validation (which takes a mutex).
>>>
>>> The refcount is dropped before actual page migration occurs so there is no
>>> change to the refcounts held during page migration.
>>>
>>> Also move the determination of the mm of the task struct to immediately
>>> before the do_migrate*() calls so that it is clear that we switch from
>>> handling the task during permission checks to the mm for the actual
>>> migration.  Since the determination is only done once and we then no
>>> longer use the task_struct we can be sure that we operate on a specific
>>> address space that will not change from under us.
>>> "
>>>
>>> After that, the permission checking has been changed from __task_cred()
>>> to ptrace_may_access().  So the situation may change somewhat.  Cced
>>
>> In ptrace_may_access, __task_cred is accessed while holding the rcu read
>> lock. It seems this is ensured by ptrace_may_access itself.
> 
> Please read the patch above.  Before the rcu_read_lock protected region
> was extended, __task_cred() was already protected by rcu_read_lock.  The
> patch above combines the two regions into one.
> 

Yep, you're right. Thanks.

> Best Regards,
> Huang, Ying
> 
>>> some names found in git history to verify.
>>
>> Thanks for your carefulness.
>>
>>>
>>> Best Regards,
>>> Huang, Ying
>>> .
>>>
> .
> 



* Re: [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed
  2022-03-09  1:00       ` Huang, Ying
@ 2022-03-09  8:29         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-09  8:29 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	ave.hansen, o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/9 9:00, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> On 2022/3/7 13:07, Huang, Ying wrote:
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> We should return an errno (-EBUSY here) when we fail to isolate the huge
>>>> page rather than always returning 1, which could confuse the user.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/migrate.c | 6 ++----
>>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 6c2dfed2ddb8..279940c0c064 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1618,10 +1618,8 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>>>  		goto out_putpage;
>>>>  
>>>>  	if (PageHuge(page)) {
>>>> -		if (PageHead(page)) {
>>>> -			isolate_huge_page(page, pagelist);
>>>> -			err = 1;
>>>> -		}
>>>> +		if (PageHead(page))
>>>> +			err = isolate_huge_page(page, pagelist) ? 1 : -EBUSY;
>>>
>>> IMHO, it's better to determine the proper errno inside
>>> isolate_huge_page() instead of in the caller.  If you think it's
>>> necessary to get an errno here, how about changing isolate_huge_page()
>>> instead?
>>
>> IMO, -EBUSY should be enough for the user (as they could not do much) and this
>> errno keeps consistent with the non-hugetlb page case. What do you think?
> 
> I found that the prototype of isolate_lru_page() is as follows,
> 
>   int isolate_lru_page(struct page *page)
> 
> and it returns an errno directly.  I think we should follow the same
> convention here?
> 

I see. Sounds reasonable to me. Will try to do it. Thanks.

> Best Regards,
> Huang, Ying
> .
> 



* Re: [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-03-09  0:56       ` Huang, Ying
@ 2022-03-09  8:48         ` Miaohe Lin
  0 siblings, 0 replies; 73+ messages in thread
From: Miaohe Lin @ 2022-03-09  8:48 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, mike.kravetz, shy828301, willy, ziy, minchan, apopple,
	o451686892, almasrymina, jhubbard, rcampbell, peterx,
	naoya.horiguchi, mhocko, riel, linux-mm, linux-kernel

On 2022/3/9 8:56, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> On 2022/3/7 13:37, Huang, Ying wrote:
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> __migration_entry_wait and migration_entry_wait_on_locked assume the pte
>>>> is always mapped by the caller. But this is not the case when they are
>>>> called from migration_entry_wait_huge and follow_huge_pmd. Add a
>>>> parameter 'unmap' to indicate whether the pte needs to be unmapped to
>>>> fix this issue.
>>>
>>> This seems a possible issue.
>>>
>>> Have you tested it?  It appears that it's possible to trigger the
>>> issue.  If so, you can paste the error log here.
>>
>> This might only happen on an x86 machine with HIGHMEM enabled, which is
>> uncommon now (at least in my work environment).
> 
> Yes.  32-bit isn't popular now.  But you can always test it via virtual
> machine.
> 

Good idea.

>> so I can't paste the error log. The issues in this series were mainly
>> found by investigating the code, together with some tests.
> 
> If not too complex, I still think it's better to test your code and
> verify the problem.

Sure! :)
Many thanks.

> 
> Best Regards,
> Huang, Ying
> 
>> Thanks. :)
>>
>>>
>>> BTW: have you tested the other functionality issues in your patchset?
>>>
>>> Best Regards,
>>> Huang, Ying
>>>
>>>> Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  include/linux/migrate.h |  2 +-
>>>>  include/linux/swapops.h |  4 ++--
>>>>  mm/filemap.c            | 10 +++++-----
>>>>  mm/hugetlb.c            |  2 +-
>>>>  mm/migrate.c            | 14 ++++++++------
>>>>  5 files changed, 17 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>> index 66a34eae8cb6..3ef4ff699bef 100644
>>>> --- a/include/linux/migrate.h
>>>> +++ b/include/linux/migrate.h
>>>> @@ -41,7 +41,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>>>>  extern int migrate_page_move_mapping(struct address_space *mapping,
>>>>  		struct page *newpage, struct page *page, int extra_count);
>>>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>>> -				spinlock_t *ptl);
>>>> +				spinlock_t *ptl, bool unmap);
>>>>  void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
>>>>  void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
>>>>  int folio_migrate_mapping(struct address_space *mapping,
>>>> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
>>>> index d356ab4047f7..d66556875d7d 100644
>>>> --- a/include/linux/swapops.h
>>>> +++ b/include/linux/swapops.h
>>>> @@ -213,7 +213,7 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
>>>>  }
>>>>  
>>>>  extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>>> -					spinlock_t *ptl);
>>>> +					spinlock_t *ptl, bool unmap);
>>>>  extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>>  					unsigned long address);
>>>>  extern void migration_entry_wait_huge(struct vm_area_struct *vma,
>>>> @@ -235,7 +235,7 @@ static inline int is_migration_entry(swp_entry_t swp)
>>>>  }
>>>>  
>>>>  static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>>> -					spinlock_t *ptl) { }
>>>> +					spinlock_t *ptl, bool unmap) { }
>>>>  static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>>  					 unsigned long address) { }
>>>>  static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>> index 8f7e6088ee2a..18c353d52aae 100644
>>>> --- a/mm/filemap.c
>>>> +++ b/mm/filemap.c
>>>> @@ -1389,6 +1389,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>>>   * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
>>>>   *        for pte entries, pass NULL for pmd entries.
>>>>   * @ptl: already locked ptl. This function will drop the lock.
>>>> + * @unmap: indicating whether ptep need to be unmapped.
>>>>   *
>>>>   * Wait for a migration entry referencing the given page to be removed. This is
>>>>   * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
>>>> @@ -1402,7 +1403,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>>>>   * there.
>>>>   */
>>>>  void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>>> -				spinlock_t *ptl)
>>>> +				spinlock_t *ptl, bool unmap)
>>>>  {
>>>>  	struct wait_page_queue wait_page;
>>>>  	wait_queue_entry_t *wait = &wait_page.wait;
>>>> @@ -1439,10 +1440,9 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
>>>>  	 * a valid reference to the page, and it must take the ptl to remove the
>>>>  	 * migration entry. So the page is valid until the ptl is dropped.
>>>>  	 */
>>>> -	if (ptep)
>>>> -		pte_unmap_unlock(ptep, ptl);
>>>> -	else
>>>> -		spin_unlock(ptl);
>>>> +	spin_unlock(ptl);
>>>> +	if (unmap && ptep)
>>>> +		pte_unmap(ptep);
>>>>  
>>>>  	for (;;) {
>>>>  		unsigned int flags;
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 07668781c246..8088128c25db 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -6713,7 +6713,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>>>  	} else {
>>>>  		if (is_hugetlb_entry_migration(pte)) {
>>>>  			spin_unlock(ptl);
>>>> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
>>>> +			__migration_entry_wait(mm, (pte_t *)pmd, ptl, false);
>>>>  			goto retry;
>>>>  		}
>>>>  		/*
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 98a968e6f465..5519261f54fe 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -281,7 +281,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
>>>>   * When we return from this function the fault will be retried.
>>>>   */
>>>>  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>>> -				spinlock_t *ptl)
>>>> +				spinlock_t *ptl, bool unmap)
>>>>  {
>>>>  	pte_t pte;
>>>>  	swp_entry_t entry;
>>>> @@ -295,10 +295,12 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>>>>  	if (!is_migration_entry(entry))
>>>>  		goto out;
>>>>  
>>>> -	migration_entry_wait_on_locked(entry, ptep, ptl);
>>>> +	migration_entry_wait_on_locked(entry, ptep, ptl, unmap);
>>>>  	return;
>>>>  out:
>>>> -	pte_unmap_unlock(ptep, ptl);
>>>> +	spin_unlock(ptl);
>>>> +	if (unmap)
>>>> +		pte_unmap(ptep);
>>>>  }
>>>>  
>>>>  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>> @@ -306,14 +308,14 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>>>  {
>>>>  	spinlock_t *ptl = pte_lockptr(mm, pmd);
>>>>  	pte_t *ptep = pte_offset_map(pmd, address);
>>>> -	__migration_entry_wait(mm, ptep, ptl);
>>>> +	__migration_entry_wait(mm, ptep, ptl, true);
>>>>  }
>>>>  
>>>>  void migration_entry_wait_huge(struct vm_area_struct *vma,
>>>>  		struct mm_struct *mm, pte_t *pte)
>>>>  {
>>>>  	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
>>>> -	__migration_entry_wait(mm, pte, ptl);
>>>> +	__migration_entry_wait(mm, pte, ptl, false);
>>>>  }
>>>>  
>>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>> @@ -324,7 +326,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>>>>  	ptl = pmd_lock(mm, pmd);
>>>>  	if (!is_pmd_migration_entry(*pmd))
>>>>  		goto unlock;
>>>> -	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
>>>> +	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl, false);
>>>>  	return;
>>>>  unlock:
>>>>  	spin_unlock(ptl);
>>> .
>>>
> .
> 



end of thread, other threads:[~2022-03-09  8:48 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04  9:33 [PATCH 00/16] A few cleanup and fixup patches for migration Miaohe Lin
2022-03-04  9:33 ` [PATCH 01/16] mm/migration: remove unneeded local variable mapping_locked Miaohe Lin
2022-03-04 13:48   ` Muchun Song
2022-03-07  2:00   ` Huang, Ying
2022-03-08 11:41     ` Miaohe Lin
2022-03-04  9:33 ` [PATCH 02/16] mm/migration: remove unneeded out label Miaohe Lin
2022-03-04 12:12   ` Muchun Song
2022-03-07  2:03   ` Huang, Ying
2022-03-08 11:44     ` Miaohe Lin
2022-03-04  9:33 ` [PATCH 03/16] mm/migration: remove unneeded local variable page_lru Miaohe Lin
2022-03-07 10:58   ` Alistair Popple
2022-03-08 11:29     ` Miaohe Lin
2022-03-04  9:33 ` [PATCH 04/16] mm/migration: reduce the rcu lock duration Miaohe Lin
2022-03-04 12:16   ` Muchun Song
2022-03-07  2:32   ` Huang, Ying
2022-03-08 12:09     ` Miaohe Lin
2022-03-09  1:02       ` Huang, Ying
2022-03-09  8:28         ` Miaohe Lin
2022-03-04  9:33 ` [PATCH 05/16] mm/migration: fix the confusing PageTransHuge check Miaohe Lin
2022-03-04 12:20   ` Muchun Song
2022-03-04  9:33 ` [PATCH 06/16] mm/migration: use helper function vma_lookup() in add_page_for_migration Miaohe Lin
2022-03-04  9:34 ` [PATCH 07/16] mm/migration: use helper macro min_t in do_pages_stat Miaohe Lin
2022-03-04 13:51   ` Muchun Song
2022-03-07  1:14   ` Andrew Morton
2022-03-07 11:51     ` Miaohe Lin
2022-03-04  9:34 ` [PATCH 08/16] mm/migration: avoid unneeded nodemask_t initialization Miaohe Lin
2022-03-04 13:57   ` Muchun Song
2022-03-07  2:31   ` Baolin Wang
2022-03-04  9:34 ` [PATCH 09/16] mm/migration: remove some duplicated codes in migrate_pages Miaohe Lin
2022-03-04 15:16   ` Zi Yan
2022-03-07  1:44   ` Baolin Wang
2022-03-04  9:34 ` [PATCH 10/16] mm/migration: remove PG_writeback handle in folio_migrate_flags Miaohe Lin
2022-03-07  1:21   ` Andrew Morton
2022-03-07 12:44     ` Miaohe Lin
2022-03-04  9:34 ` [PATCH 11/16] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
2022-03-04  9:34 ` [PATCH 12/16] mm/migration: fix potential page refcounts leak in migrate_pages Miaohe Lin
2022-03-04 15:21   ` Zi Yan
2022-03-07  1:57   ` Baolin Wang
2022-03-07  5:02     ` Huang, Ying
2022-03-07  6:00       ` Baolin Wang
2022-03-07 12:03         ` Miaohe Lin
2022-03-07 12:01     ` Miaohe Lin
2022-03-07  5:01   ` Huang, Ying
2022-03-07 12:11     ` Miaohe Lin
2022-03-04  9:34 ` [PATCH 13/16] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
2022-03-05  2:23   ` Muchun Song
2022-03-07 11:46     ` Miaohe Lin
2022-03-07  2:14   ` Baolin Wang
2022-03-07 12:20     ` Miaohe Lin
2022-03-08  1:32       ` Baolin Wang
2022-03-08  6:34         ` Miaohe Lin
2022-03-07  5:07   ` Huang, Ying
2022-03-08 12:12     ` Miaohe Lin
2022-03-09  1:00       ` Huang, Ying
2022-03-09  8:29         ` Miaohe Lin
2022-03-04  9:34 ` [PATCH 14/16] mm/migration: fix potential invalid node access for reclaim-based migration Miaohe Lin
2022-03-07  2:25   ` Baolin Wang
2022-03-07  5:14     ` Huang, Ying
2022-03-07  7:04       ` Baolin Wang
2022-03-08 11:46         ` Miaohe Lin
2022-03-07  5:14   ` Huang, Ying
2022-03-04  9:34 ` [PATCH 15/16] mm/migration: fix possible do_pages_stat_array racing with memory offline Miaohe Lin
2022-03-07  5:21   ` Huang, Ying
2022-03-07  7:01     ` Muchun Song
2022-03-07  7:42       ` Huang, Ying
2022-03-08 11:33         ` Miaohe Lin
2022-03-04  9:34 ` [PATCH 16/16] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
2022-03-07  5:37   ` Huang, Ying
2022-03-08 12:19     ` Miaohe Lin
2022-03-09  0:56       ` Huang, Ying
2022-03-09  8:48         ` Miaohe Lin
2022-03-07  7:35   ` Alistair Popple
2022-03-08 11:55     ` Miaohe Lin
