linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: fabecassis@nvidia.com, jhubbard@nvidia.com, cl@linux.com,
	vbabka@suse.cz, mgorman@techsingularity.net,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [v2 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
Date: Thu, 5 Dec 2019 12:31:56 +0100	[thread overview]
Message-ID: <20191205113156.GE28317@dhcp22.suse.cz> (raw)
In-Reply-To: <1575519678-86510-1-git-send-email-yang.shi@linux.alibaba.com>

On Thu 05-12-19 12:21:18, Yang Shi wrote:
> Felix Abecassis reports move_pages() would return random status if the
> pages are already on the target node by the below test program:
> 
> ---8<---
> 
> int main(void)
> {
> 	const long node_id = 1;
> 	const long page_size = sysconf(_SC_PAGESIZE);
> 	const int64_t num_pages = 8;
> 
> 	unsigned long nodemask =  1 << node_id;
> 	long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
> 	if (ret < 0)
> 		return (EXIT_FAILURE);
> 
> 	void **pages = malloc(sizeof(void*) * num_pages);
> 	for (int i = 0; i < num_pages; ++i) {
> 		pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
> 				MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
> 				-1, 0);
> 		if (pages[i] == MAP_FAILED)
> 			return (EXIT_FAILURE);
> 	}
> 
> 	ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
> 	if (ret < 0)
> 		return (EXIT_FAILURE);
> 
> 	int *nodes = malloc(sizeof(int) * num_pages);
> 	int *status = malloc(sizeof(int) * num_pages);
> 	for (int i = 0; i < num_pages; ++i) {
> 		nodes[i] = node_id;
> 		status[i] = 0xd0; /* simulate garbage values */
> 	}
> 
> 	ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
> 	printf("move_pages: %ld\n", ret);
> 	for (int i = 0; i < num_pages; ++i)
> 		printf("status[%d] = %d\n", i, status[i]);
> }
> ---8<---
> 
> Then running the program would return nonsense status values:
> $ ./move_pages_bug
> move_pages: 0
> status[0] = 208
> status[1] = 208
> status[2] = 208
> status[3] = 208
> status[4] = 208
> status[5] = 208
> status[6] = 208
> status[7] = 208
> 
> This is because the status is not set if the page is already on the
> target node, but move_pages() should return valid status as long as it
> succeeds.  The valid status may be errno or node id.
> 
> We can't simply initialize status array to zero since the pages may be
> not on node 0.  Fix it by updating status with node id which the page is
> already on.  And, it looks we have to update the status inside
> add_page_for_migration() since the page struct is not available outside
> it.

The code is indeed more complex than I wanted but I couldn't figure an
easier way back then. I wanted to keep store_status at a single place
because the failure handling is quite complex already.

> Make add_page_for_migration() return 1 if store_status() is failed in
> order to not mix up the status value since -EFAULT is also a valid
> status.

Can we simply return 1 when there is something to migrate instead?
Something like
diff --git a/mm/migrate.c b/mm/migrate.c
index 4fe45d1428c8..f3730804b8d4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1516,9 +1516,9 @@ static int do_move_pages_to_node(struct mm_struct *mm,
 /*
  * Resolves the given address to a struct page, isolates it from the LRU and
  * puts it to the given pagelist.
- * Returns -errno if the page cannot be found/isolated or 0 when it has been
- * queued or the page doesn't need to be migrated because it is already on
- * the target node
+ * Returns -errno if the page cannot be found/isolated or 0 when it doesn't have
+ * to be migrate or 1 dwhen it has been queued or the page doesn't need to be
+ * migrated because it is already on the target node
  */
 static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		int node, struct list_head *pagelist, bool migrate_all)
@@ -1557,7 +1557,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	if (PageHuge(page)) {
 		if (PageHead(page)) {
 			isolate_huge_page(page, pagelist);
-			err = 0;
+			err = 1;
 		}
 	} else {
 		struct page *head;
@@ -1567,7 +1567,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		if (err)
 			goto out_putpage;
 
-		err = 0;
+		err = 1;
 		list_add_tail(&head->lru, pagelist);
 		mod_node_page_state(page_pgdat(head),
 			NR_ISOLATED_ANON + page_is_file_cache(head),
@@ -1644,8 +1644,14 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 */
 		err = add_page_for_migration(mm, addr, current_node,
 				&pagelist, flags & MPOL_MF_MOVE_ALL);
-		if (!err)
+		if (!err) {
+			err = store_status(status, i, current_node, 1);
+			if (err)
+				goto out_flush;
+			continue;
+		} else if (err > 0) {
 			continue;
+		}
 
 		err = store_status(status, i, err, 1);
 		if (err)

this would still keep store_status ugliness at a single place.

>
> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
> Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> Tested-by: Felix Abecassis <fabecassis@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: <stable@vger.kernel.org> 4.17+
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v2: *Correted the return value when add_page_for_migration() returns 1.
> 
> John noticed another return value inconsistency between the implementation and
> the manpage.  The manpage says it should return -ENOENT if the page is already
> on the target node, but it doesn't.  It looks the original code didn't return
> -ENOENT either, I'm not sure if this is a document issue or not.  Anyway this
> is another issue, once we confirm it we can fix it later.

I do not remember all the details but my recollection is that there were
several inconsistencies present before I touched the code and I've
decided to not touch them without a clear usecase.
-- 
Michal Hocko
SUSE Labs


  parent reply	other threads:[~2019-12-05 11:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05  4:21 [v2 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node Yang Shi
2019-12-05  5:44 ` John Hubbard
2019-12-05  5:50   ` John Hubbard
2019-12-05 17:20   ` Yang Shi
2019-12-05  9:42 ` Qian Cai
2019-12-05 17:39   ` Yang Shi
2019-12-05 18:11     ` Qian Cai
2019-12-05 18:17       ` Christopher Lameter
2019-12-05 11:31 ` Michal Hocko [this message]
2019-12-05 17:18   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191205113156.GE28317@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=fabecassis@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).